How to Cut 90% of Your OpenClaw Token Usage

7 min read · By Keying Wu

You think your prompts are the problem. "I need to write shorter prompts." "Maybe I should remove that example from my instructions."

Your prompt is 2-5% of your total token spend.

The other 95-98% is stuff you never typed. System instructions, config files, tool schemas, chat history, old command outputs. All packed into every API call, whether you asked for it or not.

Run this command right now:

/context detail

You'll see something like this:

System prompt (run): 58,254 chars (~14,564 tok)
Skills list: 28,902 chars (~7,226 tok) (71 skills)
AGENTS.md: 8,088 chars (~2,022 tok)
MEMORY.md: 3,604 chars (~901 tok)
SOUL.md: 1,479 chars (~370 tok)

That's ~25,000 tokens before you've said a single word. Every turn. Every API call. That's your context tax.
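
That overhead compounds fast. A back-of-envelope sketch — the turn count and the $/1M input rate below are illustrative assumptions, not OpenClaw defaults or official pricing:

```python
# Back-of-envelope: what a ~25K-token fixed overhead costs per day.
# turns_per_day and price_per_million_input are assumed for illustration.
fixed_overhead_tokens = 25_000
turns_per_day = 200
price_per_million_input = 15.0  # assumed Opus-class input rate, $/1M tokens

daily_overhead_cost = (
    fixed_overhead_tokens * turns_per_day / 1_000_000 * price_per_million_input
)
print(f"${daily_overhead_cost:.2f}/day before you've typed anything")  # $75.00/day
```

At 200 turns a day, that's $75 spent on context you never wrote. Change the assumptions and the shape stays the same: fixed overhead × turns × rate.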

The Context Tax

Every time OpenClaw calls the model, it assembles a context package. Two categories of stuff go in.

Fixed cost (you pay this every turn):

  • System prompt: base instructions, rebuilt every run (~15,000 tokens)
  • Your config files: AGENTS.md, TOOLS.md, MEMORY.md, all auto-injected
  • Tool schemas: every registered tool's JSON schema (hidden, but they count — one user found tool schemas alone eating ~8,000 tokens)
  • Skill metadata: a compact list of installed skills, ~97 chars per skill

Variable cost (grows unless you manage it):

  • Chat history: every message, back and forth
  • Tool outputs: results from commands, file reads, API calls — these get massive
  • Attachments: images, audio, file content

The fixed cost is your rent. The variable cost is your inventory. Most people let both grow unchecked.

Three things move the needle.

Lever 1: Start Fresh Sessions

This is the highest-impact change, and the one most people resist.

A long-running session accumulates everything: old tool outputs, stale debug logs, file reads from three tasks ago. All of it gets sent on every subsequent API call.

New task, new session. That one habit will cut more cost than every config tweak in this article combined.

For sessions you do keep running, two tools help:

Pruning trims old tool outputs from the context sent to the model. Your full session transcript stays on disk. You lose nothing.

{
  "agents": {
    "defaults": {
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "1h"
      }
    }
  }
}

You can scope pruning to just the noisiest tools (like exec and read) while leaving image results alone:

{
  "contextPruning": {
    "mode": "cache-ttl",
    "tools": {
      "allow": ["exec", "read"],
      "deny": ["*image*"]
    }
  }
}

Compaction summarizes the entire conversation into a short entry and frees up massive context space. Use /compact manually, or enable auto-compaction in config. Unlike pruning, compaction is permanent — the summary gets saved to session history.

{
  "compaction": {
    "mode": "safeguard",
    "reserveTokensFloor": 40000,
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 4000
    }
  }
}

The safeguard mode does proactive chunked summarization instead of one-shot compaction. The memoryFlush setting gives the agent a final turn to save important context to disk before the session is summarized.

One more thing: if you're using prompt caching, align your pruning TTL with your cache TTL. If they're out of sync, you get expensive cache-miss spikes where the whole prompt re-caches from scratch. That mismatch alone can cost more than everything else.
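
A quick way to catch that mismatch is to compare the two TTLs before deploying a config. The `parse_ttl` helper below is hypothetical — OpenClaw doesn't ship it — but the check itself is trivial:

```python
# Sanity-check that the pruning TTL and the prompt-cache TTL agree.
# parse_ttl is a hypothetical helper, not an OpenClaw API.
def parse_ttl(value: str) -> int:
    """Convert '30s' / '5m' / '1h' style strings to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]

pruning_ttl = parse_ttl("1h")  # contextPruning.ttl from your config
cache_ttl = parse_ttl("1h")    # prompt-cache TTL on the provider side
assert pruning_ttl == cache_ttl, "mismatched TTLs invite cache-miss spikes"
```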

Lever 2: Stop Running Things So Often

Every heartbeat is a full API call. Full context. Full price. Not a cheap ping. A full turn.

A 5-minute heartbeat with 50K tokens of context = 600,000 tokens per hour. On Opus, that's roughly $3/hour just sitting there. One user burned $50 in a single day from a misconfigured email check running every 5 minutes.
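
The 600K figure isn't hand-waving; it falls straight out of the arithmetic:

```python
# Heartbeat spend scales linearly with frequency and context size.
context_tokens = 50_000    # context sent on every heartbeat turn
calls_per_hour = 60 // 5   # one heartbeat every 5 minutes
tokens_per_hour = context_tokens * calls_per_hour
print(tokens_per_hour)  # 600000
```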

Fix 1: Route heartbeats to a cheap model. Your heartbeat doesn't need Opus. It needs to check if anything changed. Haiku or a local model via Ollama costs 80-99% less.

{
  "heartbeat": {
    "every": "30m",
    "model": "anthropic/claude-haiku-4.5",
    "activeHours": {
      "start": "09:00",
      "end": "22:00",
      "timezone": "America/New_York"
    }
  }
}

No reason to burn tokens at 3am.

Fix 2: Batch your checks. Instead of 5 separate cron jobs checking inbox, calendar, Slack, GitHub, and email — put all 5 checks into one heartbeat. One turn instead of five. Same result, 80% less cost.
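
The 80% comes from the fixed context tax: five separate turns pay it five times, one batched turn pays it once. The overhead figure here is the ~25K-token example from earlier, used for illustration:

```python
# Five separate jobs pay the fixed context tax five times;
# one batched heartbeat pays it once. Overhead figure is illustrative.
fixed_overhead_tokens = 25_000
separate_jobs = 5 * fixed_overhead_tokens  # five individual turns
batched = 1 * fixed_overhead_tokens        # one combined turn
savings = 1 - batched / separate_jobs
print(f"{savings:.0%} less fixed-context spend")  # 80% less fixed-context spend
```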

Fix 3: Know when to use cron vs heartbeat. They're different tools.

Use heartbeat when:

  • You're batching multiple checks into one turn
  • The agent needs conversation context
  • Periodic awareness matters more than precision

Use cron when:

  • You need exact timing ("every day at 9am")
  • The job should run in isolation
  • The task is standalone

Fix 4: Use HEARTBEAT_OK suppression. When a heartbeat finds nothing to do, reply with just HEARTBEAT_OK. This suppresses the response and stops downstream history from bloating.

Lever 3: Shrink What Gets Auto-Injected

Your config files get sent every single turn. If your AGENTS.md is 50,000 characters, you pay for 50,000 characters on every API call. Not once. Every call.

Here's a real example from /context list:

File       Raw Size       Injected
AGENTS.md  8,088 chars    8,088 chars (~2,022 tok)
MEMORY.md  3,604 chars    3,604 chars (~901 tok)
SOUL.md    1,479 chars    1,479 chars (~370 tok)
USER.md    1,470 chars    1,470 chars (~368 tok)
TOOLS.md   1,082 chars    1,082 chars (~271 tok)

That AGENTS.md alone is 2x the recommended max of ~4,000 chars. Every turn, you're paying for those 2,022 tokens whether you need them or not.
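
The /context figures above also hand you a rule of thumb: roughly 4 characters per token (8,088 chars → ~2,022 tok is almost exactly 4:1). A one-line estimator, under that assumption:

```python
# Rough chars -> tokens estimate implied by the /context figures above
# (~4 chars per token; an approximation, not a real tokenizer).
def approx_tokens(chars: int) -> int:
    return chars // 4

print(approx_tokens(8_088))  # 2022 — matches the AGENTS.md row
```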

Fix 1: Keep bootstrap files short. Move detailed reference material into memory/*.md files — those are loaded on demand, not auto-injected.

File       Recommended Max
AGENTS.md  ~4,000 chars
SOUL.md    ~2,000 chars
TOOLS.md   ~3,000 chars
MEMORY.md  ~8,000 chars

Fix 2: Cap bootstrap injection.

{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 10000,
      "bootstrapTotalMaxChars": 50000
    }
  }
}

The defaults are 20,000 per file and 150,000 total. That's generous. Tune them down.

Fix 3: Lower image dimensions. Vision tokens are expensive. The default max dimension is 1,200px. You often don't need that much.

{
  "agents": {
    "defaults": {
      "imageMaxDimensionPx": 800
    }
  }
}

Fix 4: Trim your skill descriptions. Skill metadata is injected every turn with ~97 chars per skill plus field lengths. 71 skills with verbose descriptions? That's 28,902 chars (~7,226 tokens) on every turn. Keep descriptions concise. Remove skills you don't use.
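
Those numbers imply each installed skill costs about 400 chars of context on average, every single turn:

```python
# Average per-skill overhead implied by the /context figures above.
total_chars = 28_902
installed_skills = 71
per_skill = total_chars / installed_skills
print(round(per_skill))  # 407 chars per skill, on every turn
```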

Fix 5: Check tool schema weight. Run /context detail. Hidden tool JSON schemas can dominate your context — one user found them at ~8,000 tokens. If you've registered tools you rarely use, consider removing them.

Anti-Patterns

These are the mistakes that cause token explosions:

  1. One mega-session for everything. Debugging, writing, deploying, emailing — all in one session. Every past conversation inflates every future API call.

  2. AGENTS.md over 8,000 chars. You're paying for it every turn. Twice the recommended max means double the cost.

  3. 71 skills with verbose descriptions. Each skill adds ~100-1000 chars to every API call. That's 28,902 chars (~7,226 tokens) sent every turn.

  4. 5-minute heartbeats on Opus. 600,000 tokens per hour doing nothing. $3/hour just sitting there.

  5. Five separate cron jobs instead of one batched heartbeat. 5x the overhead for the same result.

  6. Never pruning tool outputs. Those exec and read dumps pile up silently.

  7. Long conversations without compaction. Every turn gets more expensive than the last.

  8. Mismatched cache and pruning settings. When cache expires but context hasn't been pruned, the next turn re-caches everything at full price.

  9. Full-resolution screenshots when thumbnails work fine. Vision tokens are the most expensive kind.

  10. Treating cron and heartbeat as interchangeable. Wrong tool for the job costs you.

Users have gone from $600/month to $20/month, from $720/month to $72/month, from $340/month to $112/month. The pattern: measure with /context detail, fix these anti-patterns, measure again.
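
Run the reported bills through the arithmetic and the title's 90% isn't an outlier:

```python
# The reported before/after monthly bills as percentage savings.
for before, after in [(600, 20), (720, 72), (340, 112)]:
    saved = (before - after) / before
    print(f"${before}/mo -> ${after}/mo: {saved:.0%} saved")
```

That prints savings of 97%, 90%, and 67% respectively — the spread depends on how many of the anti-patterns above you started with.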
