TLDR: Every opencode turn resends the full context window: history, file reads, tool output, all of it. /compact is the most important command you're probably not using. A specific prompt with a file path can use about a quarter of the tokens a vague one does. A focused Sonnet session runs ₹50–70; an unfocused one can quietly hit ₹600+.
Opencode is an open-source, terminal-native AI coding assistant from the SST team. It supports every major model provider — Anthropic, OpenAI, Google, Groq, and more — and gives you a keyboard-driven TUI that feels closer to your editor than a chat window.
But like every AI coding tool, it runs on tokens. And if you're not deliberate, a single coding session can quietly consume millions of them before you notice.
I've been using opencode heavily for about three months. Here's exactly how it burns tokens and the concrete steps to cut that rate without losing productivity.
How opencode actually consumes tokens
Every message you send goes through a context window — a rolling buffer of everything the model can see:
[System prompt] → opencode's instructions (~500–1,500 tokens)
[Conversation history] → every message in the current session
[File contents] → files opencode read to answer your questions
[Tool call results] → output of ls, grep, bash commands the model ran
[Your message] → what you actually typed
The critical part: all of this is sent on every single turn. The model has no persistent memory — it reads the full context from scratch each time. As your session grows, so does the cost of every subsequent message.
Turn 1: 2,000 tokens sent (cheap)
Turn 5: 8,000 tokens sent (4× the cost of turn 1)
Turn 20: 35,000 tokens sent (17.5× the cost of turn 1)
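To see why late turns dominate the bill, here is a rough worked sum. It assumes, purely for illustration, that the context starts at b tokens and grows by a fixed g tokens per turn. Real sessions grow in uneven jumps, and the 1,500-token step is simply the slope between turn 1 and turn 5 in the figures above.

```latex
% Assumption: the context sent on turn t is c_t = b + (t-1) g  (linear growth)
\[
  T_n \;=\; \sum_{t=1}^{n} \bigl(b + (t-1)\,g\bigr) \;=\; n\,b + g\,\frac{n(n-1)}{2}
\]
% With b = 2000, g = 1500, n = 20:
\[
  T_{20} \;=\; 20 \cdot 2000 + 1500 \cdot \frac{20 \cdot 19}{2}
         \;=\; 40{,}000 + 285{,}000 \;=\; 325{,}000 \text{ tokens}
\]
```

Once a session is long enough for the growth term to dominate, doubling its length roughly quadruples the input tokens billed across it.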
The biggest token drains in practice
File reads accumulate and stay. When you ask opencode to look at a file, the entire file content gets injected into context, and it stays there for the rest of the session. A 500-line service file is roughly 3,000 tokens. At Sonnet input pricing of about ₹255 per million tokens, that's about ₹0.77 per turn. Multiply by 15 turns and you're paying roughly ₹11.50 for context that's been sitting idle since turn 2.
Tool call output piles up. Every time opencode runs a bash command, grep, or file read, the output is appended to the context. A find . -type f in a large repo can return thousands of lines, and npm install output is longer than you'd expect.
Node modules and generated files. If opencode explores naively and lands in node_modules, dist, or build directories, you can burn tens of thousands of tokens on generated code you never wanted it to read. I've watched this happen.
Conversation history snowballs. By default, opencode keeps the full exchange in context. A 30-turn session can easily accumulate 80,000–120,000 tokens of history.
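The file-read arithmetic above is easy to reproduce. The sketch below assumes input is billed at a flat ₹255 per million tokens (the Sonnet figure used later in this post) and that the file stays in context on every turn. The function name idle_context_cost is made up for this example.

```python
def idle_context_cost(tokens: int, turns: int, inr_per_mtok: float = 255.0) -> float:
    """Cost in INR of carrying `tokens` of context through `turns` turns.

    Assumption: flat input pricing, no caching discounts.
    """
    return tokens * turns * inr_per_mtok / 1_000_000


per_turn = idle_context_cost(3_000, turns=1)       # about INR 0.77
whole_session = idle_context_cost(3_000, turns=15)  # about INR 11.5
print(per_turn, whole_session)
```

Swap in your own file size and turn count to see what a single stale file costs you.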
/compact — use this constantly
The single most impactful command: /compact.
> /compact
This replaces the full conversation history with a short summary, drastically shrinking the context. The model retains the conclusions of past turns without the verbatim exchange.
I think of /compact as a git commit — you checkpoint your work and start the next piece fresh with a clean context budget.
When to use it:
- After completing a discrete task (bug fixed, feature added)
- When the conversation has drifted through multiple topics
- When response quality starts degrading (a sign the model is struggling with a bloated context)
- Before switching to a new module or file
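A toy simulation gives a feel for how much periodic compaction can save. Everything here is an assumption for illustration: the context grows by a fixed 1,500 tokens per turn, compaction shrinks the history to a 500-token summary, and the cost of the compaction request itself is ignored. It does not model how opencode actually builds its summary.

```python
def total_input_tokens(turns, base=2_000, growth=1_500,
                       compact_every=None, summary_tokens=500):
    """Sum the context sent on each turn of a simulated session."""
    total = 0
    context = base
    for turn in range(1, turns + 1):
        total += context       # the whole context is billed on every turn
        context += growth      # this turn's messages and tool output stay in context
        if compact_every and turn % compact_every == 0:
            # Assumed effect of /compact: history collapses to a short summary.
            context = base + summary_tokens
    return total


without_compact = total_input_tokens(20)
with_compact = total_input_tokens(20, compact_every=5)
print(without_compact, with_compact)  # 325000 107500
```

Under these assumptions, compacting every five turns cuts a 20-turn session to about a third of the input tokens. The exact ratio depends heavily on how large the summary really is.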
.opencodeignore — block expensive paths permanently
Opencode respects an .opencodeignore file in your project root (same syntax as .gitignore). Files and directories listed here never enter the context.
# .opencodeignore
# Build artifacts
dist/
build/
.next/
out/
# Dependencies
node_modules/
vendor/
# Generated / compiled
*.min.js
*.min.css
*.map
*.d.ts.map
coverage/
# Large data files
*.csv
fixtures/large/
# Secrets and env
.env
.env.*
*.pem
*.key
# Media
public/images/
assets/videos/
A well-tuned .opencodeignore is the difference between opencode reading 50 relevant files and accidentally indexing 15,000 files in node_modules. Set this up before your first session on a new project.
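Before writing the ignore file, it can help to measure how much text sits behind the paths you plan to exclude. The script below is a rough sketch: it uses a simplified pattern matcher (directory patterns ending in / and filename globs only, not full .gitignore semantics and not opencode's own matcher) and the common rule of thumb of about 4 characters per token. Both function names are made up for this example.

```python
import fnmatch
import os
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified matcher: 'dir/' patterns and filename globs only."""
    *dirs, name = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            wanted = pattern.rstrip("/").split("/")
            n = len(wanted)
            # Match the directory run anywhere along the path.
            if any(dirs[i:i + n] == wanted for i in range(len(dirs) - n + 1)):
                return True
        elif fnmatch.fnmatch(name, pattern):
            return True
    return False


def estimate_tokens(root, patterns: list[str]) -> tuple[int, int]:
    """Return (kept_tokens, ignored_tokens) estimated from file sizes."""
    kept = ignored = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = Path(dirpath) / filename
            rel = full.relative_to(root).as_posix()
            tokens = full.stat().st_size // CHARS_PER_TOKEN
            if is_ignored(rel, patterns):
                ignored += tokens
            else:
                kept += tokens
    return kept, ignored
```

Calling estimate_tokens(".", ["node_modules/", "dist/", "*.map"]) from a project root gives a ballpark for how many tokens the ignore rules keep out of reach.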
Be specific in your prompts
Vague prompts cause opencode to read widely. Specific prompts tell it exactly where to look.
# ❌ Reads multiple files, explores broadly, uses 8,000+ tokens
> fix the auth bug
# ✅ Single file, targeted fix, uses ~2,000 tokens
> in src/guards/auth.guard.ts around line 45, the canActivate check
returns true when the token is expired. Fix the expiry comparison.
The specificity formula:
[file path] + [line range or function name] + [what is wrong] + [what correct looks like]
The more you pre-diagnose, the less the model has to explore, and every exploration step costs tokens. A couple of minutes spent pinpointing the issue before you prompt is cheap next to the tokens a broad exploration burns: in the example above, roughly 8,000 tokens against 2,000.
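If you want to make the formula a habit, a tiny template can enforce it. This is only an illustration: the FocusedPrompt class and its field names are invented here and are not part of opencode.

```python
from dataclasses import dataclass


@dataclass
class FocusedPrompt:
    """One field per slot in the specificity formula (hypothetical helper)."""
    file_path: str   # where to look
    location: str    # line range or function name
    problem: str     # what is wrong
    expected: str    # what correct looks like

    def render(self) -> str:
        return f"in {self.file_path} {self.location}, {self.problem}. {self.expected}"


prompt = FocusedPrompt(
    file_path="src/guards/auth.guard.ts",
    location="around line 45",
    problem="the canActivate check returns true when the token is expired",
    expected="Fix the expiry comparison.",
)
print(prompt.render())
```

If you can't fill in one of the four fields, that is usually a sign you need another minute of diagnosis before prompting.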
Keep sessions task-scoped
Each opencode session should be one coherent task. Don't let sessions bleed across features.
# Session 1: Add pagination to the user list endpoint
> add cursor-based pagination to GET /api/users ...
[done] → close session
# Session 2: Fix the date formatting bug
> the formatDate() function in utils/date.ts returns undefined for ...
[done] → close session
Why this matters: a fresh session starts with only the system prompt in context — roughly 1,000 tokens. Continuing yesterday's session might start with 40,000 tokens of stale history before you type your first message today.
Right model for the task
The cost difference between models is enormous — and many tasks don't need the most expensive one.
| Task | Model | Approx. cost (INR/MTok input) |
|---|---|---|
| Autocomplete, simple edits | Haiku 4.5 / Gemini Flash | ₹8–68 |
| Feature development, debugging | Sonnet 4.6 / GPT-4o | ₹212–255 |
| Architecture review, complex refactors | Opus 4.7 / o3 | ₹850–1,275 |
Switch models in the config or mid-session:
> /model claude-haiku-4-5-20251001
My default: Haiku for anything that doesn't require reasoning about the full codebase. Sonnet when I need real understanding. Opus only for the hard architectural problems.
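To put the table in concrete terms, this sketch prices the same amount of input at each tier. The rate ranges are copied from the table above. The tier labels and the 100,000-token workload are assumptions for the example, and output tokens are left out entirely.

```python
# (low, high) input price in INR per million tokens, taken from the table above.
RATES_INR_PER_MTOK = {
    "light": (8, 68),       # Haiku 4.5 / Gemini Flash
    "mid": (212, 255),      # Sonnet 4.6 / GPT-4o
    "heavy": (850, 1275),   # Opus 4.7 / o3
}


def input_cost_range(input_tokens: int, tier: str) -> tuple[float, float]:
    """Return the (low, high) input cost in INR for a tier."""
    low, high = RATES_INR_PER_MTOK[tier]
    return (input_tokens * low / 1_000_000, input_tokens * high / 1_000_000)


for tier in RATES_INR_PER_MTOK:
    low, high = input_cost_range(100_000, tier)
    print(f"{tier:>5}: INR {low:.2f} to {high:.2f}")
```

For 100,000 input tokens that works out to under ₹7 on the light tier, about ₹21–26 on the middle tier, and ₹85–128 on the heavy tier.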
Control what enters context explicitly
Instead of letting opencode discover files on its own, tell it exactly what to read:
# ❌ opencode explores several files to understand structure
> how does authentication work in this app?
# ✅ You point directly to what matters
> read src/middleware/auth.ts and src/services/jwt.service.ts.
Explain how the token validation pipeline works.
When you control what gets read, you control what enters the context window.
/clear for a hard reset
When a session has gone sideways — wrong direction, too much accumulated noise — don't keep patching it:
> /clear
This wipes the entire conversation history and starts fresh from the system prompt. Sometimes the cheapest thing is a clean slate. I use this more than I expected to.
What a typical session actually costs
Estimated tokens = system prompt + (files read × avg file size) + (turns × avg message size) + tool call results
# Example: feature work session on Sonnet
System prompt: 1,200 tokens
4 files read: 8,000 tokens
15 turns × 800 avg: 12,000 tokens
Tool call results: 5,000 tokens
─────────────────────────────────────
Total input (last turn): ~26,200 tokens
Cost for last turn: 26,200 × ₹255/MTok = ₹6.68
Full session (all 15 turns): ₹50–70
A focused 15-turn Sonnet session runs about ₹50–70. An unfocused 40-turn session with large files drifting in and out can hit ₹400–600+. The difference is almost entirely discipline around context hygiene.
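The estimate above can be turned into a small calculator. The last-turn figure follows the formula directly. The whole-session figure needs an extra assumption that is mine, not measured: the context grows linearly from the system prompt to the final size. Only input tokens are counted, and all function names are made up for this sketch.

```python
def last_turn_tokens(system_prompt: int, file_tokens: int, turns: int,
                     avg_message_tokens: int, tool_tokens: int) -> int:
    """Context size on the final turn, per the estimate above."""
    return system_prompt + file_tokens + turns * avg_message_tokens + tool_tokens


def session_tokens_linear(first_turn: int, last_turn: int, turns: int) -> int:
    """Total input over the session, assuming linear growth (arithmetic series)."""
    return turns * (first_turn + last_turn) // 2


def cost_inr(tokens: int, inr_per_mtok: float = 255.0) -> float:
    """Input cost in INR at a flat per-million-token rate."""
    return tokens * inr_per_mtok / 1_000_000


final = last_turn_tokens(1_200, 8_000, 15, 800, 5_000)   # 26200
session = session_tokens_linear(1_200, final, 15)        # 205500
print(final, cost_inr(final))       # last turn: about INR 6.68
print(session, cost_inr(session))   # whole session: about INR 52
```

Under the linear assumption the 15-turn example sums to 205,500 input tokens, or about ₹52, at the low end of the ₹50–70 range. Reading the files early in the session, which is the usual case, pushes the total higher.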
Token hygiene checklist
Before starting:
- .opencodeignore covers node_modules, dist, build artifacts
- You know which 2–4 files are relevant to the task
During the session:
- Prompts include file path + line range when possible
- Run /compact after completing each discrete task
- Switch to Haiku for straightforward edits
When a session drifts:
- Run /clear and restart with a focused prompt
- Break multi-topic work into separate sessions