Opencode Tokens: How They Work and How to Stop Burning Them

TLDR: Every opencode turn resends the full context window: history, file reads, tool output, all of it. /compact is the most important command you're probably not using. A specific prompt with a file path can use about a quarter of the tokens of a vague one. A focused Sonnet session runs ₹50–70; an unfocused one can quietly hit ₹400–600 or more.

Opencode is an open-source, terminal-native AI coding assistant from the SST team. It supports every major model provider — Anthropic, OpenAI, Google, Groq, and more — and gives you a keyboard-driven TUI that feels closer to your editor than a chat window.

But like every AI coding tool, it runs on tokens. And if you're not deliberate, a single coding session can quietly consume millions of them before you notice.

I've been using opencode heavily for about three months. Here's exactly how it burns tokens and the concrete steps to cut that rate without losing productivity.

How opencode actually consumes tokens

Every message you send goes through a context window — a rolling buffer of everything the model can see:

[System prompt]         → opencode's instructions (~500–1,500 tokens)
[Conversation history]  → every message in the current session
[File contents]         → files opencode read to answer your questions
[Tool call results]     → output of ls, grep, bash commands the model ran
[Your message]          → what you actually typed

The critical part: all of this is sent on every single turn. The model has no persistent memory — it reads the full context from scratch each time. As your session grows, so does the cost of every subsequent message.

Turn 1:  2,000 tokens sent   (cheap)
Turn 5:  8,000 tokens sent   (4× the cost of turn 1)
Turn 20: 35,000 tokens sent  (17.5× the cost of turn 1)
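
To see why the bill tracks the sum of all turns rather than the size of the latest one, here is a minimal Python sketch. It assumes the context grows by a fixed 1,500 tokens per turn, a hypothetical figure picked so that turns 1 and 5 match the numbers above. Real sessions grow unevenly, which is why turn 20 lands at 30,500 here rather than 35,000.

```python
def context_size(turn: int, first_turn: int = 2_000, growth_per_turn: int = 1_500) -> int:
    """Tokens sent on a 1-indexed turn, assuming linear context growth (an assumption)."""
    return first_turn + (turn - 1) * growth_per_turn


def cumulative_tokens(turns: int) -> int:
    """Total input tokens sent across the first `turns` turns."""
    return sum(context_size(t) for t in range(1, turns + 1))


for turn in (1, 5, 20):
    print(f"turn {turn:>2}: {context_size(turn):>6,} sent, "
          f"{cumulative_tokens(turn):>7,} sent so far")
```

Under these assumptions, 20 turns add up to 325,000 input tokens even though no single turn sends more than 30,500.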

The biggest token drains in practice

File reads accumulate and stay. When you ask opencode to look at a file, the entire file content gets injected into context, and it stays there for the rest of the session. A 500-line service file is roughly 3,000 tokens. At Sonnet input pricing of ₹255/MTok, that's about ₹0.77 per turn. Multiply by 15 turns and you're paying roughly ₹11.50 for context that's been sitting idle since turn 2.

Tool call output piles up. Every time opencode runs a bash command, grep, or file read, the output gets appended. A find . -type f in a large repo can return thousands of lines, and the output of an npm install is surprisingly long.

Node modules and generated files. If opencode explores naively and lands in node_modules, dist, or build directories, you can burn tens of thousands of tokens on generated code you never wanted it to read. I've watched this happen.

Conversation history snowballs. By default, opencode keeps the full exchange in context. A 30-turn session can easily accumulate 80,000–120,000 tokens of history.

`/compact` — use this constantly

The single most impactful command: /compact.

> /compact

This replaces the full conversation history with a short summary, drastically shrinking the context. The model retains the conclusions of past turns without the verbatim exchange.

I think of /compact as a git commit — you checkpoint your work and start the next piece fresh with a clean context budget.

When to use it:

After completing a discrete task (bug fixed, feature added)
When the conversation has drifted through multiple topics
When response quality starts degrading (a sign the model is struggling with a bloated context)
Before switching to a new module or file
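
A rough way to see what /compact buys you is to simulate a session with and without it. This is a sketch under stated assumptions, not a model of opencode's internals: the 1,200-token system prompt comes from the cost example later in this post, while the 1,500 tokens of growth per turn and the 500-token summary are hypothetical, and the cost of the summarization call itself is ignored.

```python
def session_tokens(turns: int, base: int = 1_200, growth: int = 1_500,
                   compact_after=(), summary: int = 500) -> int:
    """Total input tokens across a session.

    After each turn listed in `compact_after`, the accumulated history is
    replaced by a summary of `summary` tokens (hypothetical size).
    """
    history = 0
    total = 0
    for turn in range(1, turns + 1):
        history += growth          # this turn's message, reads, and tool output
        total += base + history    # the whole context is resent every turn
        if turn in compact_after:
            history = summary      # compaction: keep conclusions, drop the transcript
    return total


without = session_tokens(20)
with_compact = session_tokens(20, compact_after={5, 10, 15})
print(f"no compaction:       {without:,} tokens")
print(f"compact every 5th:   {with_compact:,} tokens")
```

With these made-up numbers, compacting after every fifth turn cuts the session from 339,000 input tokens to 121,500. The exact ratio depends entirely on the assumptions, but the shape of the saving is the point.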

`.opencodeignore` — block expensive paths permanently

Opencode respects an .opencodeignore file in your project root (same syntax as .gitignore). Files and directories listed here never enter the context.

# .opencodeignore

# Build artifacts
dist/
build/
.next/
out/

# Dependencies
node_modules/
vendor/

# Generated / compiled
*.min.js
*.min.css
*.map
*.d.ts.map
coverage/

# Large data files
*.csv
fixtures/large/

# Secrets and env
.env
.env.*
*.pem
*.key

# Media
public/images/
assets/videos/

A well-tuned .opencodeignore is the difference between opencode reading 50 relevant files and accidentally indexing 15,000 files in node_modules. Set this up before your first session on a new project.
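
Before writing the ignore file, it helps to know where the weight actually is. The sketch below walks a project and estimates tokens per top-level directory. It assumes roughly 4 bytes per token, a common rule of thumb rather than a real tokenizer, and the function name is my own; nothing here is part of opencode.

```python
import os
from collections import Counter

BYTES_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens_by_top_dir(root: str) -> Counter:
    """Approximate token weight of the files under each top-level entry of `root`."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git internals; they are never useful context.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # broken symlink or unreadable file
            totals[top] += size // BYTES_PER_TOKEN
    return totals
```

Calling estimate_tokens_by_top_dir(".").most_common(10) from the project root lists the ten heaviest entries, which are usually the first candidates for the ignore file.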

Be specific in your prompts

Vague prompts cause opencode to read widely. Specific prompts tell it exactly where to look.

# ❌ Reads multiple files, explores broadly, uses 8,000+ tokens
> fix the auth bug

# ✅ Single file, targeted fix, uses ~2,000 tokens
> in src/guards/auth.guard.ts around line 45, the canActivate check
  returns true when the token is expired. Fix the expiry comparison.

The specificity formula:

[file path] + [line range or function name] + [what is wrong] + [what correct looks like]

The more you pre-diagnose, the less the model has to explore, and every exploration step costs tokens. Two minutes spent pinpointing the issue before prompting can cut the request to about a quarter of the tokens, as in the example above.
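
If you want to make the formula a habit, a tiny helper can force you to fill in all four parts. This is purely illustrative: the FocusedPrompt class and its field names are hypothetical, not an opencode feature.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FocusedPrompt:
    file_path: str  # where to look
    location: str   # line range or function name
    problem: str    # what is wrong
    expected: str   # what correct looks like

    def __post_init__(self) -> None:
        # Refuse to build a prompt with any of the four parts left blank.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing prompt part: {f.name}")

    def render(self) -> str:
        return f"in {self.file_path} {self.location}, {self.problem} {self.expected}"


prompt = FocusedPrompt(
    file_path="src/guards/auth.guard.ts",
    location="around line 45",
    problem="the canActivate check returns true when the token is expired.",
    expected="Fix the expiry comparison.",
)
print(prompt.render())
```

The rendered text is the same targeted prompt shown earlier, just assembled from its four parts.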

Keep sessions task-scoped

Each opencode session should be one coherent task. Don't let sessions bleed across features.

# Session 1: Add pagination to the user list endpoint
> add cursor-based pagination to GET /api/users ...
[done] → close session

# Session 2: Fix the date formatting bug
> the formatDate() function in utils/date.ts returns undefined for ...
[done] → close session

Why this matters: a fresh session starts with only the system prompt in context — roughly 1,000 tokens. Continuing yesterday's session might start with 40,000 tokens of stale history before you type your first message today.

Right model for the task

The cost difference between models is enormous — and many tasks don't need the most expensive one.

Task	Model	Approx. cost (INR/MTok input)
Autocomplete, simple edits	Haiku 4.5 / Gemini Flash	₹8–68
Feature development, debugging	Sonnet 4.6 / GPT-4o	₹212–255
Architecture review, complex refactors	Opus 4.7 / o3	₹850–1,275

Switch models in the config or mid-session:

> /model claude-haiku-4-5-20251001

My default: Haiku for anything that doesn't require reasoning about the full codebase. Sonnet when I need real understanding. Opus only for the hard architectural problems.
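
To put the table in per-turn terms, the sketch below prices the same input at each tier. The rates are the upper bounds from the table above, and the short tier names are labels I made up for the example. Output tokens and any caching discounts are left out.

```python
# Upper-bound input rates from the table, in INR per million tokens.
RATE_INR_PER_MTOK = {"haiku": 68, "sonnet": 255, "opus": 1_275}


def input_cost_inr(tokens: int, tier: str) -> float:
    """Input cost in INR of sending `tokens` once at the given tier's rate."""
    return tokens * RATE_INR_PER_MTOK[tier] / 1_000_000


turn_tokens = 26_200  # size of one late-session turn, taken from the cost example in this post
for tier in RATE_INR_PER_MTOK:
    print(f"{tier:<7} INR {input_cost_inr(turn_tokens, tier):.2f}")
```

At these rates the same turn costs five times as much on the top tier as on the middle one, which is why routing simple edits to the cheap tier adds up quickly.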

Control what enters context explicitly

Instead of letting opencode discover files on its own, tell it exactly what to read:

# ❌ opencode explores several files to understand structure
> how does authentication work in this app?

# ✅ You point directly to what matters
> read src/middleware/auth.ts and src/services/jwt.service.ts.
  Explain how the token validation pipeline works.

When you control what gets read, you control what enters the context window.

`/clear` for a hard reset

When a session has gone sideways — wrong direction, too much accumulated noise — don't keep patching it:

> /clear

This wipes the entire conversation history and starts fresh from the system prompt. Sometimes the cheapest thing is a clean slate. I use this more than I expected to.

What a typical session actually costs

Estimated tokens per turn = system prompt + (files read × avg file size) + (turns so far × avg message size) + tool call results

# Example: feature work session on Sonnet
System prompt:        1,200 tokens
4 files read:         8,000 tokens
15 turns × 800 avg:  12,000 tokens
Tool call results:    5,000 tokens
─────────────────────────────────────
Total input (last turn): ~26,200 tokens
Cost for last turn:       26,200 × ₹255/MTok = ₹6.68
Full session total:       ₹50–70 (all 15 turns)

A focused 15-turn Sonnet session runs about ₹50–70. An unfocused 40-turn session with large files drifting in and out can hit ₹400–600+. The difference is almost entirely discipline around context hygiene.
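
The ₹50–70 figure comes from summing every turn, not just the last one. Here is a minimal sketch that reproduces the example, assuming all four files are read on the first turn and the tool output arrives evenly across the session (both simplifications of mine), and counting input tokens only at ₹255/MTok.

```python
RATE_INR_PER_MTOK = 255  # Sonnet input rate used in the example above


def turn_tokens(turn: int, turns: int = 15, system: int = 1_200, files: int = 8_000,
                avg_message: int = 800, tool_total: int = 5_000) -> float:
    """Input tokens on a 1-indexed turn; tool output is assumed to accumulate evenly."""
    return system + files + avg_message * turn + tool_total * turn / turns


def cost_inr(tokens: float) -> float:
    """Input cost in INR for the given number of tokens."""
    return tokens * RATE_INR_PER_MTOK / 1_000_000


per_turn = [turn_tokens(t) for t in range(1, 16)]
last_turn_cost = cost_inr(per_turn[-1])
session_cost = cost_inr(sum(per_turn))
print(f"last turn: {per_turn[-1]:,.0f} tokens, INR {last_turn_cost:.2f}")
print(f"session:   {sum(per_turn):,.0f} tokens, INR {session_cost:.2f}")
```

Under these assumptions the session total lands near the top of the ₹50–70 range. Reading files later in the session, or compacting partway through, would pull it lower.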

Token hygiene checklist

Before starting:

.opencodeignore covers node_modules, dist, build artifacts
You know which 2–4 files are relevant to the task

During the session:

Prompts include file path + line range when possible
Run /compact after completing each discrete task
Switch to Haiku for straightforward edits

When a session drifts:

Run /clear and restart with a focused prompt
Break multi-topic work into separate sessions
