AI Spend Live - Find the AI session draining your budget

See the waste before it eats the day.

This example keeps the scale of a real local run while replacing private names and paths. The point is not the bill. The point is knowing which behavior to change next.

Example run: 7 days (API-equivalent estimate from local token logs)

Private data removed

Selected range: 7,526.3M tokens, $12,816.46 estimate

24h burn: 153.6M tokens, $236.07 estimate

Last hour: 15.3M tokens (current pace)

Cache reuse: 81%, 7,328.1M cached/read tokens

Sessions costing the most: what to close, split, or summarize first

Session	Provider	Duration	Cache	Tokens	Do next
Long Context Refactor (largest repeated-context thread)	Claude	6.2d	99%	3,310.4M	Summarize and restart
UI Polish Session (large ongoing design pass)	Claude	1.1d	99%	1,244.8M	Split by screen
Codex Repository Sweep (long-running codebase work)	Codex	3.0d	49%	928.0M	Narrow file scope
Feature Handoff Session (single thread carrying context)	Claude	5.4d	96%	709.1M	Checkpoint decisions
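Each session's share of the selected range follows directly from the table. The sketch below is illustrative only: the token figures are copied from the example run, while the type and function names (`SessionRow`, `sharePercent`) are hypothetical and not taken from the tool's code.

```typescript
// Token totals in millions, copied from the example table above.
interface SessionRow {
  label: string;
  tokensM: number;
}

const sessions: SessionRow[] = [
  { label: "Long Context Refactor", tokensM: 3310.4 },
  { label: "UI Polish Session", tokensM: 1244.8 },
  { label: "Codex Repository Sweep", tokensM: 928.0 },
  { label: "Feature Handoff Session", tokensM: 709.1 },
];

// Selected-range total from the summary above (7,526.3M tokens).
const rangeTotalM = 7526.3;

// Share of the selected range as a whole-number percentage.
function sharePercent(tokensM: number, totalM: number): number {
  return Math.round((tokensM / totalM) * 100);
}

const shares = sessions.map((s) => ({
  label: s.label,
  percent: sharePercent(s.tokensM, rangeTotalM),
}));

for (const row of shares) {
  console.log(`${row.label}: ${row.percent}%`);
}
```

With the example numbers, the first row comes out at 44%, which is the session to close or summarize first.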

What the dashboard tells you to do: optimization moves, not trivia

One Claude thread carried 44% of usage. Close it before continuing. Ask for a concise state summary, then start the next task in a new session.

Cache reads dominated the range. Do not keep adding unrelated work to the same chat. Move stable facts into a handoff note.

A single turn spiked to 1.8M tokens. Break repo-wide asks into smaller file or subsystem passes before asking the model to reason.

Codex was cheaper, but still measurable. Route broad mechanical exploration to Codex, then bring the narrowed result back to Claude.

Context pressure changed the answer. When a session shows a half-full context window or automatic compaction, summarize to markdown before quality drops.

One session is eating the 5h window. Checkpoint or close the top burner before the short-window budget disappears.
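Finding the single turn that spiked can be sketched as a scan over per-turn totals. This is a minimal sketch under assumptions: the `TurnRecord` shape and the 1M-token threshold are hypothetical, and the sample rows are made up apart from the 1.8M spike size mentioned above. The real log format may differ.

```typescript
// Hypothetical per-turn record; the real log format is not shown here.
interface TurnRecord {
  sessionLabel: string;
  turnIndex: number;
  totalTokens: number;
}

// Return the largest turn, or null when there are no turns.
function largestTurn(turns: TurnRecord[]): TurnRecord | null {
  let biggest: TurnRecord | null = null;
  for (const t of turns) {
    if (biggest === null || t.totalTokens > biggest.totalTokens) {
      biggest = t;
    }
  }
  return biggest;
}

// Turns at or above an assumed threshold are candidates for splitting.
function findSpikes(turns: TurnRecord[], thresholdTokens: number): TurnRecord[] {
  return turns.filter((t) => t.totalTokens >= thresholdTokens);
}

// Illustrative data; only the 1.8M figure comes from the text.
const sampleTurns: TurnRecord[] = [
  { sessionLabel: "session-a", turnIndex: 3, totalTokens: 240000 },
  { sessionLabel: "session-b", turnIndex: 7, totalTokens: 1800000 },
  { sessionLabel: "session-a", turnIndex: 4, totalTokens: 310000 },
];

const worstTurn = largestTurn(sampleTurns);
const spikeTurns = findSpikes(sampleTurns, 1000000);
```

A turn flagged this way is the one to break into smaller file or subsystem passes before repeating the ask.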

Answer the spend questions people actually ask.

When someone shares a screenshot of a context meter, usage table, budget policy, or agent comparison, the useful answer is a recommendation tied to the metric, the reset window, and the session causing the burn.

Context pressure

52% full

When the context window is filling up.

Tell the user to pause, ask for a compact state summary, save it to a handoff file, and restart in a clean thread before output quality degrades.
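The context-pressure rule can be written as a small check. This is a sketch under assumptions: the 0.5 threshold mirrors the "half-full" guidance earlier on the page, the 200,000-token window in the usage note is a hypothetical figure, and the function names are illustrative rather than the tool's own.

```typescript
type ContextAction = "continue" | "summarize-and-restart";

// Fraction of the context window in use, clamped to the range 0..1.
function contextFill(usedTokens: number, windowTokens: number): number {
  if (windowTokens <= 0) {
    return 1; // treat an unknown or empty window as full
  }
  return Math.min(1, Math.max(0, usedTokens / windowTokens));
}

// Assumed rule: act once the window is half full or after an automatic compaction.
function contextAction(
  fill: number,
  compacted: boolean,
  threshold: number = 0.5
): ContextAction {
  return compacted || fill >= threshold ? "summarize-and-restart" : "continue";
}
```

For example, `contextAction(contextFill(104000, 200000), false)` evaluates a 52% full window and returns `"summarize-and-restart"`.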

Agent comparison

72k vs 91k

When one tool feels more expensive.

Compare task outcome, input plus cache-write tokens, tool access, and retry count. Recommend the agent with the right connectors, not just the lowest-looking session.
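One way to make that comparison concrete is to count tokens per finished outcome rather than per session. The sketch below is an assumption-laden illustration: the `AgentRun` fields, the agent labels, and the simplification that each retry costs as much as the first attempt are all hypothetical. Only the 72k and 91k totals echo the example above.

```typescript
interface AgentRun {
  agent: string;
  inputTokens: number; // fresh input tokens per attempt
  cacheWriteTokens: number; // cache-write tokens per attempt
  retries: number; // extra attempts after the first
  succeeded: boolean; // did the final attempt meet the goal?
  hasRequiredTools: boolean; // e.g. the connectors the task needs
}

// Assumption: every retry costs about as much as the first attempt.
function tokensToOutcome(run: AgentRun): number {
  return (run.inputTokens + run.cacheWriteTokens) * (1 + run.retries);
}

// Pick the cheapest run among those that succeeded with the right tools.
function pickAgent(runs: AgentRun[]): AgentRun | null {
  let best: AgentRun | null = null;
  for (const run of runs) {
    if (!run.succeeded || !run.hasRequiredTools) {
      continue;
    }
    if (best === null || tokensToOutcome(run) < tokensToOutcome(best)) {
      best = run;
    }
  }
  return best;
}

// Illustrative runs: 72k per attempt with one retry versus 91k in one pass.
const comparedRuns: AgentRun[] = [
  { agent: "agent-a", inputTokens: 60000, cacheWriteTokens: 12000, retries: 1, succeeded: true, hasRequiredTools: true },
  { agent: "agent-b", inputTokens: 80000, cacheWriteTokens: 11000, retries: 0, succeeded: true, hasRequiredTools: true },
];

const recommended = pickAgent(comparedRuns);
```

Here the 72k session costs 144k once its retry is counted, so the 91k run is the cheaper path to the same outcome.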

Budget guardrails

5h + weekly

When budget windows get eaten early.

Compare remaining 5-hour and weekly budget against the sessions consuming each window. Pause fast loops, checkpoint the top burner, and continue only with scoped work.

Tool routing

MCP gap

When the cheaper path lacks tools.

Route broad repo sweeps to Codex, keep MCP-heavy or product-judgment work in the environment with the right tools, then hand off a narrowed summary.

Use stronger prompts before stronger settings.

A good token plan is not just a weekly limit. It is knowing what remains in the 5-hour and weekly windows, when to spend reasoning, and when to stop a fast loop.

Model routing

5.5 -> 5.3

Plan with GPT-5.5, execute with Codex.

Use GPT-5.5 for ambiguous planning, tradeoffs, acceptance criteria, and review. Move implementation to GPT-5.3 Codex once files, constraints, and tests are clear.

Fast mode

bounded only

Keep /fast for finish-line work.

Fast mode is useful for narrow edits and quick checks. It is expensive when the agent is still discovering scope because it can burn through more turns before you notice.

Prompt shape

outcome first

State the result, not every step.

Give the goal, success criteria, allowed side effects, evidence rules, and output shape. Add step-by-step process only when the path itself matters.
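An outcome-first prompt can be assembled from those five fields. This is a minimal sketch: the `PromptSpec` shape and `buildPrompt` helper are hypothetical, and the section labels are one possible wording, not a format the tool prescribes.

```typescript
interface PromptSpec {
  goal: string;
  successCriteria: string[];
  allowedSideEffects: string[];
  evidenceRules: string[];
  outputShape: string;
  steps?: string[]; // include only when the path itself matters
}

function bulletList(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Put the goal first and append steps only when they were supplied.
function buildPrompt(spec: PromptSpec): string {
  const sections: string[] = [
    `Goal: ${spec.goal}`,
    `Success criteria:\n${bulletList(spec.successCriteria)}`,
    `Allowed side effects:\n${bulletList(spec.allowedSideEffects)}`,
    `Evidence rules:\n${bulletList(spec.evidenceRules)}`,
    `Output shape: ${spec.outputShape}`,
  ];
  if (spec.steps && spec.steps.length > 0) {
    sections.push(`Steps:\n${bulletList(spec.steps)}`);
  }
  return sections.join("\n\n");
}

// Illustrative task; every value here is made up.
const examplePrompt = buildPrompt({
  goal: "Make the failing date-parsing test pass",
  successCriteria: ["The full test suite passes", "No public API changes"],
  allowedSideEffects: ["Edit files under src/dates only"],
  evidenceRules: ["Quote the failing assertion before proposing a fix"],
  outputShape: "A unified diff plus a two-line summary",
});
```

Leaving `steps` out keeps the prompt short, which is the default this card recommends.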

Budget pacing

5h + week

Track remaining budget, not just spend.

Compare rolling 5-hour and weekly remaining tokens against the sessions consuming them. If a session is eating the window, checkpoint it before starting another broad turn.
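Tracking what remains, rather than what was spent, comes down to a few subtractions and ratios. The sketch below is an assumption: the `WindowBudget` shape and helper names are hypothetical, the 8,000,000-token budget matches the `AI_SPEND_5H_TOKEN_BUDGET` value in the setup commands, and the usage figures are invented for illustration.

```typescript
interface WindowBudget {
  label: string;
  budgetTokens: number;
  usedTokens: number;
}

// Tokens left in the window, never below zero.
function remainingTokens(w: WindowBudget): number {
  return Math.max(0, w.budgetTokens - w.usedTokens);
}

// Hours until the window is exhausted at the current hourly pace.
function runwayHours(w: WindowBudget, tokensPerHour: number): number {
  if (tokensPerHour <= 0) {
    return Infinity;
  }
  return remainingTokens(w) / tokensPerHour;
}

// Fraction of the window's usage that one session accounts for.
function sessionShare(sessionTokens: number, w: WindowBudget): number {
  if (w.usedTokens <= 0) {
    return 0;
  }
  return sessionTokens / w.usedTokens;
}

// Illustrative usage against the 5-hour budget from the setup commands.
const fiveHour: WindowBudget = {
  label: "5h",
  budgetTokens: 8000000,
  usedTokens: 5000000,
};

const left = remainingTokens(fiveHour); // 3,000,000 tokens
const hoursLeft = runwayHours(fiveHour, 1500000); // 2 hours at this pace
const topBurnerShare = sessionShare(4000000, fiveHour); // 0.8 of usage so far
```

A session holding 80% of the window with two hours of runway left is the one to checkpoint before starting another broad turn.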

Do this before buying more tokens.

The dashboard is useful because it turns usage, screenshots, and team questions into a concrete next move.

1. Split the thread.

If one session dominates the range, stop using it as the everything-chat. Divide the work by feature, file group, or decision.

2. Summarize and restart.

When cache reads are huge, ask for a compact handoff and continue in a fresh session with only the useful state.

3. Narrow the context.

Replace repo-wide prompts with targeted file lists, failing tests, screenshots, or exact error output.

4. Route the work.

Use Codex for broad codebase sweeps and mechanical edits. Save Claude for architecture, UX, and review.

5. Save the evidence.

Keep the screenshot, terminal output, or usage table with the metric so the recommendation answers the actual situation.

6. Set spend rules.

Use monthly limits, exception notes, and approval thresholds so heavy users can justify real wins before costs drift.

7. Gate fast mode.

Use /fast after the write set is known. If the 5-hour remaining number drops quickly, switch back to scoped prompts and measured effort.

8. Spend reasoning intentionally.

Start with low or medium effort for bounded work. Raise effort only when the task is ambiguous and the result is worth the extra burn.
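The last two steps can be combined into one settings decision. This is a rough sketch under stated assumptions: the `TaskState` fields are hypothetical, and a fixed 25% remaining-budget cutoff stands in for "drops quickly", which the text does not quantify.

```typescript
type Effort = "low" | "medium" | "high";

interface TaskState {
  writeSetKnown: boolean; // files to change are already identified
  ambiguous: boolean; // the goal or the path is still unclear
  worthExtraBurn: boolean; // the result justifies higher effort
  fiveHourRemainingFraction: number; // 0..1 of the 5-hour budget left
}

interface RunSettings {
  fastMode: boolean;
  effort: Effort;
}

// Assumed cutoff: treat the budget as tight below 25% remaining.
function chooseSettings(
  task: TaskState,
  tightBelow: number = 0.25
): RunSettings {
  const budgetTight = task.fiveHourRemainingFraction < tightBelow;
  // Fast mode only for known write sets, and never on a tight budget.
  const fastMode = task.writeSetKnown && !budgetTight;
  // Bounded work starts low; unscoped work starts at medium.
  let effort: Effort = task.writeSetKnown ? "low" : "medium";
  // Raise effort only for ambiguous, high-value tasks with budget to spare.
  if (task.ambiguous && task.worthExtraBurn && !budgetTight) {
    effort = "high";
  }
  return { fastMode: fastMode, effort: effort };
}
```

For example, a task with a known write set and 60% of the window left gets fast mode at low effort, while the same task at 10% remaining drops back to measured, non-fast work.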

Built for agent-heavy coding days.

Claude and Codex already keep local usage traces. AI Spend Live turns them, plus supporting screenshots from other tools, into operational signal.

Session identity

Readable names before hashes.

Claude titles and Codex thread names show up first, so you know which real task is burning tokens.

Limit pressure

See stale high-burn threads.

Duration, cache reuse, and token totals reveal sessions that should be closed or summarized.

Spike diagnosis

Find the turn that exploded.

Largest-turn views catch giant asks before you repeat them all afternoon.

Question handling

Turn screenshots into recommendations.

Context meters, usage tables, and spend-policy threads become prompts for what to split, compact, reroute, or cap.

Workflow routing

Account for missing tools.

Token totals are weighed against MCP access, repo context, and retry loops so the cheapest-looking path does not hide extra work.

Budget policy

Support limits with evidence.

Per-user guardrails and exception reviews can point at the sessions, tasks, and outcomes behind the spend.

Prompt guidance

Recommend the prompt shape.

The dashboard can turn a spike into outcome-first asks, compact handoffs, scoped file lists, and effort changes.

Budget runway

Warn before either reset is at risk.

Selectable Claude and Codex plan presets turn official 5-hour message windows into local token estimates, then show whether one session is burning through the budget too early.

Model discipline

Separate planning from execution.

Use frontier planning for uncertainty, then hand Codex a bounded write set, command, and acceptance criteria.

Share it. Clone it. Run it locally.

AI Spend Live is meant to be a public tool people can use on their own machines. No hosted account, no uploaded prompts.

Clone the repo

Download the public repo wherever Claude Code and Codex are already installed.

Start the dashboard

Run the local Node server. It reads only your machine's usage logs.

Change your workflow

Use the top sessions and turns to decide what to split, summarize, narrow, or reroute.

PowerShell (local)

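# Clone the repo and install dependencies
git clone https://github.com/AnthonyDiPerna/aispend-live.git
cd aispend-live
npm install

# Token budgets for the 5-hour and weekly windows (example values)
$env:AI_SPEND_5H_TOKEN_BUDGET="8000000"
$env:AI_SPEND_WEEKLY_TOKEN_BUDGET="50000000"

# Start the local dashboard server
npm run dashboard

# Then open http://127.0.0.1:9020/ in a browser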

Open the GitHub repo to star, fork, or file an issue.

Your prompts should not leave your machine just to explain where the tokens went.

The website promotes the tool. The dashboard runs locally.

100% local log parsing

0 prompt text in the UI payload

7 recommendation signals tracked

OFL self-hosted Space Grotesk font