When the context window is filling up, tell the user to pause, ask for a compact state summary, save it to a handoff file, and restart in a clean thread before output quality degrades.
AI Spend Live shows which agent sessions to split, summarize, restart, slow down, or route differently, then helps you protect the 5-hour and weekly token windows.
This example keeps the scale of a real local run while replacing private names and paths. The point is not the bill. The point is knowing which behavior to change next.
| Session | Provider | Duration | Cache | Tokens | Do next |
|---|---|---|---|---|---|
| Long Context Refactor (largest repeated-context thread) | Claude | 6.2d | 99% | 3,310.4M | Summarize and restart |
| UI Polish Session (large ongoing design pass) | Claude | 1.1d | 99% | 1,244.8M | Split by screen |
| Codex Repository Sweep (long-running codebase work) | Codex | 3.0d | 49% | 928.0M | Narrow file scope |
| Feature Handoff Session (single thread carrying context) | Claude | 5.4d | 96% | 709.1M | Checkpoint decisions |
When someone shares a screenshot of a context meter, usage table, budget policy, or agent comparison, the useful answer is a recommendation tied to the metric, the reset window, and the session causing the burn.
Compare task outcome, input plus cache-write tokens, tool access, and retry count. Recommend the agent with the right connectors, not just the one whose session shows the lowest token count.
Compare remaining 5-hour and weekly budget against the sessions consuming each window. Pause fast loops, checkpoint the top burner, and continue only with scoped work.
Route broad repo sweeps to Codex, keep MCP-heavy or product-judgment work in the environment with the right tools, then hand off a narrowed summary.
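The agent comparison above can be sketched in a few lines. This is a minimal illustration, not how AI Spend Live itself ranks agents: the `AgentRun` fields, the agent names, and the ranking rule (filter on outcome and tool access first, then prefer fewer input plus cache-write tokens, with retries as a tie-breaker) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    # Hypothetical record of one agent's attempt at a task.
    agent: str
    input_tokens: int
    cache_write_tokens: int
    retries: int
    succeeded: bool
    has_required_tools: bool


def billable_tokens(run: AgentRun) -> int:
    # Input plus cache-write tokens, the pair the comparison is based on.
    return run.input_tokens + run.cache_write_tokens


def recommend(runs: list[AgentRun]) -> Optional[AgentRun]:
    # Drop runs that failed or lacked the connectors the task needs,
    # even if their token totals look cheaper.
    viable = [r for r in runs if r.succeeded and r.has_required_tools]
    if not viable:
        return None
    # Among viable runs, prefer fewer tokens, then fewer retries.
    return min(viable, key=lambda r: (billable_tokens(r), r.retries))


runs = [
    AgentRun("agent-a", 400_000, 50_000, 0, True, False),  # cheapest, wrong tools
    AgentRun("agent-b", 600_000, 120_000, 1, True, True),
    AgentRun("agent-c", 900_000, 200_000, 3, True, True),
]
best = recommend(runs)
print(best.agent if best else "no viable agent")
```

Filtering before ranking is the point: the cheapest-looking run is excluded when it could not have finished the job with the tools it had.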
A good token plan is not just a weekly limit. It is knowing what remains in the 5-hour and weekly windows, when to spend reasoning, and when to stop a fast loop.
Use GPT-5.5 for ambiguous planning, tradeoffs, acceptance criteria, and review. Move implementation to GPT-5.3 Codex once files, constraints, and tests are clear.
Fast mode is useful for narrow edits and quick checks. It is expensive when the agent is still discovering scope because it can burn through more turns before you notice.
Give the goal, success criteria, allowed side effects, evidence rules, and output shape. Add step-by-step process only when the path itself matters.
Compare rolling 5-hour and weekly remaining tokens against the sessions consuming them. If a session is eating the window, checkpoint it before starting another broad turn.
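As a rough sketch of that comparison, the snippet below totals per-session usage against one window's budget and names the session to checkpoint first. The budget matches the example value in the setup commands further down; the session names, usage numbers, and the `alert_share` threshold are hypothetical, and the dashboard's own logic may differ.

```python
FIVE_HOUR_BUDGET = 8_000_000  # example value from the setup commands


def window_status(budget: int, usage_by_session: dict[str, int],
                  alert_share: float = 0.5) -> dict:
    # Total tokens spent in this window across all sessions.
    used = sum(usage_by_session.values())
    remaining = max(budget - used, 0)
    # Find the single session consuming the most of the window.
    top_session, top_tokens = max(
        usage_by_session.items(), key=lambda kv: kv[1], default=(None, 0)
    )
    top_share = top_tokens / budget if budget else 0.0
    return {
        "used": used,
        "remaining": remaining,
        "top_session": top_session,
        "top_share": top_share,
        # Assumed rule: checkpoint the top burner once it holds
        # at least alert_share of the whole window.
        "checkpoint_first": top_session if top_share >= alert_share else None,
    }


usage = {"refactor": 4_800_000, "ui-polish": 1_200_000, "sweep": 500_000}
print(window_status(FIVE_HOUR_BUDGET, usage))
```

The same function works for the weekly window by passing the weekly budget and the week's per-session totals.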
The dashboard is useful because it turns usage, screenshots, and team questions into a concrete next move.
If one session dominates the range, stop using it as the everything-chat. Divide the work by feature, file group, or decision.
When cache reads are huge, ask for a compact handoff and continue in a fresh session with only the useful state.
Replace repo-wide prompts with targeted file lists, failing tests, screenshots, or exact error output.
Use Codex for broad codebase sweeps and mechanical edits. Save Claude for architecture, UX, and review.
Keep the screenshot, terminal output, or usage table with the metric so the recommendation answers the actual situation.
Use monthly limits, exception notes, and approval thresholds so heavy users can justify real wins before costs drift.
Use /fast after the write set is known. If the 5-hour remaining number drops quickly, switch back to scoped prompts and measured effort.
Start with low or medium effort for bounded work. Raise effort only when the task is ambiguous and the result is worth the extra burn.
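The first two moves above (split a dominant session, summarize a cache-heavy one) can be sketched against the example table. The thresholds `dominance_share` and `cache_threshold` are assumptions chosen for illustration, and the output is not meant to reproduce the table's "Do next" column.

```python
# (session name, cache share, tokens in millions), taken from the example table.
SESSIONS = [
    ("Long Context Refactor", 0.99, 3310.4),
    ("UI Polish Session", 0.99, 1244.8),
    ("Codex Repository Sweep", 0.49, 928.0),
    ("Feature Handoff Session", 0.96, 709.1),
]


def triage(sessions, dominance_share=0.5, cache_threshold=0.95):
    # Total tokens across the range, used to measure dominance.
    total = sum(tokens for _, _, tokens in sessions)
    flags = {}
    for name, cache, tokens in sessions:
        notes = []
        # One session holding most of the range: stop using it as the everything-chat.
        if total and tokens / total >= dominance_share:
            notes.append("split")
        # Very high cache reuse: ask for a compact handoff and restart.
        if cache >= cache_threshold:
            notes.append("summarize and restart")
        flags[name] = notes
    return flags


for name, notes in triage(SESSIONS).items():
    print(f"{name}: {', '.join(notes) or 'no flag'}")
```

A session can collect both flags, which is a hint to summarize it first and then split the remaining work by feature, file group, or decision.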
Claude and Codex already keep local usage traces. AI Spend Live turns them, plus supporting screenshots from other tools, into operational signal.
Claude titles and Codex thread names show up first, so you know which real task is burning tokens.
Duration, cache reuse, and token totals reveal sessions that should be closed or summarized.
Largest-turn views catch giant asks before you repeat them all afternoon.
Context meters, usage tables, and spend-policy threads become prompts for what to split, compact, reroute, or cap.
Token totals are weighed against MCP access, repo context, and retry loops so the cheapest-looking path does not hide extra work.
Per-user guardrails and exception reviews can point at the sessions, tasks, and outcomes behind the spend.
The dashboard can turn a spike into outcome-first asks, compact handoffs, scoped file lists, and effort changes.
Selectable Claude and Codex plan presets turn official 5-hour message windows into local token estimates, then show whether one session is burning through the budget too early.
Use frontier planning for uncertainty, then hand Codex a bounded write set, command, and acceptance criteria.
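One way to picture the plan presets is as a message allowance multiplied by an average message size, then compared against how far into the window you are. The page does not say how the tool does this conversion, so the formula, the function names, and every number below are assumptions.

```python
def estimate_window_tokens(messages_per_window: int,
                           avg_tokens_per_message: int) -> int:
    # Assumed conversion: message allowance times a locally observed average.
    return messages_per_window * avg_tokens_per_message


def burn_ratio(session_tokens: int, window_tokens: int,
               elapsed_hours: float, window_hours: float = 5.0) -> float:
    # Ratio of budget consumed to time elapsed in the window.
    # Above 1.0 means the session is spending faster than an even pace.
    if window_tokens <= 0 or elapsed_hours <= 0:
        raise ValueError("window_tokens and elapsed_hours must be positive")
    used_fraction = session_tokens / window_tokens
    time_fraction = min(elapsed_hours / window_hours, 1.0)
    return used_fraction / time_fraction


# Hypothetical preset: 200 messages per 5-hour window, 40,000 tokens each.
window = estimate_window_tokens(200, 40_000)
# A session that has used half the estimate one hour into the window.
ratio = burn_ratio(4_000_000, window, elapsed_hours=1.0)
print(f"estimated window: {window:,} tokens, burn ratio: {ratio:.2f}")
```

A ratio well above 1.0 early in the window is the signal to checkpoint the session before starting another broad turn.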
AI Spend Live is meant to be a public tool people can use on their own machines. No hosted account, no uploaded prompts.
Clone the public repo onto the machine where Claude Code and Codex are already installed.
Run the local Node server. It reads only your machine's usage logs.
Use the top sessions and turns to decide what to split, summarize, narrow, or reroute.
```powershell
git clone https://github.com/AnthonyDiPerna/aispend-live.git
cd aispend-live
npm install
# Token budgets for the 5-hour and weekly windows (PowerShell syntax)
$env:AI_SPEND_5H_TOKEN_BUDGET="8000000"
$env:AI_SPEND_WEEKLY_TOKEN_BUDGET="50000000"
npm run dashboard
# open http://127.0.0.1:9020/
```
Open the GitHub repo to star, fork, or file an issue.
Your prompts should not leave your machine just to explain where the tokens went.
The website promotes the tool. The dashboard runs locally.