Introduction
Many companies still assess AI cost with the wrong question: “How much is the subscription?” For an internal tool, a SaaS product or an automation, the real question is different: how much do tokens, outputs, tools, cache, searches, failed calls and validations cost?
A subscription may look clear. An API bill can become confusing. Costs vary depending on the model, context size, output volume, cache availability, tools called and sometimes the processing region.
This article explains the basics for managing an AI budget without unnecessary jargon. It does not provide the real cost of WG products, because that cost must be proven by invoices, logs and consolidated live usage before any public announcement.
A token is not a word
A token is a technical unit used by models to read and generate text. It is not exactly a word. Depending on language, punctuation, code and characters, the number of tokens can vary.
To manage a budget, you need to separate three things:
| Element | What it means | Budget impact |
|---|---|---|
| Input tokens | What you send to the model | Large prompts, files, history, tools |
| Output tokens | What the model generates | Long answers, articles, code, reports |
| Tool tokens | Tool definitions, results, search, files | Can grow quickly with agents |
The common trap is to look only at the user prompt. In a real workflow, the model also receives system instructions, project context, schemas, tools, search results and sometimes files or code.
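As a rough illustration of this trap, the sketch below breaks one request into its input components and shows how small the user prompt can be relative to the rest. All token counts and component names are hypothetical; real figures come from the provider's usage data.

```python
# Hypothetical token counts for one request. Real values come from the
# provider's usage data or a tokenizer, not from this sketch.
request_components = {
    "system_instructions": 1_200,
    "project_context": 9_000,
    "tool_definitions": 2_500,
    "search_results": 6_000,
    "user_prompt": 300,
}


def input_breakdown(components: dict[str, int]) -> dict[str, float]:
    """Return each component's share of total input tokens, in percent."""
    total = sum(components.values())
    return {name: round(100 * count / total, 1) for name, count in components.items()}


total_input = sum(request_components.values())
shares = input_breakdown(request_components)

print(f"Total input tokens: {total_input}")
for name, share in shares.items():
    print(f"{name:>20}: {share}%")
```

In this made-up request, the user prompt is under 2% of what the model actually receives.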
Why output can cost more
In the price lists reviewed, output tokens can be priced higher than input tokens, depending on the provider and model. Prices change regularly, so the official pricing page must be checked on the day a budget is prepared.
The business principle remains useful even without freezing a number: a long output can cost significantly more than a short input when it is generated often.
Examples of outputs that can inflate costs:
- Long articles in several variants.
- Detailed audit reports.
- Generated code with long explanations.
- Agents that summarize every action instead of producing short proof.
- Workflows that restart several times instead of correcting one step.
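The effect of output length can be sketched with simple arithmetic. The prices below are placeholders chosen only to make the calculation readable; they are not taken from any provider's pricing page, and the function name `call_cost` is an assumption for this example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one call, with prices expressed per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices, NOT real provider prices.
INPUT_PRICE = 1.0    # currency units per million input tokens (hypothetical)
OUTPUT_PRICE = 4.0   # currency units per million output tokens (hypothetical)

# Same input, two output lengths.
short_report = call_cost(2_000, 300, INPUT_PRICE, OUTPUT_PRICE)
long_report = call_cost(2_000, 3_000, INPUT_PRICE, OUTPUT_PRICE)

calls_per_month = 1_000
print(f"Short output: {short_report * calls_per_month:.2f} per month")
print(f"Long output:  {long_report * calls_per_month:.2f} per month")
```

With identical inputs, the only thing that changes between the two scenarios is the output length, which is why a default of "long answers" deserves scrutiny at volume.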
Cache can help, but it is not a magic wand
OpenAI documents prompt caching as an automatic mechanism on eligible prompts, with possible latency and cost benefits when identical prefixes are reused. The documentation also indicates that prompts must reach a minimum threshold and that static content should be placed at the beginning to maximize cache hits.
Anthropic also documents prompt caching, with separate pricing for cache writes and cache reads, and cache lifetimes such as 5 minutes or 1 hour depending on configuration.
The practical conclusion:
- Cache mainly helps when the same stable context is reused.
- It works better when stable instructions and documents are placed at the beginning.
- It does not fix a poorly designed workflow.
- It does not automatically reduce all costs.
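The advice to place stable content first can be illustrated with a small sketch. It compares the shared prefix of two prompts when stable parts come first versus last. The helper names are hypothetical, and character prefixes are used only as a proxy: providers match on tokens and apply their own minimum thresholds.

```python
def build_prompt(stable_parts: list[str], variable_parts: list[str]) -> str:
    """Concatenate the given parts in order, stable content first."""
    return "\n\n".join(stable_parts + variable_parts)


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return length


stable = ["SYSTEM: You are an audit assistant.", "POLICY: (long reference document)"]
question_a = ["Question: summarize invoice A."]
question_b = ["Question: summarize invoice B."]

# Stable content first: both calls share the whole stable block.
good_1 = build_prompt(stable, question_a)
good_2 = build_prompt(stable, question_b)

# Variable content first: the prompts diverge before the stable block starts.
bad_1 = build_prompt(question_a, stable)
bad_2 = build_prompt(question_b, stable)

print("Stable first:  ", shared_prefix_length(good_1, good_2))
print("Variable first:", shared_prefix_length(bad_1, bad_2))
```

The longer the common prefix across calls, the more of the prompt is a candidate for caching; whether it is actually cached and billed at a reduced rate depends on the provider's rules.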
Internal mini-calculator
To estimate an AI workflow, use a simple table with these fields.
| Field | Internal example | How to use it |
|---|---|---|
| Model | To fill in | Input/output pricing to recheck |
| Average input | 20,000 tokens | Prompt + context + tools |
| Average output | 3,000 tokens | Final answer or report |
| Cache rate | 0%, 50%, 80% | Hypothesis to prove through usage |
| Number of calls | 1,000/month | Real volume or scenario |
| Tool cost | Web search, file search, shell, etc. | Depends on provider |
| Total cost | Calculated | Do not publish without proof |
This calculator should not display any official cost until live data has been consolidated.
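A minimal sketch of such a calculator is shown below, using the fields from the table. Every price is a placeholder to be replaced after rechecking the official pricing page. The class name `WorkflowScenario` is an assumption, and the model deliberately ignores provider-specific details such as cache write surcharges. Its output is a scenario, not an official cost.

```python
from dataclasses import dataclass


@dataclass
class WorkflowScenario:
    avg_input_tokens: int
    avg_output_tokens: int
    calls_per_month: int
    cache_rate: float               # share of input tokens read from cache, 0.0 to 1.0
    input_price_per_mtok: float     # placeholder, recheck on the pricing page
    output_price_per_mtok: float    # placeholder, recheck on the pricing page
    cached_price_per_mtok: float    # placeholder, provider-specific
    tool_cost_per_call: float = 0.0  # web search, file search, etc.

    def monthly_cost(self) -> float:
        """Estimated monthly cost; ignores cache write surcharges."""
        cached = self.avg_input_tokens * self.cache_rate
        uncached = self.avg_input_tokens - cached
        token_cost = (uncached * self.input_price_per_mtok
                      + cached * self.cached_price_per_mtok
                      + self.avg_output_tokens * self.output_price_per_mtok) / 1_000_000
        return (token_cost + self.tool_cost_per_call) * self.calls_per_month


# Cache rate is a hypothesis to prove through usage, so compare several values.
scenarios = {}
for rate in (0.0, 0.5, 0.8):
    scenario = WorkflowScenario(
        avg_input_tokens=20_000,
        avg_output_tokens=3_000,
        calls_per_month=1_000,
        cache_rate=rate,
        input_price_per_mtok=1.0,    # hypothetical
        output_price_per_mtok=4.0,   # hypothetical
        cached_price_per_mtok=0.1,   # hypothetical
    )
    scenarios[rate] = scenario.monthly_cost()
    print(f"cache rate {rate:.0%}: {scenarios[rate]:.2f} (scenario, not an official cost)")
```

Running the three cache rates side by side makes the cache hypothesis explicit: the output part of the cost stays constant, and only the input part moves.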
Errors that make the bill explode
- Sending all context on every request without separating stable and variable parts.
- Asking for long answers by default.
- Letting an agent multiply searches without a retrieval budget.
- Using a model that is more powerful than a simple task requires.
- Not logging `input_tokens`, `output_tokens` and cached tokens.
- Confusing a one-off test with monthly cost at real volume.
The solution is not always to choose the cheapest model. The solution is to choose the right model level, the right context, the right output format and the right verification.
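One of the errors above, an agent multiplying searches without a retrieval budget, can be guarded against with a small counter. The sketch below is a hypothetical helper, not part of any agent framework: the class name, limits and token counts are all assumptions.

```python
class RetrievalBudgetExceeded(Exception):
    """Raised when an agent asks for more searches than its budget allows."""


class RetrievalBudget:
    def __init__(self, max_searches: int, max_result_tokens: int) -> None:
        self.max_searches = max_searches
        self.max_result_tokens = max_result_tokens
        self.searches = 0
        self.result_tokens = 0

    def charge(self, result_tokens: int) -> None:
        """Record one search; refuse it if either limit would be exceeded."""
        if self.searches + 1 > self.max_searches:
            raise RetrievalBudgetExceeded("search count limit reached")
        if self.result_tokens + result_tokens > self.max_result_tokens:
            raise RetrievalBudgetExceeded("result token limit reached")
        self.searches += 1
        self.result_tokens += result_tokens


budget = RetrievalBudget(max_searches=3, max_result_tokens=8_000)
budget.charge(3_000)
budget.charge(3_000)
try:
    budget.charge(3_000)  # would bring result tokens to 9,000, above the limit
except RetrievalBudgetExceeded as exc:
    print(f"Stopped: {exc}")
```

A refused search leaves the counters unchanged, so the workflow can decide whether to answer with what it has or escalate.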
WG control method
A clean AI workflow should store at least:
- The model used.
- The number of calls.
- Input/output tokens.
- Cached tokens when available.
- Tools called.
- Failure or regeneration rate.
- The business value of the result.
Without this base, you are not managing an AI system. You are consuming a black box.
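The fields listed above could be stored as one record per call, for example as JSON lines. This is a minimal sketch: the record layout, the field names and the model label `model-a` are assumptions, not a provider's log format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CallRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0
    tools_called: list[str] = field(default_factory=list)
    failed: bool = False       # call failed or had to be regenerated
    business_value: str = ""   # free-text note, e.g. "report accepted"


def summarize(records: list[CallRecord]) -> dict:
    """Aggregate the counters needed to manage a workflow's budget."""
    calls = len(records)
    return {
        "calls": calls,
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "cached_tokens": sum(r.cached_tokens for r in records),
        "failure_rate": sum(r.failed for r in records) / calls if calls else 0.0,
    }


# Hypothetical records for two calls of the same workflow.
records = [
    CallRecord("model-a", 20_000, 3_000, cached_tokens=16_000,
               tools_called=["web_search"], business_value="report accepted"),
    CallRecord("model-a", 21_000, 0, failed=True),
]

log_lines = [json.dumps(asdict(record)) for record in records]  # one JSON object per call
summary = summarize(records)
print(summary)
```

Keeping failed calls in the same log matters: their input tokens are still billed, and the failure rate is one of the signals listed above.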
Verified official sources
Sources rechecked on 2026-05-20 before publication.
- https://developers.openai.com/api/docs/pricing
- https://developers.openai.com/api/docs/guides/prompt-caching
- https://platform.claude.com/docs/en/about-claude/pricing
- https://platform.claude.com/docs/en/build-with-claude/prompt-caching
Note: AI prices, model names and features can change. Official sources must be rechecked before any budget or technical decision.