AI pricing in 2026: why per-token prices fell but bills are rising

By Jay Matharu

What changed

The price of AI has been falling fast, and at the same time many businesses are paying more for it. Both are true. Industry pricing trackers put the average cost per million tokens down roughly 60 to 80 percent since early 2025. Anthropic is a clear example: Opus input pricing dropped from about 15 USD per million tokens on Opus 4.1 to about 5 USD on Opus 4.6, a cut of around two-thirds. In March 2026 Anthropic also removed the long-context surcharge on Opus 4.6 and Sonnet 4.6, so prompts approaching the 1-million-token limit are now billed at the standard per-token rate, rather than the premium band that previously applied once a prompt passed roughly 200,000 tokens. Current Anthropic list rates are about 5 USD input and 25 USD output per million tokens for Opus 4.6, and about 3 USD input and 15 USD output for Sonnet 4.6.

Why it matters for UK business

The headline "AI is getting cheaper" is correct per unit and misleading per invoice. Unit prices are falling, but the way businesses use AI is changing faster than the prices are. Agentic workflows, where a model plans, calls tools and iterates across many steps, consume many times more tokens than a single question-and-answer call. Firms that set budgets using 2024 token rates and simple chatbot usage are now finding that 2026 agentic usage costs a multiple of the figure in the original spreadsheet.

For a UK SME, there are three practical consequences:

  • Per-token list price is not your cost. Your cost is price multiplied by tokens consumed, and tokens consumed is the part that grows quietly. A cheaper model used carelessly can cost more than a dearer model used well.
  • The cheapest tiers are genuinely cheap. Small, fast models now sit well under 1 USD per million input tokens. A large share of routine business tasks can run there rather than on a top-tier model.
  • AI API spend is denominated in USD. For a GBP-reporting business, the effective cost also moves with the exchange rate, a small but real planning factor that a USD price list hides.
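
The arithmetic behind the first and third points can be sketched in a few lines of Python. The rates are the Sonnet 4.6 list prices quoted above. The token counts, the 20-step agent run and the 0.75 GBP-per-USD exchange rate are illustrative assumptions, not measurements.

```python
# List prices in USD per million tokens, as quoted in this article.
SONNET_INPUT_USD_PER_MTOK = 3.0
SONNET_OUTPUT_USD_PER_MTOK = 15.0


def call_cost_usd(input_tokens, output_tokens,
                  input_rate=SONNET_INPUT_USD_PER_MTOK,
                  output_rate=SONNET_OUTPUT_USD_PER_MTOK):
    """Cost of one API call: tokens multiplied by the per-million-token rate."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Assumption: a simple question-and-answer call.
chat_cost = call_cost_usd(input_tokens=1_000, output_tokens=500)

# Assumption: an agentic run of 20 steps, each re-sending a growing context.
agent_cost = sum(
    call_cost_usd(input_tokens=5_000 + 2_000 * step, output_tokens=800)
    for step in range(20)
)

# Assumption: an illustrative exchange rate, not a quoted market rate.
GBP_PER_USD = 0.75

print(f"chat call: {chat_cost:.4f} USD")
print(f"agent run: {agent_cost:.4f} USD ({agent_cost / chat_cost:.0f}x the chat call)")
print(f"agent run: {agent_cost * GBP_PER_USD:.4f} GBP")
```

The unit price is identical in both cases. The difference comes entirely from the number of tokens each pattern consumes.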

What to do, and what not to do

Do:

  • Match the model tier to the task. Reserve top-tier models for genuinely hard reasoning and agentic work; route routine drafting, classification and summarisation to cheaper tiers.
  • Use prompt caching and batch processing where the workload allows. Anthropic's published pricing offers up to a 90 percent saving on cached input and a discount of around 50 percent on batch jobs.
  • Instrument usage before you scale it. Track tokens per task and cost per outcome, not just the monthly total, so you can see which workflow drives the spend.
  • Set usage caps and billing alerts on the API account before rolling a workflow out to the whole team.
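
As a rough sketch of what caching and batch discounts can do to a workload, the following applies the approximate figures above to the Sonnet 4.6 list rates. The function name, the 80 percent cached share and the token volumes are hypothetical. The sketch also ignores any extra charge for writing to the cache, so read the results as a best case.

```python
def workload_cost_usd(input_tokens, output_tokens, input_rate, output_rate,
                      cached_share=0.0, cache_discount=0.90,
                      batch=False, batch_discount=0.50):
    """Estimate workload cost in USD with optional caching and batch discounts.

    Rates are USD per million tokens. Discount sizes are the approximate
    figures quoted in the article, not a provider's exact price list.
    """
    cached = input_tokens * cached_share
    uncached = input_tokens - cached
    # Cached input is billed at a fraction of the normal input rate.
    input_cost = (uncached + cached * (1 - cache_discount)) * input_rate
    output_cost = output_tokens * output_rate
    total = (input_cost + output_cost) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the whole job.
        total *= 1 - batch_discount
    return total


# Hypothetical monthly workload: 10M input tokens, 1M output tokens.
volume = dict(input_tokens=10_000_000, output_tokens=1_000_000,
              input_rate=3.0, output_rate=15.0)

baseline = workload_cost_usd(**volume)
with_cache = workload_cost_usd(**volume, cached_share=0.8)
with_batch = workload_cost_usd(**volume, batch=True)

print(f"baseline:     {baseline:.2f} USD")
print(f"with caching: {with_cache:.2f} USD")
print(f"with batch:   {with_batch:.2f} USD")
```

The two options are priced separately here. Whether they can be combined on a given workload is worth checking against the provider's current terms.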

Do not:

  • Assume a price cut lowers your bill. It lowers the unit price; your bill is unit price multiplied by volume, and volume is the part you control.
  • Choose a model on list price alone. A model that needs fewer attempts or shorter prompts to reach the right answer can be cheaper at a higher headline rate.
  • Let an agentic proof of concept reach production without a cost-per-run estimate. That is where the surprise invoices come from.
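
The second and third points can be folded into one estimate: the expected cost per successful outcome, rather than the cost per attempt. The sketch below uses the Sonnet 4.6 and Opus 4.6 list rates quoted earlier. The token counts and success rates are invented purely for illustration and are not a benchmark of either model.

```python
def cost_per_success_usd(input_tokens, output_tokens,
                         input_rate, output_rate, success_rate):
    """Expected spend per successful outcome, retrying until an attempt succeeds.

    Rates are USD per million tokens. Assumes attempts are independent,
    so the expected number of attempts is 1 / success_rate.
    """
    attempt_cost = (input_tokens * input_rate
                    + output_tokens * output_rate) / 1_000_000
    return attempt_cost / success_rate


# Hypothetical: the lower-rate model needs longer prompts and succeeds half the time.
lower_rate_model = cost_per_success_usd(
    input_tokens=20_000, output_tokens=2_000,
    input_rate=3.0, output_rate=15.0, success_rate=0.5,
)

# Hypothetical: the higher-rate model gets there with shorter prompts, 9 times in 10.
higher_rate_model = cost_per_success_usd(
    input_tokens=12_000, output_tokens=1_500,
    input_rate=5.0, output_rate=25.0, success_rate=0.9,
)

print(f"lower-rate model:  {lower_rate_model:.4f} USD per successful outcome")
print(f"higher-rate model: {higher_rate_model:.4f} USD per successful outcome")
```

Under these made-up inputs the model with the higher headline rate works out cheaper per outcome. With different success rates the result reverses, which is why the inputs should come from measured runs before a workflow goes to production.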

Where The AI Consultancy fits

Working out the real, loaded cost of an AI workflow, and the cheapest tier that still does the job, is part of our AI Readiness work and is reflected in how we scope and price engagements. Our ROI calculator and pricing page set out the approach. If you have had an AI bill that was larger than expected, or you are budgeting for an agentic workflow and want a realistic per-run figure before you commit, we treat that as a scoped piece of analysis rather than a guess.

Figures are from Anthropic's published pricing and from industry pricing trackers as at 30 May 2026 and are quoted in USD. The per-token decline range is a third-party estimate and varies by model and methodology.

Frequently asked questions

Is AI getting cheaper or more expensive in 2026?
Both, depending on how you measure it. The price per million tokens has fallen sharply since 2025, but total bills are often rising because businesses, especially those using agentic workflows, consume far more tokens than they used to. Unit price is down; usage is up.

How can a small UK business keep AI costs under control?
Match the model tier to the task rather than defaulting to a top-tier model, use prompt caching and batch processing where possible, set usage caps and billing alerts, and measure cost per outcome before scaling a workflow to the whole team.
