The Token Behind the Invoice: How AI Pricing Is Rewriting the Agency-Client Relationship

Your agency sent you an invoice last month. There’s a number that’s not on it.

It sits between the work they delivered and the amount you paid. It’s been there for over a year. It’s why the next conversation with your agency is going to feel different, whether you start it or they do.

That number is called a token. It’s the foundation of AI token pricing. Once you understand what it means, you’ll never read an agency invoice the same way again.

First, what actually is a token?

Not in the abstract. In plain terms.

Every task the AI performs consumes tokens. Reading your brief, writing a headline, rewriting a paragraph — all of it. Think of a token as roughly three-quarters of a word. Input tokens are what goes in. Output tokens are what comes out. Both are metered. Both are billed.

When you use ChatGPT on a subscription, you don’t see any of this. The cost is buried inside a flat monthly fee. But when agencies access these models through an API, the meter runs. Every task. Every asset. Every revision.
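To make the meter concrete, here is a rough sketch of how a single task turns into a cost. The three-quarters-of-a-word rule comes from the paragraph above; the two per-million-token rates are hypothetical placeholders, not any provider's actual prices, and real tokenisers will count differently from this word-based estimate.

```python
# Rule of thumb from the text: one token is roughly three-quarters of a word.
WORDS_PER_TOKEN = 0.75

# Hypothetical rates in USD per million tokens; real rates vary by provider and model.
INPUT_RATE_PER_MILLION = 3.00
OUTPUT_RATE_PER_MILLION = 15.00


def estimate_tokens(word_count: int) -> int:
    """Approximate the token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_cost(input_words: int, output_words: int) -> float:
    """Approximate API cost in USD for one task, billing input and output separately."""
    input_cost = estimate_tokens(input_words) / 1_000_000 * INPUT_RATE_PER_MILLION
    output_cost = estimate_tokens(output_words) / 1_000_000 * OUTPUT_RATE_PER_MILLION
    return input_cost + output_cost


# Example: a 1,500-word brief goes in, a 600-word draft comes out.
print(estimate_tokens(1500), "input tokens")
print(f"${estimate_cost(1500, 600):.4f} for the task")
```

One task costs very little under these assumed rates. The bill comes from repeating it across every asset and every revision.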

The question is: who’s paying for it, and are you being told?

The Margin Gap AI Token Pricing Is Creating

Here’s the number that changes everything about AI token pricing for agencies. In fact, it reframes the entire agency economics conversation.

A deliverable that used to take 100 hours of human work now takes two hours of oversight and roughly $200 in API tokens. AI-native agencies, lean teams accessing models directly rather than through packaged tools, operate at 94–96% margins per asset. Traditional agencies sit at 13–18%.
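The arithmetic behind that gap can be sketched in a few lines. The hours and the $200 token figure come from the text. The price per asset and both hourly costs are assumptions picked for illustration so that the results land inside the quoted ranges; they are not figures from any agency.

```python
def margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the price charged."""
    return (price - cost) / price


# Figures from the article.
AI_OVERSIGHT_HOURS = 2
AI_TOKEN_COST = 200.0
HUMAN_HOURS = 100

# Assumptions for illustration only.
PRICE_PER_ASSET = 10_000.0        # what the client pays in both cases
OVERSIGHT_HOURLY_COST = 150.0     # cost of the person supervising the AI
TRADITIONAL_HOURLY_COST = 85.0    # blended cost of traditional agency labour

ai_native_cost = AI_OVERSIGHT_HOURS * OVERSIGHT_HOURLY_COST + AI_TOKEN_COST
traditional_cost = HUMAN_HOURS * TRADITIONAL_HOURLY_COST

print(f"AI-native margin:   {margin(PRICE_PER_ASSET, ai_native_cost):.0%}")   # 95%
print(f"Traditional margin: {margin(PRICE_PER_ASSET, traditional_cost):.0%}")  # 15%
```

The client's price is identical in both rows. The only thing that moved is the cost underneath it.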

That’s not a rounding error. That’s a different business entirely.

WPP fell out of the FTSE 100 in December 2025 after nearly 30 years of membership. Share price dropped roughly two-thirds. Revenue fell 3.6%.

Ogilvy eliminated 700 jobs. Omnicom cut 3,000. IPG laid off 3,200 in nine months.

Meanwhile, Publicis hit a record 18.2% operating margin. 73% of its business model now runs through its AI infrastructure, CoreAI.

One firm rebuilt its cost structure around token economics. The others didn’t. The results are now visible on every quarterly earnings call.

Supernatural AI has 30 people and is winning national campaigns. YC’s Spring 2026 Request for Startups explicitly called for agencies that operate with software margins. That call is being answered by firms most clients have never heard of. They are competing for briefs that used to go to established agencies by default.

Seven AI Agency Pricing Models. No Standard. No Disclosure.

A Digiday investigation in March 2026 found at least seven distinct AI token pricing models operating simultaneously. There is no industry consensus on how agencies charge for AI work.

Figure: Seven AI agency pricing models ranked by client alignment, from flat retainer to outcome-based pricing.

Here’s what they actually are:

Generation credits. Pencil uses this. The client commits to a volume of generations upfront. The more they commit, the cheaper the unit cost. Agencies use volume to negotiate better rates with model providers. Incentives are at least partially aligned.

Cost-recovery billing. A nominal fee on top of the retainer. Token costs are passed through at near-actual API rates with a handling margin. Transparent in theory. Impossible to verify in practice unless you know what the API actually costs.

Metered billing. Merge does this. Usage tracked, invoiced monthly at API rates plus margin. The most honest model. Also the one most likely to produce a surprise invoice when a campaign runs heavy.

Cost absorption. The agency swallows the token costs into the retainer. Sounds generous. Creates the opposite incentive: the agency minimises AI usage to protect margin, rather than maximising it to improve output quality.

Flat retainer. The legacy model, retrofitted. Fixed monthly fee regardless of AI usage. When AI multiplies output tenfold, this model benefits the agency. Not the client.

Per-output pricing. Charged per piece of content produced. Forces efficiency on the agency side. Obscures quality differences. A researched long-form piece and a generated social post carry the same price tag.

Outcome-based pricing. What most clients want. What most agencies resist. Pay for pipeline generated, leads qualified, revenue influenced. Requires attribution infrastructure that most agencies have not built.
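A rough way to see how these models differ is to run the same two months through four of them: a quiet month and a heavy one. Everything in this sketch is hypothetical, including the markup, the fee, the unit prices and the usage figures. What matters is how each charge responds to usage, not the amounts.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    api_cost: float        # token spend at actual API rates, USD
    assets: int            # pieces of content delivered
    qualified_leads: int   # outcomes attributed to the work


# All default prices below are hypothetical, chosen only for illustration.
def metered(usage: MonthlyUsage, markup: float = 0.20) -> float:
    """API rates plus a margin: the charge rises and falls with usage."""
    return usage.api_cost * (1 + markup)


def flat_retainer(usage: MonthlyUsage, fee: float = 8_000.0) -> float:
    """Fixed fee: the charge ignores usage entirely."""
    return fee


def per_output(usage: MonthlyUsage, unit_price: float = 150.0) -> float:
    """Same price per piece, whatever went into making it."""
    return usage.assets * unit_price


def outcome_based(usage: MonthlyUsage, price_per_lead: float = 400.0) -> float:
    """Charge tied to results rather than to effort or volume."""
    return usage.qualified_leads * price_per_lead


quiet = MonthlyUsage(api_cost=300.0, assets=20, qualified_leads=10)
heavy = MonthlyUsage(api_cost=3_000.0, assets=200, qualified_leads=25)

for model in (metered, flat_retainer, per_output, outcome_based):
    print(f"{model.__name__:>14}: quiet ${model(quiet):>9,.2f}   heavy ${model(heavy):>9,.2f}")
```

Under these made-up inputs, the metered charge moves with the tokens, the retainer does not move at all, per-output scales with volume, and outcome-based scales only with results.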

So why is there no standard? It isn’t because the industry is still evolving. It’s because transparency around AI token economics reveals exactly how much labour has been replaced. It shows how much of your retainer is now margin, not effort.

It’s not just an agency conversation. This is where AI pricing is heading.

Here’s the thing most people are missing.

The question of how agencies price AI tokens isn’t just between you and your agency. It’s a structural question about where AI economics are heading. The signal is already visible in how everyday developer tools are changing their billing.

Cursor is the AI code editor used by hundreds of thousands of developers. It moved from a request-based pricing model to a usage-based credit system priced at underlying LLM API rates. What felt like $20 a month became, for some users, $20 to $30 a day. One team of five spent over $4,600 in six weeks. Double their entire 2025 spending. A Hacker News commenter reported “$350 in Cursor overage in a week.” Cursor had to publish a blog post called “Clarifying our pricing” and issue refunds.

This is what happens when flat-rate packaging meets metered compute. The gap closes. Users pay the difference.
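The Cursor figures quoted above can be sanity-checked with simple arithmetic. The dollar amounts, team size and six-week window come from the text; the 30-day month and the choice to count calendar days rather than working days are assumptions.

```python
# Figures from the article.
FLAT_MONTHLY_FEE = 20.0
DAILY_SPEND_LOW, DAILY_SPEND_HIGH = 20.0, 30.0
TEAM_SPEND = 4_600.0
TEAM_SIZE = 5
WEEKS = 6

# Assumptions: a 30-day month, and six weeks counted as calendar days.
DAYS_PER_MONTH = 30
days = WEEKS * 7

# How many times the old flat fee a month of daily spending adds up to.
multiplier_low = DAILY_SPEND_LOW * DAYS_PER_MONTH / FLAT_MONTHLY_FEE
multiplier_high = DAILY_SPEND_HIGH * DAYS_PER_MONTH / FLAT_MONTHLY_FEE

# What the team's six-week bill works out to per person per day.
per_person_per_day = TEAM_SPEND / TEAM_SIZE / days

print(f"Effective increase: {multiplier_low:.0f}x to {multiplier_high:.0f}x the flat fee")
print(f"Team of five: about ${per_person_per_day:.2f} per person per day")
```

On those assumptions the team's bill works out to a little under $22 per person per day, which sits inside the $20 to $30 range reported by individual users.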

The Anthropic situation this week makes the same point from a different angle. Anthropic adjusted its five-hour session limits during peak hours to manage growing demand. About 7% of users, particularly on Pro tiers, now hit session limits they wouldn’t have hit before. This came days after Claude reached 11.3 million daily active users, up from 4 million in January. The platform tripled its user base in two months, and the infrastructure strained visibly under the load. Anthropic also blocked third-party tools impersonating its official client to shut down subscription limit bypasses. The unlimited AI era is ending faster than anyone expected.

Ultimately, the underlying driver is cost. Rising infrastructure costs are forcing AI providers to harden controls through daily prompt caps, weekly reset windows, and model downgrades. The industry is moving toward explicit access rationing.

And here is the structural force underneath all of it. OpenAI burned through $8 billion against $13 billion in revenue in 2025. It projects $14 billion in losses in 2026. Sam Altman has publicly acknowledged that OpenAI loses money on its $200-per-month ChatGPT Pro subscriptions because usage far exceeded projections.

The era of all-you-can-eat AI pricing is not sustainable. Usage-based billing is the direction this is heading. Every token is metered. Every token is charged at its true cost. The only question is how fast.

What this means if you’re a founder using an agency

Three questions to ask in your next contract negotiation about AI token pricing. Not requests. Questions.

Which models are you using for my work, and at what cost? Not whether they use AI. Everyone does now. Which models specifically. Claude Sonnet is priced differently from a reasoning model. The premium models can cost 10x more per token. You are entitled to know what infrastructure your budget is running on.

What is the AI agency pricing model for my account? Is the cost absorbed? Passed through? Metered? Flat? Ask to see the clause in the contract. If there is no clause, that is your answer.

Can you show me the compute cost for this campaign? Not the full API bill. The marginal cost of the specific outputs you commissioned. If an agency cannot produce this number, they either don’t track it or don’t want you to see it. Both tell you something.
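The number behind that third question is not hard to produce if usage is logged per asset. Below is a minimal sketch of what that tracking might look like. The model names, the rates and the log entries are all hypothetical; the only detail taken from the text is that a premium model can cost 10x more per token.

```python
# Hypothetical rates in USD per million tokens: (input rate, output rate).
# The premium model is set at 10x the standard one, as described in the text.
RATES = {
    "standard-model": (3.0, 15.0),
    "reasoning-model": (30.0, 150.0),
}

# Hypothetical usage log: one entry per output the client commissioned.
usage_log = [
    {"asset": "landing page", "model": "standard-model",
     "input_tokens": 40_000, "output_tokens": 10_000},
    {"asset": "strategy memo", "model": "reasoning-model",
     "input_tokens": 40_000, "output_tokens": 10_000},
]


def asset_cost(entry: dict) -> float:
    """Marginal compute cost in USD for a single logged output."""
    input_rate, output_rate = RATES[entry["model"]]
    return (entry["input_tokens"] / 1_000_000 * input_rate
            + entry["output_tokens"] / 1_000_000 * output_rate)


def campaign_cost(log: list) -> float:
    """Total marginal compute cost across every output in the campaign."""
    return sum(asset_cost(entry) for entry in log)


for entry in usage_log:
    print(f"{entry['asset']:>14} ({entry['model']}): ${asset_cost(entry):.2f}")
print(f"{'campaign total':>14}: ${campaign_cost(usage_log):.2f}")
```

Same token counts, different model, ten times the cost. An agency that keeps a log like this can answer the question in minutes.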

The baseline shift to push for: output-based or outcome-based pricing. Pay for what gets produced or what gets converted. Not for hours billed against a tool subscription you’re funding without knowing it.

Where this leaves the client who isn’t asking these questions

Right now, AI token pricing for agencies and their clients is generous. The major providers are subsidising usage to capture market share. Three things are telling you this at once: the Cursor pricing shock, the Anthropic usage rationing, and OpenAI’s $14 billion projected losses.

The companies that build token governance into their operations today are building the muscle they’ll need when prices normalise. They will know what they’re consuming, what it costs, and what it’s producing.

The companies that don’t are funding someone else’s infrastructure and calling it a retainer.

Simply put: the window of cheap, subsidised AI is closing.

One more thing

Understanding AI token pricing for agencies is, at its core, a question about leverage. Tokens can generate content at near-zero marginal cost. They can fill a blog, populate a newsletter, produce fifty versions of a landing page headline. What they cannot do is build the trust that compounds when a specific voice says something specific, consistently, over time.

That’s the argument in my forthcoming book, Content-Led Brand. The most defensible asset in a world of free content production is a brand people seek out rather than scroll past. The founders who understand this now are not competing on volume. They’re competing on voice. That’s what building a content-led brand is really about.

The token economy makes that bet more valuable, not less.

Frequently Asked Questions

What is AI token pricing for agencies?

AI token pricing is how agencies pay to use large language models via an API. Every task an AI model performs (reading a brief, generating copy, summarising a document) consumes tokens, roughly three-quarters of a word each. Input tokens and output tokens are both metered and billed at a rate per million tokens. When agencies access models this way, cost is directly tied to usage. Unlike a flat subscription, where the cost is hidden inside a fixed fee, token-based billing makes the true cost of AI work visible, which is exactly why most agencies don’t show it to clients.

How do agencies charge clients for AI token costs?

There is no industry standard. Digiday’s March 2026 investigation found at least seven distinct AI agency pricing models operating simultaneously: generation credits, cost-recovery billing, metered billing, cost absorption into the retainer, flat retainers, per-output pricing, and outcome-based pricing. Some agencies absorb the cost entirely; others pass it through at a markup; others haven’t disclosed their model to clients at all. Asking directly and getting the answer in the contract is the only reliable way to know.

Are AI token prices going up or down?

Token prices have been falling on paper — but the real cost per workflow is rising. Models are getting more capable but also more token-intensive, especially for agentic tasks involving long context windows and multi-step reasoning. Cursor users saw effective price increases of 20x or more when the platform switched from request-based to token-based billing. OpenAI is projecting $14 billion in losses in 2026, a sign that current pricing is subsidised. The generous window most agencies and clients are operating in right now is temporary.

What questions should I ask my agency about AI token costs?

Three: Which AI models are you using for my work, and at what per-token cost? What is the pricing model: absorbed, metered, passed through, or flat? And can you show me the compute cost for this specific campaign? If they can’t answer the third question, they either aren’t tracking it or don’t want you to see it. Both are useful pieces of information.

Build a content system, not just content.

I work with leadership teams to design content systems that create sustainable competitive advantages — not campaigns that expire.

See how I work →
