Blog Post: The Tokenmaxxing Tax

6 May

What Meta’s leaderboard has exposed and why governance, not gamification, is the only fix.

The Pragmatic Engineer’s reporting on Meta’s internal “Claudeonomics” leaderboard was the most quietly damning piece of AI coverage this year. Eighty-five thousand employees. Sixty trillion tokens in thirty days. A list price that would have cleared $900M, a real bill likely to be north of $100M. Internal accounts of throwaway prototypes, redundant agent runs, deliberately inflated prompts, and SEVs traced back to AI-generated code no one bothered to read.

Meta took the leaderboard down after the story broke. Microsoft, Salesforce and others still run versions of the same idea: minimum spend targets, public dashboards, “use more or be flagged.” The intent, on paper, is to push adoption. The outcome, in practice, is a familiar pattern: when you measure the wrong thing, people optimise for the wrong thing.

This is the lines-of-code metric, reborn. And the tax it carries is bigger than the invoice.

The bill hidden inside the bill

Industry data already puts the structural cost of ungoverned inference at 20-40% margin compression and 30-60% budget volatility. Tokenmaxxing turns that volatility into something worse: a culture where wasted spend is the goal.

The waste isn’t only financial. Every token sent to a premium model is compute that must be provisioned, powered, cooled and run. Fewer tokens = less compute = fewer emissions. That equation runs in both directions. Meta’s 60.2 trillion tokens in a single month did not appear as line items on a sustainability report, but they existed as load on data centres drawing power, drawing water and emitting carbon. The Pragmatic Engineer piece is an excellent diagnosis of the cost story. By omission, it is also the largest unpriced environmental story in enterprise software right now. In real-world terms, burning through 60 trillion AI tokens in a month could consume around 120 GWh of electricity and create about 58,000 tonnes of CO₂, comparable to tens of thousands of return transatlantic passenger flights*.
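The arithmetic behind that estimate is short enough to check by hand. The sketch below reproduces it in Python using only the two assumptions stated in the footnote to this post (2 Wh per 1,000 tokens and 480 g CO₂/kWh); the function and constant names are illustrative, and real per-token energy use varies widely by model and hardware.

```python
# Back-of-envelope estimate using the assumptions from this post's footnote.
TOKENS_PER_MONTH = 60e12        # 60 trillion tokens
WH_PER_1000_TOKENS = 2.0        # assumed energy per 1,000 tokens
GRID_G_CO2_PER_KWH = 480.0      # assumed average grid intensity


def monthly_footprint(tokens, wh_per_1k=WH_PER_1000_TOKENS,
                      g_per_kwh=GRID_G_CO2_PER_KWH):
    """Return (energy in GWh, emissions in tonnes of CO2) for a token count."""
    energy_kwh = tokens / 1000 * wh_per_1k / 1000   # Wh -> kWh
    tonnes_co2 = energy_kwh * g_per_kwh / 1e6       # grams -> tonnes
    return energy_kwh / 1e6, tonnes_co2             # kWh -> GWh


energy_gwh, tonnes = monthly_footprint(TOKENS_PER_MONTH)
print(f"{energy_gwh:.0f} GWh, {tonnes:,.0f} tonnes CO2")
# prints "120 GWh, 57,600 tonnes CO2"
```

The 57,600 tonnes figure is what the post rounds to "about 58,000". Because both inputs are assumptions, the output scales linearly with either one: halving the energy per token halves both results.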

Gamification doesn’t fix governance gaps

Shopify’s response (renaming the leaderboard as a usage dashboard, adding circuit breakers for runaway agents, and interrogating why expensive token usage is expensive) reflects the instincts of a mature engineering organisation. It treats AI inference like any other production system: something that needs observability, guardrails, and ongoing operational review.

But circuit breakers and rename exercises are tactical patches. The structural problem is that AI inference is the only piece of modern infrastructure most enterprises run without a dedicated governance layer.

Cloud had the same gap once. Then FinOps emerged and gave Finance and Engineering a shared language, shared visibility and shared control. AI inference is at the same inflection point, and the gap widens with every month that adoption compounds.

The infrastructure answer

cortave is the AI cost governance layer for enterprise. It sits between AI applications and the LLMs, and it does three things no incentive scheme can.

It applies policy-controlled routing. Most AI systems send every request to a premium model by default. cortave routes simple requests to lower-cost models automatically, reserving expensive inference for the work that truly requires it. The decision is enforced in infrastructure, not left to a developer racing a leaderboard.
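To make the idea concrete, here is a minimal sketch of what policy-controlled routing could look like in general. It is not cortave's implementation: the tier names, prices, task types and token threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str                      # hypothetical tier name
    usd_per_million_tokens: float   # illustrative price, not a real quote


# Hypothetical tiers and prices, for illustration only.
CHEAP = Route("small-model", 0.50)
PREMIUM = Route("premium-model", 15.00)

# Hypothetical policy: which task types may use the premium tier,
# and how large a prompt the cheap tier is trusted with.
PREMIUM_TASKS = {"code_review", "multi_step_planning"}
MAX_CHEAP_PROMPT_TOKENS = 4_000


def choose_route(task_type: str, prompt_tokens: int) -> Route:
    """Pick a tier from central policy, not from the caller's preference."""
    if task_type in PREMIUM_TASKS or prompt_tokens > MAX_CHEAP_PROMPT_TOKENS:
        return PREMIUM
    return CHEAP


def estimated_cost(route: Route, tokens: int) -> float:
    """Estimated cost in USD for a given number of tokens on a route."""
    return tokens / 1_000_000 * route.usd_per_million_tokens


print(choose_route("summarise", 800).model)       # small-model
print(choose_route("code_review", 800).model)     # premium-model
```

The design point is that the caller supplies a task type and a prompt, and never names a model. Changing the policy table changes behaviour for every application at once.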

It enforces efficiency guardrails on every call. Bloated prompts, redundant context and repeat queries that should hit cache are handled at the layer, before they hit the model. Developers keep their autonomy. The token bill stops absorbing it.
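As an illustration of guardrails of this kind, the sketch below wraps a model call with two simple checks: duplicate context chunks are dropped, and identical prompts are served from a cache. The `GuardedClient` class and the stand-in model function are hypothetical; a production layer would also need cache expiry and semantic rather than exact matching.

```python
import hashlib


class GuardedClient:
    """Wraps a model call with context de-duplication and a response cache."""

    def __init__(self, call_model):
        self._call_model = call_model   # any callable: prompt str -> response str
        self._cache = {}
        self.model_calls = 0            # how many requests reached the model

    @staticmethod
    def _dedupe(chunks):
        """Drop empty and repeated context chunks, preserving order."""
        seen, kept = set(), []
        for chunk in chunks:
            text = chunk.strip()
            if text and text not in seen:
                seen.add(text)
                kept.append(text)
        return kept

    def ask(self, question, context_chunks):
        prompt = "\n".join(self._dedupe(context_chunks) + [question.strip()])
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:      # only cache misses cost tokens
            self.model_calls += 1
            self._cache[key] = self._call_model(prompt)
        return self._cache[key]


# Stand-in for a real model call, used only to make the sketch runnable.
client = GuardedClient(lambda prompt: f"answer based on {len(prompt)} chars")
first = client.ask("What changed?", ["doc A", "doc A", "doc B"])
second = client.ask("What changed?", ["doc A", "doc B"])
print(client.model_calls)   # 1
```

The two requests differ only by a repeated context chunk, so after de-duplication they produce the same prompt and the second is answered from cache.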

It attributes spend to outcomes. Token usage gets tied to the workflow, the team and the result. Finance sees unit economics. Engineering sees what’s really being built. The board sees AI investment that can be measured, attributed and reported. Not estimated.
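A rough sketch of what outcome attribution involves: each call is logged with its team, workflow, cost and the number of outcomes it contributed to, then rolled up into a cost per outcome. The record fields and sample figures below are hypothetical, not cortave's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    workflow: str
    tokens: int
    usd: float
    outcomes: int    # e.g. tickets resolved; the unit is an assumption


def unit_economics(records):
    """Roll usage up by (team, workflow) and compute cost per outcome."""
    totals = defaultdict(lambda: {"tokens": 0, "usd": 0.0, "outcomes": 0})
    for r in records:
        bucket = totals[(r.team, r.workflow)]
        bucket["tokens"] += r.tokens
        bucket["usd"] += r.usd
        bucket["outcomes"] += r.outcomes
    report = {}
    for key, t in totals.items():
        # Spend with no outcome attached is reported as None, not hidden.
        per_outcome = t["usd"] / t["outcomes"] if t["outcomes"] else None
        report[key] = {**t, "usd_per_outcome": per_outcome}
    return report


# Made-up sample data for illustration.
records = [
    UsageRecord("support", "ticket_triage", 1_000_000, 3.0, 10),
    UsageRecord("support", "ticket_triage", 500_000, 1.5, 5),
    UsageRecord("platform", "prototype", 2_000_000, 30.0, 0),
]
report = unit_economics(records)
for key, row in report.items():
    print(key, row["usd"], row["usd_per_outcome"])
```

Spend that maps to no outcome, like the prototype row here, shows up explicitly, which is the kind of usage a leaderboard would have rewarded.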

The result: in deployments to date, token spend is reduced by 50-80%. In workflow-heavy environments (chained agents, validation loops, repeated summarisation, exactly the patterns leaderboard cultures produce) it’s over 90%.

What changes for each seat at the table

For the CTO, it means routing, observability, security and cost control in one infrastructure layer, not five point tools and a Slack channel of escalations.

For the CFO, it means gross profit protected as usage scales. Inference becomes a line item with predictable unit economics, not a quarterly surprise.

For the Chief Sustainability Officer, it means the carbon and energy cost of AI usage stops being an externality the organisation can’t see. Fewer tokens = less compute = fewer emissions. The reduction is real, and it’s enforced at the layer where the decision gets made.

For the board, it means AI investment that can be defended on margin, on risk and on stewardship, not on usage metrics that turn out to incentivise the opposite of value.

Stop measuring tokens. Start governing them.

The lesson from Meta isn’t that AI adoption is the wrong bet. It’s that adoption without governance compounds the wrong behaviour at exactly the speed AI scales, which is fast.

Tokenmaxxing is what happens when an organisation has no infrastructure-level answer to the question “how much should this cost, and to whom?” Leaderboards, minimum spend targets and public dashboards of peer usage are not that answer. They’re what gets built when the answer is missing.

cortave is the answer. It occupies the same strategic position for AI inference that FinOps occupies for cloud, with the savings, the controls and the environmental upside built into the layer.

AI costs too much to run without control. cortave fixes that.

See it on your own deployment. Book a 20-minute walkthrough here and we’ll model your token spend and show the savings you could make.

* Processing 60 trillion AI tokens per month could consume roughly 120 GWh of electricity and generate around 58,000 tonnes of CO₂ per month (57,600 tonnes before rounding), assuming 2 Wh per 1,000 tokens and an average grid intensity of 480 g CO₂/kWh.

Melissa Cowdry
