Research · Blogs & Independent Thinkers
Research sweep · deep · 2023 – 2026
Token Cost of Ownership
AI token pricing versus true total cost of ownership, January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies; infrastructure economics (compute, energy, data centres, hardware, security, ops); the evolution of user-facing prices; and analyst and researcher projections for token cost trajectories through 2028.
- financial
- frontier
- academic
- vc
- blogs
Synthesised 2026-04-19
Narrative
The independent blog and newsletter ecosystem tells three tightly interlocking stories about AI token pricing.

The first is subsidisation at scale. ScaleDown (tinyml.substack.com), the most analytically rigorous Substack on inference economics, concluded in 2024 via bottom-up hardware modelling that LLM API providers are absorbing over 90% of the true cost of each token, describing the market as a VC-funded 'land-grab phase' structurally identical to Uber's early price subsidisation. SemiAnalysis (Dylan Patel et al.) provided the forensic hardware backing — from the landmark 2023 'Inference Cost of Search Disruption' piece through the January 2025 DeepSeek Debates — consistently showing that GPU die costs, memory bandwidth constraints, power draw, cooling, networking, and staff combine to make current API prices economically unviable at face value. The GitHub Copilot parallel ($10/month subscription against $30/month compute cost) is the canonical example cited across Substack, Medium, and LessWrong.

The second story is LLMflation. Andreessen Horowitz coined the term in November 2024 and quantified a 10x annual cost decline for equivalent-performance inference over three years; the November 2025 arXiv paper 'The Price of Progress' gave this academic weight, documenting a 280x cost drop for GPT-3.5-equivalent performance between late 2022 and October 2024. Epoch AI refined the picture, showing price declines of 9x to 900x per year depending on capability tier — with frontier reasoning models holding price while commodity models collapsed — and projecting training costs crossing $1 billion per run by 2027.

The third story, which has dominated 2025–2026 commentary, is the Jevons Paradox: despite token prices falling a thousandfold in three years, enterprise AI inference spend reached $37 billion in 2025, up 320% year-on-year.
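The decline rates quoted above can be cross-checked with a short calculation. The endpoints come from the sources (a16z's $60/M tokens in 2021 falling to $0.06/M by late 2024; the 280x drop over roughly two years in 'The Price of Progress'); the helper function itself is just an illustrative compound-rate formula, not anyone's published methodology.

```python
def annual_decline(factor_start, factor_end, years):
    """Implied constant annual price-decline factor between two price points."""
    return (factor_start / factor_end) ** (1 / years)

# a16z 'LLMflation': $60/M tokens (2021) -> $0.06/M (late 2024), ~3 years
print(round(annual_decline(60, 0.06, 3)))    # -> 10, i.e. the quoted 10x/year

# 'The Price of Progress': 280x drop over ~2 years (late 2022 -> Oct 2024)
print(round(annual_decline(280, 1, 2), 1))   # -> 16.7, i.e. ~17x/year
```

The two series imply comparable annualised rates, which is why independent writers treat them as corroborating rather than competing estimates.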
Multiple Substack writers (AI Proem, The Substrate) documented Meta raising 2025 AI capex by 50% after DeepSeek's efficiency announcement rather than cutting it. Ben Thompson at Stratechery frames this as 'the return of marginal costs' — arguing that AI has broken the near-zero marginal cost model underpinning two decades of tech valuations — while Tiny Empires and Aspiring for Intelligence extend the argument to agentic workflows, showing that multi-step chains multiply token consumption to the point where total cost of ownership bears little relationship to headline API rates. Across all three stories, the consensus is that current prices are strategically rather than economically determined, and that energy, data centre capex ($600B+ forecast for 2026), and hardware costs are building a structural floor that will eventually force a reckoning.
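The agentic-workflow argument above (Tiny Empires, Aspiring for Intelligence) rests on a simple mechanism: each step in a chain re-sends the accumulated context, so input tokens grow with step count and per-query cost grows superlinearly. A minimal sketch of that arithmetic, with all token counts and per-million-token rates chosen purely for illustration (none come from the sources):

```python
def query_cost(steps, input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars of one agentic query. Each step re-sends the
    accumulated context, so input volume grows roughly linearly per step.
    Prices are dollars per million tokens."""
    total = 0.0
    context = input_tokens
    for _ in range(steps):
        total += (context * in_price + output_tokens * out_price) / 1e6
        context += output_tokens  # outputs feed the next step's context
    return total

# illustrative rates: $3/M input, $15/M output
single = query_cost(1, 2_000, 500, 3, 15)
agent = query_cost(12, 2_000, 500, 3, 15)
print(f"{agent / single:.0f}x")  # -> 19x: a 12-step chain costs more than 12x a one-shot call
```

This is why total cost of ownership for agentic workloads decouples from headline per-token rates: the multiplier depends on chain depth and context growth, neither of which appears on the pricing page.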
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment | ScaleDown (tinyml.substack.com, Substack) | 2024 | Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing. |
| b2 | The Cost of Inference: Running the Models | ScaleDown (tinyml.substack.com, Substack) | 2024 | Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly. |
| b3 | Tokenomics 101: Navigating the Nuances of LLM Product Pricing | ScaleDown (tinyml.substack.com, Substack) | 2024 | Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics. |
| b4 | The Economics of Building ML Products in the LLM Era | ScaleDown (tinyml.substack.com, Substack) | 2024 | Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates. |
| b5 | The Price of Tokenmaxxing | Aspiring for Intelligence (Substack) | 2025 | Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads. |
| b6 | The Price Is Wrong | Aspiring for Intelligence (Substack) | 2025 | Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions. |
| b7 | Groq Inference Tokenomics: Speed, But At What Cost? | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2024-02 | First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations. |
| b8 | Inference Race To The Bottom - Make It Up On Volume? | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2024 | Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee. |
| b9 | The Inference Cost Of Search Disruption – Large Language Model Cost Analysis | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2023 | Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation. |
| b10 | DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2025-01 | Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact. |
| b11 | AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2025-05 | Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns. |
| b12 | InferenceMAX™: Open Source Inference Benchmarking | SemiAnalysis (newsletter.semianalysis.com, Substack) | 2025-10 | Introduces an independent TCO-per-million-token benchmark — the first to measure total cost of compute across diverse model sizes and real-world scenarios — establishing a replicable methodology for ongoing cost trajectory analysis. |
| b13 | Mythos, Muse, and the Opportunity Cost of Compute | Stratechery (Ben Thompson) | 2026 | Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated. |
| b14 | AI Promise and Chip Precariousness | Stratechery (Ben Thompson) | 2025 | Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation. |
| b15 | Rapidus, The End of Economic Rationality, AI Disruption | Stratechery (Ben Thompson) | 2024 | Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position. |
| b16 | Observations About LLM Inference Pricing | LessWrong | 2024 | Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone. |
| b17 | Simon Willison on llm-pricing (tag archive) | Simon Willison's Weblog | 2023–2026 | Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples. |
| b18 | Welcome to LLMflation — LLM inference cost is going down fast | Andreessen Horowitz (a16z) | 2024-11 | Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years — from $60/M tokens in 2021 to $0.06/M by late 2024 — the most-cited single data point in independent discourse on the price-collapse rate. |
| b19 | How persistent is the inference cost burden? | Epoch AI (Substack) | 2025 | Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories. |
| b20 | How much does it cost to train frontier AI models? | Epoch AI | 2024 | Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs. |
| b21 | LLM inference prices have fallen rapidly but unequally across tasks | Epoch AI | 2025 | Demonstrates that inference price decline rates range from 9x to 900x per year depending on capability tier, with frontier reasoning models holding price stable while commodity models collapsed — the key bifurcation story of 2024–2025. |
| b22 | The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand | AI Proem (Substack) | 2025 | Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward. |
| b23 | The Jevons Paradox in AI: Why Efficiency Creates More Demand | The Substrate (Substack) | 2025 | Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B — empirically confirming that Jevons effects dominate price reductions at the market level. |
| b24 | AI agents are about to get more expensive | Tiny Empires (Substack) | 2025 | Argues that multi-step agentic workflows break the 'cheap token' assumption by compounding token consumption across steps, making true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest. |
| b25 | AI Pricing Architecture Is Now Strategy | SaaS Intelligence (Substack) | 2025 | Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation. |