Research · Blogs & Independent Thinkers


Research sweep · deep · 2023–2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

Narrative

The independent blog and newsletter ecosystem tells three tightly interlocking stories about AI token pricing.

The first is subsidisation at scale. ScaleDown (tinyml.substack.com), the most analytically rigorous Substack on inference economics, concluded in 2024 via bottom-up hardware modelling that LLM API providers are absorbing over 90% of the true cost of each token, describing the market as a VC-funded 'land-grab phase' structurally identical to Uber's early price subsidisation. SemiAnalysis (Dylan Patel et al.) provided the forensic hardware backing — from the landmark 2023 'Inference Cost of Search Disruption' piece through the January 2025 DeepSeek Debates — consistently showing that GPU die costs, memory bandwidth constraints, power draw, cooling, networking, and staff combine to make current API prices economically unviable at face value. The GitHub Copilot parallel ($10/month subscription against $30/month compute cost) is the canonical example cited across Substack, Medium, and LessWrong.

The second story is LLMflation. Andreessen Horowitz coined the term in November 2024 and quantified a 10x annual cost decline for equivalent-performance inference over three years; the November 2025 arXiv paper 'The Price of Progress' gave this academic weight, documenting a 280x cost drop for GPT-3.5-equivalent performance between late 2022 and October 2024. Epoch AI refined the picture, showing price declines of 9x to 900x per year depending on capability tier — with frontier reasoning models holding price while commodity models collapsed — and projecting training costs crossing $1 billion per run by 2027.

The third story, which has dominated 2025–2026 commentary, is the Jevons Paradox: despite token prices falling a thousandfold in three years, enterprise AI inference spend reached $37 billion in 2025, up 320% year-on-year. Multiple Substack writers (AI Proem, The Substrate) documented Meta raising its 2025 AI capex by 50% after DeepSeek's efficiency announcement rather than cutting it. Ben Thompson at Stratechery frames this as 'the return of marginal costs', arguing that AI has broken the near-zero marginal cost model underpinning two decades of tech valuations, while Tiny Empires and Aspiring for Intelligence extend the argument to agentic workflows, showing that multi-step chains multiply token consumption to the point where total cost of ownership bears little relationship to headline API rates.

Across all three stories, the consensus is that current prices are strategically rather than economically determined, and that energy, data centre capex ($600B+ forecast for 2026), and hardware costs are building a structural floor that will eventually force a reckoning.
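Two of the numerical claims above can be checked with a back-of-envelope sketch: the a16z LLMflation figure ($60/M tokens in 2021 falling to $0.06/M by late 2024 implies roughly 10x per year), and the agentic-workflow point that multi-step chains multiply effective per-query cost. All prices, step counts, and the context-growth factor below are illustrative assumptions, not figures from the cited sources.

```python
# Back-of-envelope checks on two claims in the narrative above.
# Only the $60 -> $0.06 endpoints come from a16z; everything else
# is an assumed number for illustration.

# 1. LLMflation: $60/M tokens (2021) to $0.06/M (late 2024).
annual_decline = (60.0 / 0.06) ** (1 / 3)  # over three years
print(f"implied decline: ~{annual_decline:.0f}x per year")

# 2. Agentic chains: effective cost per user query when one query
#    fans out into several model calls whose context keeps growing.
def chain_cost(price_in, price_out, steps, in_tokens, out_tokens, growth=1.3):
    """$ per query; context grows ~30% per step (assumed) as prior
    outputs and retrieved documents re-enter the prompt."""
    total, tokens_in = 0.0, in_tokens
    for _ in range(steps):
        total += tokens_in / 1e6 * price_in + out_tokens / 1e6 * price_out
        tokens_in *= growth  # earlier steps accumulate into later contexts
    return total

single = chain_cost(3.0, 15.0, steps=1, in_tokens=2_000, out_tokens=500)
agent = chain_cost(3.0, 15.0, steps=8, in_tokens=2_000, out_tokens=500)
print(f"single call ${single:.4f} vs 8-step agent ${agent:.4f} "
      f"(~{agent / single:.0f}x the headline per-call cost)")
```

At these assumed rates an eight-step agent costs roughly fifteen times its single-call headline price, which is the shape of the 'cheap token' breakdown the agentic-workflow posts describe.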


Sources

ID · Title · Outlet · Date · Significance
b1 · The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment · ScaleDown (tinyml.substack.com, Substack) · 2024 · Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing.
b2 · The Cost of Inference: Running the Models · ScaleDown (tinyml.substack.com, Substack) · 2024 · Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly.
b3 · Tokenomics 101: Navigating the Nuances of LLM Product Pricing · ScaleDown (tinyml.substack.com, Substack) · 2024 · Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics.
b4 · The Economics of Building ML Products in the LLM Era · ScaleDown (tinyml.substack.com, Substack) · 2024 · Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates.
b5 · The Price of Tokenmaxxing · Aspiring for Intelligence (Substack) · 2025 · Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads.
b6 · The Price Is Wrong · Aspiring for Intelligence (Substack) · 2025 · Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions.
b7 · Groq Inference Tokenomics: Speed, But At What Cost? · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2024-02 · First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations.
b8 · Inference Race To The Bottom - Make It Up On Volume? · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2024 · Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee.
b9 · The Inference Cost Of Search Disruption – Large Language Model Cost Analysis · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2023 · Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation.
b10 · DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-01 · Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact.
b11 · AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-05 · Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns.
b12 · InferenceMAX™: Open Source Inference Benchmarking · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-10 · Introduces an independent TCO-per-million-token benchmark — the first to measure total cost of compute across diverse model sizes and real-world scenarios — establishing a replicable methodology for ongoing cost trajectory analysis.
b13 · Mythos, Muse, and the Opportunity Cost of Compute · Stratechery (Ben Thompson) · 2026 · Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated.
b14 · AI Promise and Chip Precariousness · Stratechery (Ben Thompson) · 2025 · Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation.
b15 · Rapidus, The End of Economic Rationality, AI Disruption · Stratechery (Ben Thompson) · 2024 · Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position.
b16 · Observations About LLM Inference Pricing · LessWrong · 2024 · Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone.
b17 · Simon Willison on llm-pricing (tag archive) · Simon Willison's Weblog · 2023 · Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples.
b18 · Welcome to LLMflation — LLM inference cost is going down fast · Andreessen Horowitz (a16z) · 2024-11 · Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years — from $60/M tokens in 2021 to $0.06/M by late 2024 — the most-cited single data point in independent discourse on the price-collapse rate.
b19 · How persistent is the inference cost burden? · Epoch AI (Substack) · 2025 · Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories.
b20 · How much does it cost to train frontier AI models? · Epoch AI · 2024 · Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs.
b21 · LLM inference prices have fallen rapidly but unequally across tasks · Epoch AI · 2025 · Demonstrates that inference price decline rates range from 9x to 900x per year depending on capability tier, with frontier reasoning models holding price stable while commodity models collapsed — the key bifurcation story of 2024–2025.
b22 · The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand · AI Proem (Substack) · 2025 · Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward.
b23 · The Jevons Paradox in AI: Why Efficiency Creates More Demand · The Substrate (Substack) · 2025 · Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B — empirically confirming that Jevons effects dominate price reductions at the market level.
b24 · AI agents are about to get more expensive · Tiny Empires (Substack) · 2025 · Argues that multi-step agentic workflows break the 'cheap token' assumption by multiplying token consumption at each step, making true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest.
b25 · AI Pricing Architecture Is Now Strategy · SaaS Intelligence (Substack) · 2025 · Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation.
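The bottom-up serving-cost accounting that the ScaleDown pieces (b1, b2) and the SemiAnalysis benchmarks perform can be sketched in a few lines. Every number below (GPU price, power draw, throughput, utilisation, overhead fraction) is an assumption chosen for illustration, not a figure taken from those sources.

```python
# Minimal bottom-up cost-per-token sketch, in the spirit of the
# infrastructure accounting described in b1/b2. All parameter
# defaults are illustrative assumptions.

HOURS_PER_YEAR = 8760

def cost_per_million_tokens(
    gpu_capex=30_000.0,      # $ per accelerator (assumed)
    amort_years=4,           # straight-line depreciation horizon
    power_kw=1.0,            # draw per GPU incl. cooling overhead (assumed)
    power_price=0.10,        # $/kWh (assumed)
    other_overhead=0.5,      # networking, security, ops, staff, as a
                             # fraction of hardware + energy (assumed)
    tokens_per_sec=1_500.0,  # sustained serving throughput per GPU (assumed)
    utilisation=0.6,         # fraction of wall-clock spent serving traffic
):
    hourly_hw = gpu_capex / (amort_years * HOURS_PER_YEAR)
    hourly_energy = power_kw * power_price
    hourly_total = (hourly_hw + hourly_energy) * (1 + other_overhead)
    tokens_per_hour = tokens_per_sec * 3600 * utilisation
    return hourly_total / tokens_per_hour * 1e6  # $ per million tokens

print(f"~${cost_per_million_tokens():.2f} per million tokens served")
```

With these assumptions the serving floor lands around $0.44 per million tokens, well above the $0.06/M commodity price a16z cites, which is roughly the shape of the subsidisation argument: whether any given provider is underwater depends entirely on throughput, utilisation, and overhead, the exact parameters the sources above fight over.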
