Research · Blogs & Independent Thinkers


Research sweep · deep · 2023–2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

Narrative

The independent blog and newsletter ecosystem tells three tightly interlocking stories about AI token pricing.

The first is subsidisation at scale. ScaleDown (tinyml.substack.com), the most analytically rigorous Substack on inference economics, concluded in 2024 via bottom-up hardware modelling that LLM API providers are absorbing over 90% of the true cost of each token, describing the market as a VC-funded 'land-grab phase' structurally identical to Uber's early price subsidisation. SemiAnalysis (Dylan Patel et al.) provided the forensic hardware backing — from the landmark 2023 'Inference Cost of Search Disruption' piece through the January 2025 DeepSeek Debates — consistently showing that GPU die costs, memory bandwidth constraints, power draw, cooling, networking, and staff combine to make current API prices economically unviable at face value. The GitHub Copilot parallel ($10/month subscription against $30/month compute cost) is the canonical example cited across Substack, Medium, and LessWrong.

The second story is LLMflation. Andreessen Horowitz coined the term in November 2024 and quantified a 10x annual cost decline for equivalent-performance inference over three years; the November 2025 arXiv paper 'The Price of Progress' gave this academic weight, documenting a 280x cost drop for GPT-3.5-equivalent performance between late 2022 and October 2024. Epoch AI refined the picture, showing price declines of 9x to 900x per year depending on capability tier — with frontier reasoning models holding price while commodity models collapsed — and projecting training costs crossing $1 billion per run by 2027.

The third story, which has dominated 2025–2026 commentary, is the Jevons Paradox: despite token prices falling a thousandfold in three years, enterprise AI inference spend reached $37 billion in 2025, up 320% year-on-year. Multiple Substack writers (AI Proem, The Substrate) documented Meta raising its 2025 AI capex by 50% after DeepSeek's efficiency announcement rather than cutting it. Ben Thompson at Stratechery frames this as 'the return of marginal costs', arguing that AI has broken the near-zero marginal cost model underpinning two decades of tech valuations, while Tiny Empires and Aspiring for Intelligence extend the argument to agentic workflows, showing that multi-step chains multiply token consumption to the point where total cost of ownership bears little relationship to headline API rates.

Across all three stories, the consensus is that current prices are strategically rather than economically determined, and that energy, data centre capex ($600B+ forecast for 2026), and hardware costs are building a structural floor that will eventually force a reckoning.
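Two of the numerical claims above can be checked with a back-of-envelope sketch: the a16z LLMflation figure ($60/M tokens in 2021 falling to $0.06/M by late 2024 implies roughly 10x per year), and the agentic-workflow point that multi-step chains multiply effective per-query cost. All prices, step counts, and the context-growth factor below are illustrative assumptions, not figures from the cited sources.

```python
# Back-of-envelope checks on two claims in the narrative above.
# Only the $60 -> $0.06 endpoints come from a16z; everything else
# is an assumed number for illustration.

# 1. LLMflation: $60/M tokens (2021) to $0.06/M (late 2024).
annual_decline = (60.0 / 0.06) ** (1 / 3)  # over three years
print(f"implied decline: ~{annual_decline:.0f}x per year")

# 2. Agentic chains: effective cost per user query when one query
#    fans out into several model calls whose context keeps growing.
def chain_cost(price_in, price_out, steps, in_tokens, out_tokens, growth=1.3):
    """$ per query; context grows ~30% per step (assumed) as prior
    outputs and retrieved documents re-enter the prompt."""
    total, tokens_in = 0.0, in_tokens
    for _ in range(steps):
        total += tokens_in / 1e6 * price_in + out_tokens / 1e6 * price_out
        tokens_in *= growth  # earlier steps accumulate into later contexts
    return total

single = chain_cost(3.0, 15.0, steps=1, in_tokens=2_000, out_tokens=500)
agent = chain_cost(3.0, 15.0, steps=8, in_tokens=2_000, out_tokens=500)
print(f"single call ${single:.4f} vs 8-step agent ${agent:.4f} "
      f"(~{agent / single:.0f}x the headline per-call cost)")
```

At these assumed rates an eight-step agent costs roughly fifteen times its single-call headline price, which is the shape of the 'cheap token' breakdown the agentic-workflow posts describe.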


Sources

ID · Title · Outlet · Date · Significance
b1 · The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment · ScaleDown (tinyml.substack.com, Substack) · 2024 · Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing.
b2 · The Cost of Inference: Running the Models · ScaleDown (tinyml.substack.com, Substack) · 2024 · Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly.
b3 · Tokenomics 101: Navigating the Nuances of LLM Product Pricing · ScaleDown (tinyml.substack.com, Substack) · 2024 · Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics.
b4 · The Economics of Building ML Products in the LLM Era · ScaleDown (tinyml.substack.com, Substack) · 2024 · Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates.
b5 · The Price of Tokenmaxxing · Aspiring for Intelligence (Substack) · 2025 · Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads.
b6 · The Price Is Wrong · Aspiring for Intelligence (Substack) · 2025 · Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions.
b7 · Groq Inference Tokenomics: Speed, But At What Cost? · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2024-02 · First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations.
b8 · Inference Race To The Bottom - Make It Up On Volume? · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2024 · Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee.
b9 · The Inference Cost Of Search Disruption – Large Language Model Cost Analysis · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2023 · Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation.
b10 · DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-01 · Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact.
b11 · AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-05 · Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns.
b12 · InferenceMAX™: Open Source Inference Benchmarking · SemiAnalysis (newsletter.semianalysis.com, Substack) · 2025-10 · Introduces an independent TCO-per-million-token benchmark — the first to measure total cost of compute across diverse model sizes and real-world scenarios — establishing a replicable methodology for ongoing cost trajectory analysis.
b13 · Mythos, Muse, and the Opportunity Cost of Compute · Stratechery (Ben Thompson) · 2026 · Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated.
b14 · AI Promise and Chip Precariousness · Stratechery (Ben Thompson) · 2025 · Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation.
b15 · Rapidus, The End of Economic Rationality, AI Disruption · Stratechery (Ben Thompson) · 2024 · Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position.
b16 · Observations About LLM Inference Pricing · LessWrong · 2024 · Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone.
b17 · Simon Willison on llm-pricing (tag archive) · Simon Willison's Weblog · 2023 · Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples.
b18 · Welcome to LLMflation — LLM inference cost is going down fast · Andreessen Horowitz (a16z) · 2024-11 · Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years — from $60/M tokens in 2021 to $0.06/M by late 2024 — the most-cited single data point in independent discourse on the price-collapse rate.
b19 · How persistent is the inference cost burden? · Epoch AI (Substack) · 2025 · Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories.
b20 · How much does it cost to train frontier AI models? · Epoch AI · 2024 · Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs.
b21 · LLM inference prices have fallen rapidly but unequally across tasks · Epoch AI · 2025 · Demonstrates that inference price decline rates range from 9x to 900x per year depending on capability tier, with frontier reasoning models holding price stable while commodity models collapsed — the key bifurcation story of 2024–2025.
b22 · The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand · AI Proem (Substack) · 2025 · Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward.
b23 · The Jevons Paradox in AI: Why Efficiency Creates More Demand · The Substrate (Substack) · 2025 · Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B — empirically confirming that Jevons effects dominate price reductions at the market level.
b24 · AI agents are about to get more expensive · Tiny Empires (Substack) · 2025 · Argues that multi-step agentic workflows break the 'cheap token' assumption by multiplying token consumption at each step, making true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest.
b25 · AI Pricing Architecture Is Now Strategy · SaaS Intelligence (Substack) · 2025 · Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation.
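The bottom-up serving-cost accounting that the ScaleDown pieces (b1, b2) and the SemiAnalysis benchmarks perform can be sketched in a few lines. Every number below (GPU price, power draw, throughput, utilisation, overhead fraction) is an assumption chosen for illustration, not a figure taken from those sources.

```python
# Minimal bottom-up cost-per-token sketch, in the spirit of the
# infrastructure accounting described in b1/b2. All parameter
# defaults are illustrative assumptions.

HOURS_PER_YEAR = 8760

def cost_per_million_tokens(
    gpu_capex=30_000.0,      # $ per accelerator (assumed)
    amort_years=4,           # straight-line depreciation horizon
    power_kw=1.0,            # draw per GPU incl. cooling overhead (assumed)
    power_price=0.10,        # $/kWh (assumed)
    other_overhead=0.5,      # networking, security, ops, staff, as a
                             # fraction of hardware + energy (assumed)
    tokens_per_sec=1_500.0,  # sustained serving throughput per GPU (assumed)
    utilisation=0.6,         # fraction of wall-clock spent serving traffic
):
    hourly_hw = gpu_capex / (amort_years * HOURS_PER_YEAR)
    hourly_energy = power_kw * power_price
    hourly_total = (hourly_hw + hourly_energy) * (1 + other_overhead)
    tokens_per_hour = tokens_per_sec * 3600 * utilisation
    return hourly_total / tokens_per_hour * 1e6  # $ per million tokens

print(f"~${cost_per_million_tokens():.2f} per million tokens served")
```

With these assumptions the serving floor lands around $0.44 per million tokens, well above the $0.06/M commodity price a16z cites, which is roughly the shape of the subsidisation argument: whether any given provider is underwater depends entirely on throughput, utilisation, and overhead, the exact parameters the sources above fight over.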
