Research sweep · deep · 2023–2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

AI Token Pricing vs. True Total Cost of Ownership: A Research Synthesis (2023–2026)

Overview

The price users pay per AI token has collapsed by 85–95% since GPT-4 launched in March 2023. Frontier input tokens that cost $30–36 per million at launch traded at $1.75–5 per million by early 2026, while commodity models fell below $0.10 per million. This decline represents the fastest sustained price drop in the history of enterprise technology, outpacing even the aggressive cloud pricing wars of the late 2000s. Yet beneath this headline deflation sits a structural reality that financial markets, enterprise buyers, and policy analysts are only beginning to confront: published API prices bear little relationship to what it actually costs to serve a token, and the gap is being funded by investor capital rather than operating economics.

The defining shift of the past 18 months has been the recognition that frontier AI inference is a deliberately subsidized market. OpenAI's leaked financial documents revealed $143 billion in projected cumulative cash outflow from 2024 to 2029, with inference costs at $8.4 billion in 2025 rising to $14.1 billion in 2026 against gross margins held below 35%. Anthropic cut Claude Opus pricing by 67% in a single 2025 announcement. DeepSeek's January 2025 emergence demonstrated frontier-equivalent performance at 5–10% of incumbent cost structures, forcing repricing across the industry. These are not efficiency gains passed through to consumers; they are strategic losses absorbed by balance sheets backed by over $650 billion in combined Big Tech AI capex for 2026 alone.

Sources: Fortune (2025); Bloomberg (2026); PYMNTS (2025); Bloomberg (2025)
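
As a quick sanity check on the headline figure, the decline implied by the quoted prices can be computed directly, pairing the low launch price with the low early-2026 price and likewise for the highs:

```python
# Decline implied by the prices quoted above: $30-36/M input tokens at GPT-4's
# March 2023 launch vs. $1.75-5/M for frontier models by early 2026.
launch_low, launch_high = 30.0, 36.0   # $/M input tokens, March 2023
now_low, now_high = 1.75, 5.0          # $/M input tokens, early 2026

decline_low_pair = 1 - now_low / launch_low     # ~94% decline
decline_high_pair = 1 - now_high / launch_high  # ~86% decline
print(f"implied decline: {decline_high_pair:.0%} to {decline_low_pair:.0%}")
```

Both endpoints land inside the 85–95% range cited above.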

The analytical challenge for 2026 is that two opposing forces are accelerating simultaneously. Hardware efficiency continues to improve at rates exceeding Moore's Law, with NVIDIA's Blackwell architecture delivering up to 15× inference performance improvement over H100. Algorithmic optimizations (quantization, speculative decoding, mixture-of-experts architectures) compound these gains. Yet infrastructure costs are rising, not falling: data center construction timelines remain at two years, power grid constraints are binding in key regions, and energy demand from AI queries runs 10× that of traditional search. The question is no longer whether token prices will eventually normalize toward cost, but when, and at what level the floor will settle.

Sources: NVIDIA (official) (2025); Bloomberg Intelligence (2025)

Key Findings

1. Subsidization is quantifiable and material, not speculative.

OpenAI spent an estimated $8.4 billion on inference in 2025, with projections rising to $14.1 billion in 2026. Losses in 2024 alone reached an estimated $5 billion. Centific documented a 25× gap between flat subscription pricing and actual API cost for heavy users. Independent hardware modeling by ScaleDown concluded that providers absorb over 90% of true token cost. The GitHub Copilot case, with $10/month subscriptions against roughly $30/month in compute costs, has become the canonical illustration across analyst coverage. This is not a disputed empirical question: the subsidies are real, documented, and funded by equity financing rather than cross-subsidization from profitable operations.

Sources: Fortune (2025); Centific (2025); ScaleDown (tinyml.substack.com, Substack) (2024)
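
The cited gaps translate into provider-absorbed cost shares as follows. The $1 "effective price" in the second case is just the reciprocal framing of Centific's 25× ratio, not a reported figure:

```python
# Share of true serving cost absorbed by the provider, from the figures above.
def subsidy_share(price_usd, cost_usd):
    """Fraction of cost the provider eats rather than bills."""
    return (cost_usd - price_usd) / cost_usd

copilot = subsidy_share(price_usd=10, cost_usd=30)    # GitHub Copilot case, ~67%
heavy_user = subsidy_share(price_usd=1, cost_usd=25)  # Centific's 25x gap, 96%
print(f"Copilot: {copilot:.0%}, heavy subscription user: {heavy_user:.0%}")
```

The 25× case implies a 96% absorbed share, consistent with ScaleDown's over-90% estimate.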

2. The AWS parallel is explicit but imperfect.

Frontier labs are replicating Amazon's early cloud strategy: price below cost to capture developer workflows, create switching costs through API integration and fine-tuned dependencies, and normalize pricing once market position is unassailable. AWS sustained this approach from 2006 to roughly 2012–2014. The critical difference is capital intensity: AWS's early subsidies were measured in the tens of millions of dollars, while OpenAI's 2024–2025 inference subsidies are measured in the billions. AWS also achieved positive unit economics for large customers even in its early years, meaning its subsidy was strategic rather than structural. Frontier AI inference appears genuinely loss-making at scale, with profitability targeted for 2030 by OpenAI's own projections.

Sources: Fortune (2025); Stratechery (Ben Thompson) (2024)

3. Enterprise TCO diverges dramatically from API sticker price.

McKinsey and Gartner consistently document that API costs represent only 15–30% of enterprise deployment TCO for complex applications. The remainder comprises data preparation, prompt engineering, fine-tuning, security and compliance auditing, integration and orchestration overhead, human-in-the-loop validation, and RAG infrastructure. The a16z enterprise CIO survey found 84% of respondents reporting AI-related margin erosion despite falling token prices. Financial analysts modeling AI ROI solely on declining token prices systematically underestimate deployment costs.

Sources: Gartner (2026); Andreessen Horowitz (a16z) (2025)

4. Energy is emerging as the binding cost floor.

Bloomberg's April 2026 reporting on OpenAI pausing its UK Stargate data center explicitly named energy cost as the constraint. Academic research projects AI data center electricity demand reaching 9–12% of total US energy by 2030. An April 2026 arXiv paper documented that geographic clustering of AI data centers creates nonlinear regional grid stress and capacity market price spikes of 10× or more. Unlike compute costs, which follow steep improvement curves, energy and cooling costs are rising in absolute terms as data center density increases to 100+ kW per rack with projections above 1 MW. This creates an asymmetric cost structure that algorithmic efficiency cannot fully offset.

Sources: Bloomberg (2026); Bloomberg Intelligence (2025)
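
The joules-per-token framing above converts to dollars per million tokens as follows. The 2 J/token figure and both tariffs are illustrative assumptions for the sketch, not values from TokenPowerBench or Bloomberg:

```python
# Converting per-token energy use into a per-million-token energy bill.
# All numeric inputs here are ILLUSTRATIVE ASSUMPTIONS, not sourced figures.
JOULES_PER_KWH = 3.6e6

def energy_cost_per_million_tokens(joules_per_token, usd_per_kwh):
    kwh = joules_per_token * 1e6 / JOULES_PER_KWH  # kWh per million tokens
    return kwh * usd_per_kwh

baseline = energy_cost_per_million_tokens(2.0, 0.08)  # cheap-grid region
spiked = energy_cost_per_million_tokens(2.0, 0.80)    # 10x capacity-market spike
print(f"${baseline:.3f} vs ${spiked:.3f} per million tokens")
```

Regional tariff differences scale this cost linearly, which is why long-term power purchase agreements in low-cost grid regions translate directly into per-token cost advantages.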

5. Training cost amortization remains underweighted in pricing analysis.

Epoch AI established that frontier training costs grow approximately 2.4× per year, with the largest runs projected to exceed $1 billion by 2027. GPT-4-era training was estimated at $50–100 million; frontier 2025–2026 runs cost $500 million to $1 billion or more. This sunk cost must be amortized over an 18–24 month useful life before the next generation supersedes it. At lower-than-projected inference volumes, the drag on margins is significant, and it does not shrink proportionally with per-run efficiency gains as training scale continues to escalate.

Sources: arXiv (preprint) (2024); Epoch AI (2024)
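
A back-of-envelope amortization using the figures above shows why serving volume matters. The daily token volumes are assumed for illustration; actual lab volumes are not public:

```python
# Amortizing a $1B frontier training run over a 21-month useful life
# (midpoint of the 18-24 month range above). Token volumes are ASSUMED.
def amortized_usd_per_million_tokens(training_cost_usd, months, tokens_per_day):
    total_million_tokens = tokens_per_day * 30 * months / 1e6
    return training_cost_usd / total_million_tokens

high_volume = amortized_usd_per_million_tokens(1e9, 21, 1e12)  # 1T tokens/day
low_volume = amortized_usd_per_million_tokens(1e9, 21, 1e11)   # 100B tokens/day
print(f"${high_volume:.2f} vs ${low_volume:.2f} per million tokens")
```

At the assumed lower volume the amortization charge alone exceeds early-2026 flagship prices of $1.75–5 per million input tokens, which is the margin drag the finding describes.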

6. Jevons paradox is the central analytical challenge for price trajectory projections.

Cheaper tokens produce proportionally more demand, absorbing efficiency gains and sustaining energy cost pressure. Epoch AI data shows inference revenue at major labs growing 3× per year even as per-token prices decline, because volume more than compensates. The a16z/OpenRouter study covering 100 trillion real tokens found average coding prompts exceeding 20,000 tokens, with agentic inference as the fastest-growing workload. Meta raised 2025 AI capex by 50% after DeepSeek's efficiency announcement rather than cutting it. Enterprise AI inference spend reached $37 billion in 2025, up 320% year-on-year, despite thousandfold price declines.

Sources: Andreessen Horowitz (a16z) (2026); AI Proem (Substack) (2025); The Substrate (Substack) (2025)
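
The Epoch figures above pin down the implied demand growth directly: if revenue is the product of price and volume, the two quoted rates fix the volume trend.

```python
# Implied token-volume growth from the rates above: prices falling ~10x/year
# at fixed performance while inference revenue grows ~3x/year.
price_decline_per_year = 10     # tokens become 10x cheaper each year
revenue_growth_per_year = 3     # revenue still grows 3x each year
implied_volume_growth = revenue_growth_per_year * price_decline_per_year  # 30x
print(f"implied volume growth: {implied_volume_growth}x per year")
```

Volume growing roughly 30× per year is the Jevons dynamic in a single number: efficiency gains are being consumed, not banked.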

7. Price decline rates vary dramatically by capability tier.

Epoch AI's empirical tracking shows inference prices for a given performance level falling between 9× and 900× per year depending on the benchmark. Frontier reasoning models have not followed commodity curves; their pricing has remained substantially above the collapsing commodity baseline. This creates a two-tier market: cheap commodity inference for routine tasks and premium pricing for frontier reasoning capability. The spread between tiers has widened, not narrowed, since 2024.

Sources: Epoch AI (2025); Epoch AI (2025)

8. Capability growth compounds the cost-per-useful-work equation.

METR's research documents that AI time horizons (the task-completion duration at which models achieve 50% success) doubled approximately every 7 months through early 2025. Combined with per-token cost reductions, the implied cost-per-unit-of-human-equivalent-work is falling faster than any prior productivity technology. This has a paradoxical effect: agentic deployments become more economically attractive, driving aggregate token demand upward through Jevons dynamics while individual tasks become cheaper.

Sources: METR (Model Evaluation & Threat Research) (2025); METR (Model Evaluation & Threat Research) (2026)
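
Combining the two trends above gives a rough rate for the cost of human-equivalent work. Treating "useful work" as proportional to the METR time horizon is a simplifying assumption, not a claim from either source:

```python
# Rough combination of the two annual trends cited above: token prices falling
# ~10x/year at fixed capability, and METR time horizons doubling every ~7 months.
months = 12
horizon_growth = 2 ** (months / 7)   # ~3.28x longer autonomous tasks per year
price_decline = 10                   # tokens ~10x cheaper per year
cost_per_work_decline = price_decline * horizon_growth  # ~33x per year
print(f"cost per unit of work falls ~{cost_per_work_decline:.0f}x per year")
```

Roughly a 30×-plus annual decline, noticeably faster than the token price trend alone.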

9. The capital stack required to sustain subsidization is historically unprecedented.

Combined Big Tech AI capex is projected to reach $650 billion for 2026 alone. Bloomberg characterized the associated debt financing as a $3 trillion market event. Bain calculates that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand, alongside an $800 billion infrastructure shortfall. Goldman Sachs' 2024 report questioning whether $1 trillion in AI investment would generate adequate returns remains the reference document for financial market skepticism.

Sources: Bloomberg (2026); Bloomberg (2026); Bain & Company (2025); Goldman Sachs (2024)

Evidence & Data

The most precise cost structure data comes from bottom-up hardware analysis. SemiAnalysis established that H100 inference at typical utilization produces a cost floor of roughly $0.50–2.00 per million tokens depending on model size, well above the $0.07–0.30 prices that commodity models fetched by late 2024. NVIDIA's Blackwell architecture results show hardware efficiency gains of up to 15× versus H100. Google's TPU v6e (Trillium) delivered 4× improvement in inference performance per dollar. ARK Invest pegs overall inference cost declines at approximately 95% annually using SemiAnalysis benchmark data.

Sources: SemiAnalysis (newsletter.semianalysis.com, Substack) (2023); NVIDIA (official) (2025); Google Cloud (official) (2024); ARK Investment Management (2025)
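
A minimal bottom-up sketch in the spirit of that analysis, with the hourly GPU cost, throughput, and utilization as illustrative assumptions rather than SemiAnalysis's actual inputs:

```python
# Bottom-up serving-cost floor: dollars per hour divided by million tokens per
# hour actually produced. All three inputs are ILLUSTRATIVE ASSUMPTIONS.
def usd_per_million_tokens(gpu_usd_per_hour, tokens_per_second, utilization):
    million_tokens_per_hour = tokens_per_second * 3600 * utilization / 1e6
    return gpu_usd_per_hour / million_tokens_per_hour

floor = usd_per_million_tokens(2.00, 1500, 0.5)  # ~$0.74 per million tokens
print(f"serving-cost floor: ${floor:.2f} per million tokens")
```

With these inputs the floor lands inside the cited $0.50–2.00 band; halving either throughput or utilization doubles it, which is why operational excellence dominates near-term cost differences.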

On the price trajectory, the historical record is unambiguous: GPT-4 launched at $30 per million input tokens in March 2023; GPT-4 Turbo dropped to $10 per million in November 2023; GPT-4o reached $5 per million in May 2024; commodity models fell under $0.10 per million by late 2024; DeepSeek V3 priced at $0.14 per million in December 2024. By early 2026, flagship models clustered around $1.75–5 per million input tokens. Epoch AI's formal analysis found cost reductions averaging 10× per year for a given performance level, with benchmark-specific variance ranging from 9× to 900× per year.

Sources: Simon Willison's Weblog (2023); Andreessen Horowitz (a16z) (2024); Epoch AI (2025)
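
One nuance worth making explicit: the flagship sticker-price path above annualizes to a much slower decline than Epoch's performance-matched 10× per year, because early-2026 flagships are far more capable than GPT-4 was. Using an assumed $3 midpoint for early 2026:

```python
# Annualized decline of flagship sticker prices: $30/M (March 2023) to an
# ASSUMED ~$3/M midpoint (early 2026), roughly three years apart.
years = 3.0
flagship_annual_factor = (30.0 / 3.0) ** (1 / years)  # ~2.15x cheaper per year
print(f"flagship sticker prices fall ~{flagship_annual_factor:.2f}x per year")
```

Roughly 2.15× per year at the sticker level versus ~10× per year performance-matched: the gap is capability growth being sold rather than priced away.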

Infrastructure spend figures anchor the capital intensity argument. McKinsey projects $3.7–7.9 trillion in global data center capex through 2030 (base case $5.2 trillion). Bloomberg Intelligence forecasts Big Tech 2025 capex may hit $200 billion as generative AI demand booms. The accelerator market is projected to exceed $600 billion by 2033. Gartner forecasts $2.52 trillion in total AI spending for 2026, up 44% year-on-year.

Sources: McKinsey Global Institute (2025); Bloomberg Intelligence (2025); Bloomberg Intelligence (2025); Gartner (2026)

Energy cost data provides the clearest signal of structural cost floors. Academic research documents that inference accounts for over 90% of LLM lifecycle energy consumption, inverting the common framing that training dominates. TokenPowerBench established the first dedicated methodology for measuring joules per token across batch sizes, context lengths, and quantization levels. US DOE projects data center electricity consumption tripling by 2028. AI data centers are projected to consume 9–12% of total US electricity by 2030.

Sources: Bloomberg Intelligence (2025)

The on-premise breakeven analysis offers the most direct bound on subsidization magnitude. Academic research finds on-premise inference can break even against commercial API pricing in 0.3–3 months for moderate workloads, empirically demonstrating how far below true serving cost commercial API prices are set.
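
The breakeven logic reduces to hardware cost over monthly savings. These inputs are assumptions chosen to fall inside the paper's 0.3–3 month range, not the study's own parameters:

```python
# On-premise breakeven: months until hardware capex is repaid by the spread
# between API price and on-prem marginal cost. All inputs are ASSUMPTIONS.
def breakeven_months(hardware_usd, million_tokens_per_month,
                     api_usd_per_m, onprem_usd_per_m):
    monthly_savings = million_tokens_per_month * (api_usd_per_m - onprem_usd_per_m)
    return hardware_usd / monthly_savings

m = breakeven_months(20_000, 2_000, 5.00, 0.50)  # ~2.2 months
print(f"breakeven in ~{m:.1f} months")
```

At a few billion tokens per month, even a modest API-versus-marginal-cost spread repays hardware in about two months, which is the empirical bound on how far above serving cost API prices sit.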

Signals & Tensions

The commodity-frontier price spread is widening, not converging. While commodity models have seen price collapse to sub-$0.10 per million tokens, frontier reasoning models have maintained substantially higher pricing. This suggests a durable two-tier market structure rather than uniform commoditization. Labs appear to be using commodity pricing for developer acquisition while preserving margins on frontier capabilities. The tension is whether open-weight models (DeepSeek, Llama) will eventually compress frontier pricing, or whether capability gaps will sustain premium tiers indefinitely.

Energy cost projections remain poorly integrated into pricing models. Most analyst coverage focuses on GPU and hardware cost trajectories while underweighting energy as a structural input. The April 2026 finding that regional grid concentration creates nonlinear capacity market price spikes is not reflected in standard per-token pricing frameworks. Labs securing long-term power purchase agreements in low-cost grid regions (nuclear-backed Midwest, hydro-backed Pacific Northwest) may gain structural advantages invisible in hardware-only analyses.

Sources: Bloomberg (2026)

The consensus on subsidization duration is weakening. Early analyst framing (2023–2024) assumed pricing would normalize within two to three years, as the AWS playbook predicted. OpenAI's financial disclosures targeting profitability in 2030, combined with $143 billion in projected cash outflow through 2029, suggest the window has extended. Sequoia's 2024 warning of a $600 billion revenue shortfall has not been resolved; the gap has widened. The question of investor patience is now explicit in financial press coverage.

Sources: Sequoia Capital (2024); Fortune (2025)

Regulatory cost overhead is underreported. EU AI Act compliance, NIST AI RMF adoption, and emerging financial sector governance requirements add materially to enterprise TCO. These costs do not decline with token price and may grow through 2028. The compliance and audit burden is largely absent from VC and analyst projections focused on hardware economics.

Open-weight operational variance undermines simple cost-floor arguments. LessWrong documentation shows open-weight model prices varying 10× across providers for identical weights. This implies operational excellence (batching, kernel optimization, memory management) is the primary near-term differentiator, complicating hardware-centric cost-floor projections.

Sources: LessWrong (2024)

Open Questions

When does marginal inference become cash-flow positive at frontier labs? OpenAI targets 2030. Anthropic has not disclosed. The timeline depends on whether training cost growth (2.4× per year) outpaces inference efficiency gains (10× per year for equivalent performance), and whether revenue scales faster than infrastructure commitments.
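
A stylized version of that race, using the growth rates quoted above:

```python
# Stylized tension: training cost compounds at ~2.4x/yr while performance-
# matched per-token prices fall ~10x/yr. To recover training cost from a fixed
# per-token margin, volume must outgrow both trends combined.
years = 3
training_cost_growth = 2.4 ** years   # ~13.8x more expensive runs
price_decline = 10 ** years           # 1000x cheaper per matched token
required_volume_growth = training_cost_growth * price_decline  # ~13,824x
print(f"required volume growth over {years} years: ~{required_volume_growth:,.0f}x")
```

Recovering training cost from a fixed per-token margin at performance-matched prices would require volume to grow by the product of the two trends, which is the quantitative core of the open question.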

What is the true energy cost floor for frontier inference? Academic research establishes that inference dominates lifecycle energy, but the conversion to dollars-per-token varies by geography, power contract structure, and cooling technology. No public framework integrates regional grid economics with per-token pricing.

Will agentic workloads absorb efficiency gains entirely? The Jevons paradox literature documents historical rebounds, but the elasticity of AI token demand remains empirically uncertain. If multi-agent loops and extended reasoning chains scale proportionally with cost reduction, total inference spending may never decline even as unit costs fall.

How durable is the frontier-commodity price spread? Open-weight competition (DeepSeek, Qwen, Llama) has compressed commodity pricing but has not yet reached frontier reasoning capabilities. Whether this spread represents a durable structural feature or a temporary gap depends on training cost dynamics and whether open-weight labs can sustain frontier-scale investment.

What happens when VC patience runs out? The historical AWS parallel suggests markets tolerate six to eight years of subsidization before demanding margin normalization. AI labs are now in year three or four of material inference subsidization. The trigger conditions for pricing normalization remain undefined.

How will geographic grid constraints reshape competitive positioning? April 2026 research shows AI data center clustering creates regional capacity market stress. Labs with early access to low-cost, high-capacity grid regions may gain cost advantages that dwarf hardware efficiency differences. The distribution of these advantages is not well mapped.

What regulatory cost burden will materialize by 2028? AI Act, sectoral governance requirements, and audit obligations are accumulating, but quantified TCO impact estimates do not exist. Enterprise buyers are pricing this risk as uncertainty rather than cost, potentially underestimating deployment economics.


Sources


Financial Press

ID Title Outlet Date Significance
f1 How Much Is Big Tech Spending on AI Computing? A Staggering $650 Billion in 2026 Bloomberg 2026-02 Definitive Bloomberg News quantification of 2026 hyperscaler AI capex at $650B, establishing the scale of infrastructure investment that underpins current token pricing subsidies.
f2 The $3 Trillion AI Data Center Build-Out Becomes All-Consuming For Debt Markets Bloomberg 2026-02 Bloomberg's deep-dive into debt market financing of AI infrastructure, revealing the financial mechanics behind how data-centre construction costs are being funded and how that cost ultimately flows through to inference economics.
f3 OpenAI Pauses Stargate UK Data Center Citing Energy Costs Bloomberg 2026-04 Illustrates that energy cost constraints are already forcing project-level decisions at the frontier lab level, confirming that power is emerging as a binding cost floor for token pricing.
f4 AI Spending Boom Shifts From Training Models to Running Them Bloomberg 2025-04 Pivotal Bloomberg newsletter piece documenting the structural shift in AI capex from model training to inference workloads, the key transition defining the 2025–2026 cost and pricing landscape.
f5 Why AI Bubble Concerns Loom as OpenAI, Microsoft, Meta Ramp Up Spending Bloomberg 2025-11 Bloomberg synthesises mounting analyst concern that AI infrastructure investment is outpacing monetisation, directly relevant to whether current token prices can ever cover true costs.
f6 OpenAI Says Spending to Rise to $115B Through 2029 Bloomberg 2025-09 Bloomberg reporting on OpenAI's internal spending roadmap, confirming that compute cost trajectories are projected to rise sharply even as token prices are cut, widening the subsidy gap.
f7 Watch AI Cost Assumptions Challenged Bloomberg 2025-01 Bloomberg live coverage on the day of DeepSeek's market shock, capturing real-time financial market reaction to a rival model achieving near-frontier performance at a fraction of the cost, directly challenging incumbent pricing assumptions.
f8 OpenAI CFO Thinks Business Users Will Pay Thousands For AI Software Bloomberg 2024-12 Direct executive commentary from OpenAI's CFO on the enterprise pricing strategy, revealing the planned shift toward high-ARPU subscription models as an alternative to per-token revenue to fund infrastructure.
f9 Microsoft Sets Expensive Price Tag for New Corporate AI Products Bloomberg 2023-07 Early Bloomberg benchmarking of enterprise AI product pricing (Microsoft Copilot at $30/user/month), providing a 2023 baseline to measure how enterprise AI pricing models have evolved.
f10 AI Inferencing at Crossroads Bloomberg Intelligence 2025 Bloomberg Intelligence analysis of inference as the critical commercial battleground, detailing how model distillation and quantisation are reducing per-token costs while demand scaling offsets margin improvements.
f11 Big Tech 2025 Capex May Hit $200 Billion as Gen-AI Demand Booms Bloomberg Intelligence 2025 Bloomberg Intelligence capex projection establishing that 2025 hyperscaler infrastructure spend — the cost base that subsidises token pricing — would reach $200B, up sharply from prior years.
f12 AI Accelerator Market Looks Set to Exceed $600 Billion by 2033 Bloomberg Intelligence 2025 Bloomberg Intelligence market-sizing of the AI accelerator chip ecosystem ($116B in 2024 to $604B by 2033), quantifying the hardware cost trajectory underlying all token pricing models.
f13 AI Is a Game Changer for Power Demand Bloomberg Intelligence 2025 Bloomberg Intelligence analysis of how AI data centres are transforming energy markets, with generative AI queries consuming up to 10x the energy of traditional searches, establishing energy as a rising structural cost component.
f14 Gen AI: Too Much Spend, Too Little Benefit? Goldman Sachs 2024-06 Goldman Sachs' most-cited AI sceptic report, with head of global equity research questioning whether $1T in AI infrastructure can generate adequate returns; a key reference point for the 'cost vs. benefit' debate in financial markets.
f15 Will the $1 Trillion of Generative AI Investment Pay Off? Goldman Sachs 2024 Goldman Sachs investment research framing the core financial question around AI infrastructure: whether the capital cycle is commercially justifiable, directly informing how analysts assess the sustainability of below-cost token pricing.
f16 Why AI Companies May Invest More Than $500 Billion in 2026 Goldman Sachs 2026 Goldman Sachs' most current projection on AI infrastructure spending, providing a 2026 financial-market perspective on whether investment momentum is sustainable and what it implies for token cost floors.
f17 The Cost of Compute: A $7 Trillion Race to Scale Data Centers McKinsey & Company 2025 McKinsey's comprehensive bottom-up analysis of data centre cost structure, projecting $5.2T required investment through 2030, and decomposing the build cost into land, power, cooling, and compute components.
f18 The New Economics of Enterprise Technology in an AI World McKinsey & Company 2025 McKinsey's enterprise-facing analysis of how AI shifts IT spending from capex to opex, with FinOps and token-level cost visibility emerging as critical for managing true AI deployment TCO beyond API sticker prices.
f19 LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks Epoch AI 2025 The most rigorous empirical tracking of token price declines across performance tiers, documenting 9x–900x annual price drops depending on task and showing that frontier reasoning models have not followed commodity price trends.
f20 Inference Economics of Language Models Epoch AI 2024 Epoch AI's foundational decomposition of what drives LLM inference costs — hardware utilisation, model size, batch size, memory bandwidth — providing the analytical framework cited by financial analysts evaluating token pricing sustainability.
f21 AI Datacenter Energy Dilemma — Race for AI Datacenter Space SemiAnalysis 2024 SemiAnalysis's deep technical analysis of data-centre power constraints as a structural cost floor for AI inference, widely cited in financial press as the authoritative bottom-up view on infrastructure economics.
f22 Groq Inference Tokenomics: Speed, But At What Cost? SemiAnalysis 2024 SemiAnalysis cost-per-token breakdown for specialised inference hardware, quantifying real economics of serving tokens and demonstrating the gap between cloud-provider pricing and actual hardware cost.
f23 OpenAI Says It Plans to Report Stunning Annual Losses Through 2028 — and Then Turn Wildly Profitable Just Two Years Later Fortune 2025-11 Fortune's reporting on leaked OpenAI financial projections showing $44B cumulative losses before 2029 profitability — the definitive document source for quantifying how much investor capital is subsidising current token prices.
f24 Perspective: AI Demand Is Inflated, and Only Anthropic Is Being Realistic CNBC 2026-04 Most recent (April 2026) financial media critique of AI demand assumptions and token consumption projections, with direct commentary on Anthropic's more conservative pricing and demand forecasting relative to OpenAI and Nvidia.
f25 AI Training Costs Are Improving at 50x the Speed of Moore's Law ARK Invest 2023 ARK Invest's Wright's Law application to AI compute, projecting that AI training and inference costs decline at 50x the pace of Moore's Law — the bullish analytical counterpoint to Goldman Sachs' scepticism on AI cost trajectories.

Frontier Lab & Model News

ID Title Outlet Date Significance
t1 METR's GPT-4.5 Pre-Deployment Evaluations METR (Model Evaluation & Threat Research) 2025-02 Official METR pre-deployment autonomy evaluation of GPT-4.5, finding capabilities between GPT-4o and o1 and assessing risk level relative to existing frontier models.
t2 Details about METR's Preliminary Evaluation of Claude 3.7 METR (Model Evaluation & Threat Research) 2025-04 Pre-deployment autonomy assessment of Claude 3.7 Sonnet, noting impressive AI R&D capabilities on RE-Bench but no evidence of dangerous-level autonomous capabilities.
t3 Details about METR's Evaluation of OpenAI GPT-5 METR (Model Evaluation & Threat Research) 2025-05 METR's autonomy evaluation of OpenAI's flagship GPT-5 model, providing the most current public capability benchmarking for the frontier's leading model.
t4 Details about METR's Preliminary Evaluation of GPT-4o METR (Model Evaluation & Threat Research) 2024-05 Baseline METR autonomy evaluation for GPT-4o, establishing a reference point against which later models' capability escalations are measured.
t5 Task-Completion Time Horizons of Frontier AI Models — Time Horizon 1.1 METR (Model Evaluation & Threat Research) 2026-01 METR's updated time-horizon dataset showing frontier model autonomous task-completion window doubling roughly every 7 months since 2019, with an expanded task suite giving tighter estimates at longer horizons.
t6 Measuring AI Ability to Complete Long Tasks METR (Model Evaluation & Threat Research) 2025-03 Introduces METR's methodology for quantifying how long AI agents can sustain productive autonomous work, directly informing the inference-cost implications of extended agentic deployments.
t7 Details about METR's Preliminary Evaluation of DeepSeek and Qwen Models METR (Model Evaluation & Threat Research) 2025-07 Finds mid-2025 DeepSeek autonomous capability levels comparable to late-2024 frontier models, highlighting how cost-efficient open-weight models are closing the autonomy gap.
t8 Introducing Claude 3.5 Sonnet Anthropic 2024-06 Official launch announcement establishing Claude 3.5 Sonnet as Anthropic's price-performance flagship, priced at $3/$15 per million tokens — significantly undercutting Claude 3 Opus at $15/$75.
t9 Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet Anthropic 2024-10 Official Anthropic model card documenting safety evaluations, capability benchmarks, and technical specifications for the October 2024 Claude 3.5 refresh — a primary technical disclosure.
t10 Google and Anthropic Drop AI Prices and Release New Models PYMNTS 2025-05 Documents the coordinated 2025 pricing cuts by Google (Gemini) and Anthropic (Claude Opus 4.5, price cut by 67%), illustrating competitive subsidisation dynamics between frontier labs.
t11 OpenAI Has Spent $12B on Inference with Microsoft: Report The Register 2025-11 Reports OpenAI's cumulative inference spend of $12B on Azure, exposing the massive infrastructure subsidy underpinning user-facing token prices.
t12 OpenAI Training and Inference Costs Could Reach $7bn for 2024, AI Startup Set to Lose $5bn Data Center Dynamics 2024-09 Key financial disclosure showing OpenAI's 2024 compute cost structure — $7B in training and inference against $3.7B revenue — quantifying the scale of below-cost token pricing.
t13 Exclusive: Here's How Much OpenAI Spends on Inference and Its Revenue Share With Microsoft Where's Your Ed At (Ed Zitron) 2025-05 Detailed breakdown of OpenAI's leaked internal financials, showing inference costs at $8.4B in 2025 — 66% from paying users — with projections rising to $14.1B in 2026.
t14 OpenAI Faces Financial Growing Pains, Spending Double Its Revenue DeepLearning.AI – The Batch 2024-10 Concise summary of OpenAI's loss trajectory ($540M in 2022 → $5B in 2024), contextualising why user-facing token prices remain far below true cost.
t15 The Rising Costs of Training Frontier AI Models arXiv (preprint) 2024-05 Academic analysis quantifying the exponential escalation in frontier model training costs, providing the cost-amortisation context for why labs price tokens below marginal cost.
t16 AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design arXiv (preprint) 2026-03 Proposes a formal framework for AI token pricing as a tradeable commodity, analysing the structural forces — lab subsidisation, demand elasticity, and market power — driving current API pricing.
t17 Photons = Tokens: The Physics of AI and the Economics of Knowledge arXiv (preprint) 2026-03 Formalises the Structural Jevons Paradox in AI: as unit token costs fall, firms redesign agent architectures to consume dramatically more compute via deeper reasoning loops and larger context windows.
t18 InferenceMAX™: Open Source Inference Benchmarking SemiAnalysis 2025-06 SemiAnalysis's open-source benchmark showing NVIDIA Blackwell delivering 15× lower cost per million tokens versus prior generation, setting the hardware efficiency baseline for 2025–2026 pricing floors.
t19 AI Datacenter Energy Dilemma — Race for AI Datacenter Space SemiAnalysis 2024-12 Detailed infrastructure analysis from SemiAnalysis on power constraints, data-centre construction timelines, and energy costs as the principal rising-cost vector offsetting hardware efficiency gains.
t20 Google TPUv7: The 900lb Gorilla In the Room SemiAnalysis 2025-08 Deep technical analysis of Google's latest proprietary TPU, showing how vertical compute integration gives Google a structural cost advantage in Gemini inference pricing versus GPU-dependent rivals.
t21 Introducing Cloud TPU v5p and AI Hypercomputer Google Cloud (official) 2023-12 Google's official announcement of TPU v5p infrastructure powering Gemini training, establishing the proprietary compute stack that underpins Google's inference cost economics.
t22 Trillium TPU Is GA Google Cloud (official) 2024-11 Announces general availability of Trillium (TPU v6e), offering 4× better performance-per-dollar for inference versus v5e and used to train Gemini 2.0 — quantifying Google's hardware efficiency edge.
t23 NVIDIA Blackwell Raises Bar in New InferenceMAX Benchmarks, Delivering Unmatched Performance and Lowest Cost Per Token NVIDIA (official) 2025-07 Official NVIDIA benchmark results showing Blackwell architecture's cost-per-token leadership, directly informing the hardware cost floor for frontier labs running GPU-based inference.
t24 Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters NVIDIA (official) 2025-03 NVIDIA's TCO framework for 'AI factories,' arguing that total cost of ownership — not GPU price — governs real inference economics, encompassing compute, networking, cooling, and utilisation.
t25 The 25× Subscription Trap: Why Frontier Labs Can No Longer Subsidize Your AI Centific 2025-09 Documents the 25× gap between flat subscription fees and actual API cost for heavy users, providing concrete evidence of the scale of cross-subsidisation in frontier lab pricing models.

VC & Analyst Reports

ID Title Outlet Date Significance
v1 State of AI: An Empirical 100 Trillion Token Study with OpenRouter Andreessen Horowitz (a16z) 2026-01 Empirical study of 100T tokens routed via OpenRouter reveals that agentic inference is the fastest-growing use pattern and that developers overwhelmingly optimise for quality over price, with Claude holding ~60% of coding workloads and average prompts exceeding 20K tokens, directly illustrating Jevons paradox at the token level.
v2 AI Is Driving A Shift Towards Outcome-Based Pricing (December 2024 Enterprise Newsletter) Andreessen Horowitz (a16z) 2024-12 Argues that per-token pricing is giving way to outcome-based pricing as AI costs scale, but finds CIOs remain uncomfortable with outcome metrics — a key signal that token-cost opacity is migrating into enterprise contract design.
v3 How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 Andreessen Horowitz (a16z) 2025 Survey of 100 enterprise CIOs finds 80% missed AI infrastructure cost forecasts by more than 25% and 84% report margin erosion tied to AI workloads, establishing that user-facing token prices grossly understate true enterprise TCO.
v4 CFO Roundtable: AI Growth, Pricing, and Forecasting (June 2025 Fintech Newsletter) Andreessen Horowitz (a16z) 2025-06 CFO-level discussion on AI unit economics reveals that every token processed is a direct variable cost, and that the newest reasoning models still command relatively high costs despite commodity-model price compression.
v5 AI's $600B Question Sequoia Capital 2024-06 David Cahn's landmark framework quantifies the annual revenue gap between AI infrastructure investment and actual AI-ecosystem revenue at ~$600B, calculates GPU costs as exactly half of AI data-centre TCO, and explicitly flags rapid GPU depreciation as a structural risk to lab economics.
v6 AI is Now Shovel Ready Sequoia Capital 2024-12 Designates 2025 as the 'Year of the Data Center,' detailing that average AI data-centre construction takes ~2 years, that Amazon committed $50B+ to new builds in H1 2024, and that capital allocation risk from long lead times is a primary structural constraint on AI supply economics.
v7 AI in 2025: Building Blocks Firmly in Place Sequoia Capital 2025-01 Annual outlook positions 2025 as an execution year where infrastructure build-out transitions from deal-signing to physical deployment, with cloud service providers competing on GPU cluster scale and pricing as the primary near-term battleground.
v8 AI in 2026: A Tale of Two AIs Sequoia Capital 2026-01 Identifies a bifurcation between commoditised inference and frontier reasoning models, framing the divergence as a structural price-floor dynamic where frontier capability commands premium pricing while commodity models race toward near-zero marginal cost.
v9 The Cost of Compute: A $7 Trillion Race to Scale Data Centers McKinsey Global Institute 2025 Projects $3.7–$7.9 trillion in global data-centre capex through 2030 across three demand scenarios, with the base case at $5.2 trillion, and allocates ~60% of spend to computing hardware, ~25% to power and cooling, and ~15% to construction — the most comprehensive public cost-stack decomposition available.
v10 Who's Funding the AI Data Center Boom? McKinsey Global Institute 2025 Examines the financing structure behind AI data-centre buildout, clarifying that hyperscaler balance sheets, sovereign wealth funds, and private credit are the three capital pools underwriting infrastructure that token prices must eventually recoup.
v11 Issue Brief: AI Infrastructure McKinsey Global Institute 2025 Frames AI infrastructure as an 'AI factory' model — data and electricity as inputs, tokens and insights as outputs — directly linking compute capex to revenue generation and articulating the economic logic that will ultimately drive token price normalisation.
v12 Beyond Compute: Infrastructure That Powers and Cools AI Data Centers McKinsey Global Institute 2025 Analyses the non-compute TCO components (power, cooling, backup generation, physical plant) that are often invisible in quoted token prices, projecting 200 incremental GW of AI-related capacity required in the accelerated scenario and flagging energy as a rising, not falling, cost component.
v13 Token Economics, Physical AI, and Beyond: McKinsey Previews NVIDIA GTC McKinsey Global Institute 2025-03 McKinsey's Chris Smith explicitly adopts 'token economics' as an analytical unit, signalling that major strategy consultancies have shifted from cloud-hour pricing to per-token unit economics as the primary framework for AI infrastructure ROI analysis.
v14 Technology Report 2025: $2 Trillion in New Revenue Needed to Fund AI's Scaling Trend Bain & Company 2025 Bain's headline finding that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand is the single most cited cost-recovery gap figure in 2025 analyst literature, and directly implies sustained lab subsidisation until that gap closes.
v15 How Can We Meet AI's Insatiable Demand for Compute Power? Bain & Company 2025 Quantifies that AI compute demand is growing at more than twice the rate of Moore's Law and projects a global $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud and AI data centres.
v16 AI's Trillion-Dollar Opportunity (Global Technology Report 2024) Bain & Company 2024 Bain's 2024 baseline report that documents unprecedented GenAI adoption speed despite cost roadblocks, establishing the trajectory against which the 2025 $2T gap estimate is benchmarked.
v17 Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026 Gartner 2026-01 Gartner's official forecast of $2.52 trillion in worldwide AI spending for 2026 — a 44% YoY increase — provides the most widely cited market-size anchor for contextualising token-price economics against total infrastructure outlays.
v18 Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business Gartner 2025-10 Gartner's top strategic predictions for 2026 and beyond cover AI agent proliferation, agentic spending intermediation, and enterprise cost displacement — providing the Technology Radar framing for how token-cost trajectory intersects with enterprise software budgets.
v19 Gartner Survey Finds 54% of Infrastructure & Operations Leaders Are Adopting AI to Cut Costs Gartner 2025-10 Survey evidence that more than half of I&O leaders view AI primarily as a cost-reduction tool, creating a circular dynamic where AI's cost is justified by AI's cost savings — a framing that shapes enterprise willingness to absorb rising token bills.
v20 The State of AI Infrastructure: Demand, Costs, and Custom Silicon ARK Investment Management 2025-12 Using SemiAnalysis InferenceMax benchmarks, ARK calculates that inference costs for capable models are falling at ~95% annually, outpacing the ~75% annual training cost decline, and identifies custom silicon (Trainium, TPU, MTIA) as the next structural cost lever hyperscalers are deploying to reduce Nvidia dependence.
v21 AI Will Determine the Future of Software and Cloud Spending ARK Investment Management 2025 ARK projects global data-centre systems investment growing at 30%+ annually to reach $653B in 2026, with AI infrastructure spend tripling to ~$1.5T by 2030, providing the demand-side framework for understanding why token prices cannot fall indefinitely even with hardware efficiency gains.
v22 Can AI Companies Become Profitable? Epoch AI 2025 Epoch AI's analysis of multiple frontier labs finds compute (R&D plus inference) comprises 54–62% of costs and that spending is currently 2–3× revenue at each lab, with OpenAI alone spending ~$4B serving free users in 2025 — the most rigorous published quantification of frontier-lab subsidisation.
v23 LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks Epoch AI 2025 Tracks state-of-the-art model prices across six benchmarks from 2022 to 2025, finding task-specific price-performance declines ranging from 9× to 900× per year, with the fastest declines coming after January 2024 amid DeepSeek and open-weight model competition.
v24 Inference Economics of Language Models Epoch AI 2025 Deep-dives into the unit economics of LLM inference — compute, memory bandwidth, batching efficiency, and hardware utilisation — establishing that electricity is only 10–15% of GPU TCO while capital costs dominate, which sets a structural cost floor on token prices.
v25 How Persistent Is the Inference Cost Burden? Epoch AI 2025 Examines whether inference cost burdens at frontier labs are structural or transient, finding that rising query complexity (reasoning chains, agentic loops) offsets hardware efficiency gains — directly addressing whether price-per-token declines will continue through 2028.
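Several headline figures in this table are the same quantity written in different notations: Epoch's 9× to 900× per-year declines (v23) are annual cheapening factors, while ARK's ~95% annual fall (v20) is a percentage. A small conversion helper, included only to make the two notations directly comparable:

```python
# Convert between 'N times cheaper per year' factors (v23) and
# 'percent price fall per year' figures (v20). Pure arithmetic.

def annual_percent_decline(factor_per_year: float) -> float:
    """Annual cheapening factor -> percentage price fall per year."""
    return (1.0 - 1.0 / factor_per_year) * 100.0

def factor_from_percent(percent_per_year: float) -> float:
    """Inverse: percentage fall per year -> annual cheapening factor."""
    return 1.0 / (1.0 - percent_per_year / 100.0)

print(round(annual_percent_decline(9.0), 1))    # -> 88.9  (Epoch's slow end)
print(round(annual_percent_decline(900.0), 1))  # -> 99.9  (Epoch's fast end)
print(round(factor_from_percent(95.0), 1))      # -> 20.0  (ARK's ~95%/yr)
```

Read this way, ARK's ~95% annual decline sits comfortably inside Epoch's 9×–900× band: it corresponds to a 20× annual factor.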

Blogs & Independent Thinkers

ID Title Outlet Date Significance
b1 The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment ScaleDown (tinyml.substack.com, Substack) 2024 Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing.
b2 The Cost of Inference: Running the Models ScaleDown (tinyml.substack.com, Substack) 2024 Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly.
b3 Tokenomics 101: Navigating the Nuances of LLM Product Pricing ScaleDown (tinyml.substack.com, Substack) 2024 Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics.
b4 The Economics of Building ML Products in the LLM Era ScaleDown (tinyml.substack.com, Substack) 2024 Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates.
b5 The Price of Tokenmaxxing Aspiring for Intelligence (Substack) 2025 Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads.
b6 The Price Is Wrong Aspiring for Intelligence (Substack) 2025 Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions.
b7 Groq Inference Tokenomics: Speed, But At What Cost? SemiAnalysis (newsletter.semianalysis.com, Substack) 2024-02 First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations.
b8 Inference Race To The Bottom - Make It Up On Volume? SemiAnalysis (newsletter.semianalysis.com, Substack) 2024 Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee.
b9 The Inference Cost Of Search Disruption – Large Language Model Cost Analysis SemiAnalysis (newsletter.semianalysis.com, Substack) 2023 Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation.
b10 DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-01 Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact.
b11 AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-05 Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns.
b12 InferenceMAX™: Open Source Inference Benchmarking SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-10 Introduces an independent TCO-per-million-token benchmark — the first to measure total cost of compute across diverse model sizes and real-world scenarios — establishing a replicable methodology for ongoing cost trajectory analysis.
b13 Mythos, Muse, and the Opportunity Cost of Compute Stratechery (Ben Thompson) 2026 Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated.
b14 AI Promise and Chip Precariousness Stratechery (Ben Thompson) 2025 Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation.
b15 Rapidus, The End of Economic Rationality, AI Disruption Stratechery (Ben Thompson) 2024 Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position.
b16 Observations About LLM Inference Pricing LessWrong 2024 Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone.
b17 Simon Willison on llm-pricing (tag archive) Simon Willison's Weblog 2023 Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples.
b18 Welcome to LLMflation — LLM inference cost is going down fast Andreessen Horowitz (a16z) 2024-11 Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years — from $60/M tokens in 2021 to $0.06/M by late 2024 — the most-cited single data point in independent discourse on the price-collapse rate.
b19 How persistent is the inference cost burden? Epoch AI (Substack) 2025 Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories.
b20 How much does it cost to train frontier AI models? Epoch AI 2024 Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs.
b21 LLM inference prices have fallen rapidly but unequally across tasks Epoch AI 2025 Demonstrates that inference price decline rates range from 9x to 900x per year depending on capability tier, with frontier reasoning models holding price stable while commodity models collapsed — the key bifurcation story of 2024–2025.
b22 The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand AI Proem (Substack) 2025 Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward.
b23 The Jevons Paradox in AI: Why Efficiency Creates More Demand The Substrate (Substack) 2025 Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B — empirically confirming that Jevons effects dominate price reductions at the market level.
b24 AI agents are about to get more expensive Tiny Empires (Substack) 2025 Argues that multi-step agentic workflows break the 'cheap token' assumption because token consumption compounds across steps, making the true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest.
b25 AI Pricing Architecture Is Now Strategy SaaS Intelligence (Substack) 2025 Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation.
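Two of the concrete numbers in this table can be checked with back-of-envelope arithmetic: a16z's LLMflation trajectory (b18) and Simon Willison's captioning benchmark (b17). A quick sketch; the derived figures are our own arithmetic, not claims made in either post.

```python
# b18: $60/M tokens (2021) -> $0.06/M tokens (late 2024) over ~3 years.
total_drop = 60.0 / 0.06               # 1000x cheaper overall
annual_factor = total_drop ** (1 / 3)  # compounded annual cheapening
print(round(annual_factor, 1))         # -> 10.0, matching the quoted 10x/year

# b17: captioning 68,000 images for $1.68 implies a per-image cost of
# roughly $0.0000247, i.e. about 40,000 images per dollar.
per_image = 1.68 / 68_000
print(f"${per_image:.7f} per image")   # -> $0.0000247 per image
print(round(68_000 / 1.68))            # -> 40476 images per dollar
```

The two checks are consistent with each other: a 1000× fall in three years is exactly a 10× annual decline, and the per-image figure is what makes "commodity" token pricing effectively invisible at the application layer.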
