Research sweep · deep · 2023–2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

AI Token Pricing vs. True Total Cost of Ownership: A Research Synthesis (2023–2026)

Overview

The price users pay per AI token has collapsed by 85–95% since GPT-4 launched in March 2023. Frontier input tokens that cost $30–36 per million at launch traded at $1.75–5 per million by early 2026, while commodity models fell below $0.10 per million. This decline represents the fastest sustained price drop in the history of enterprise technology, outpacing even the aggressive cloud pricing wars of the late 2000s. Yet beneath this headline deflation sits a structural reality that financial markets, enterprise buyers, and policy analysts are only beginning to confront: published API prices bear little relationship to what it actually costs to serve a token, and the gap is being funded by investor capital rather than operating economics.

The defining shift of the past 18 months has been the recognition that frontier AI inference is a deliberately subsidized market. OpenAI's leaked financial documents revealed $143 billion in projected cumulative cash outflow from 2024 to 2029, with inference costs at $8.4 billion in 2025 rising to $14.1 billion in 2026 against gross margins held below 35%. Anthropic cut Claude Opus pricing by 67% in a single 2025 announcement. DeepSeek's January 2025 emergence demonstrated frontier-equivalent performance at 5–10% of incumbent cost structures, forcing repricing across the industry. These are not efficiency gains passed through to consumers; they are strategic losses absorbed by balance sheets backed by over $650 billion in combined Big Tech AI capex for 2026 alone.

Sources: Fortune (2025); Bloomberg (2026); PYMNTS (2025); Bloomberg (2025)
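
As a quick sanity check on the headline figure, the decline implied by the quoted prices can be computed directly, pairing the low launch price with the low early-2026 price and likewise for the highs:

```python
# Decline implied by the prices quoted above: $30-36/M input tokens at GPT-4's
# March 2023 launch vs. $1.75-5/M for frontier models by early 2026.
launch_low, launch_high = 30.0, 36.0   # $/M input tokens, March 2023
now_low, now_high = 1.75, 5.0          # $/M input tokens, early 2026

decline_low_pair = 1 - now_low / launch_low     # ~94% decline
decline_high_pair = 1 - now_high / launch_high  # ~86% decline
print(f"implied decline: {decline_high_pair:.0%} to {decline_low_pair:.0%}")
```

Both endpoints land inside the 85–95% range cited above.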

The analytical challenge for 2026 is that two opposing forces are accelerating simultaneously. Hardware efficiency continues to improve at rates exceeding Moore's Law, with NVIDIA's Blackwell architecture delivering up to 15× inference performance improvement over H100. Algorithmic optimizations (quantization, speculative decoding, mixture-of-experts architectures) compound these gains. Yet infrastructure costs are rising, not falling: data center construction timelines remain at two years, power grid constraints are binding in key regions, and energy demand from AI queries runs 10× that of traditional search. The question is no longer whether token prices will eventually normalize toward cost, but when, and at what level the floor will settle.

Sources: NVIDIA (official) (2025); Bloomberg Intelligence (2025)

Key Findings

1. Subsidization is quantifiable and material, not speculative.

OpenAI spent an estimated $8.4 billion on inference in 2025, with projections rising to $14.1 billion in 2026. Losses in 2024 alone reached an estimated $5 billion. Centific documented a 25× gap between flat subscription pricing and actual API cost for heavy users. Independent hardware modeling by ScaleDown concluded that providers absorb over 90% of true token cost. The GitHub Copilot case, with $10/month subscriptions against roughly $30/month in compute costs, has become the canonical illustration across analyst coverage. This is not a disputed empirical question: the subsidies are real, documented, and funded by equity financing rather than cross-subsidization from profitable operations.

Sources: Fortune (2025); Centific (2025); ScaleDown (tinyml.substack.com, Substack) (2024)
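
The cited gaps translate into provider-absorbed cost shares as follows. The $1 "effective price" in the second case is just the reciprocal framing of Centific's 25× ratio, not a reported figure:

```python
# Share of true serving cost absorbed by the provider, from the figures above.
def subsidy_share(price_usd, cost_usd):
    """Fraction of cost the provider eats rather than bills."""
    return (cost_usd - price_usd) / cost_usd

copilot = subsidy_share(price_usd=10, cost_usd=30)    # GitHub Copilot case, ~67%
heavy_user = subsidy_share(price_usd=1, cost_usd=25)  # Centific's 25x gap, 96%
print(f"Copilot: {copilot:.0%}, heavy subscription user: {heavy_user:.0%}")
```

The 25× case implies a 96% absorbed share, consistent with ScaleDown's over-90% estimate.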

2. The AWS parallel is explicit but imperfect.

Frontier labs are replicating Amazon's early cloud strategy: price below cost to capture developer workflows, create switching costs through API integration and fine-tuned dependencies, and normalize pricing once market position is unassailable. AWS sustained this approach from 2006 to roughly 2012–2014. The critical difference is capital intensity: AWS's early subsidies were measured in the tens of millions of dollars, while OpenAI's 2024–2025 inference subsidies are measured in the billions. AWS also achieved positive unit economics for large customers even in its early years, meaning its subsidy was strategic rather than structural. Frontier AI inference appears genuinely loss-making at scale, with profitability targeted for 2030 by OpenAI's own projections.

Sources: Fortune (2025); Stratechery (Ben Thompson) (2024)

3. Enterprise TCO diverges dramatically from API sticker price.

McKinsey and Gartner consistently document that API costs represent only 15–30% of enterprise deployment TCO for complex applications. The remainder comprises data preparation, prompt engineering, fine-tuning, security and compliance auditing, integration and orchestration overhead, human-in-the-loop validation, and RAG infrastructure. The a16z enterprise CIO survey found 84% of respondents reporting AI-related margin erosion despite falling token prices. Financial analysts modeling AI ROI solely on declining token prices systematically underestimate deployment costs.

Sources: Gartner (2026); Andreessen Horowitz (a16z) (2025)

4. Energy is emerging as the binding cost floor.

Bloomberg's April 2026 reporting on OpenAI pausing its UK Stargate data center explicitly named energy cost as the constraint. Academic research projects AI data center electricity demand reaching 9–12% of total US energy by 2030. An April 2026 arXiv paper documented that geographic clustering of AI data centers creates nonlinear regional grid stress and capacity market price spikes of 10× or more. Unlike compute costs, which follow steep improvement curves, energy and cooling costs are rising in absolute terms as data center density increases to 100+ kW per rack with projections above 1 MW. This creates an asymmetric cost structure that algorithmic efficiency cannot fully offset.

Sources: Bloomberg (2026); Bloomberg Intelligence (2025)
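
The joules-per-token framing above converts to dollars per million tokens as follows. The 2 J/token figure and both tariffs are illustrative assumptions for the sketch, not values from TokenPowerBench or Bloomberg:

```python
# Converting per-token energy use into a per-million-token energy bill.
# All numeric inputs here are ILLUSTRATIVE ASSUMPTIONS, not sourced figures.
JOULES_PER_KWH = 3.6e6

def energy_cost_per_million_tokens(joules_per_token, usd_per_kwh):
    kwh = joules_per_token * 1e6 / JOULES_PER_KWH  # kWh per million tokens
    return kwh * usd_per_kwh

baseline = energy_cost_per_million_tokens(2.0, 0.08)  # cheap-grid region
spiked = energy_cost_per_million_tokens(2.0, 0.80)    # 10x capacity-market spike
print(f"${baseline:.3f} vs ${spiked:.3f} per million tokens")
```

Regional tariff differences scale this cost linearly, which is why long-term power purchase agreements in low-cost grid regions translate directly into per-token cost advantages.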

5. Training cost amortization remains underweighted in pricing analysis.

Epoch AI established that frontier training costs grow approximately 2.4× per year, with the largest runs projected to exceed $1 billion by 2027. GPT-4-era training was estimated at $50–100 million; frontier 2025–2026 runs cost $500 million to $1 billion or more. This sunk cost must be amortized over an 18–24 month useful life before the next generation supersedes it. At lower-than-projected inference volumes, the drag on margins is significant, and it does not shrink proportionally with per-run efficiency gains as training scale continues to escalate.

Sources: arXiv (preprint) (2024); Epoch AI (2024)
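
A back-of-envelope amortization using the figures above shows why serving volume matters. The daily token volumes are assumed for illustration; actual lab volumes are not public:

```python
# Amortizing a $1B frontier training run over a 21-month useful life
# (midpoint of the 18-24 month range above). Token volumes are ASSUMED.
def amortized_usd_per_million_tokens(training_cost_usd, months, tokens_per_day):
    total_million_tokens = tokens_per_day * 30 * months / 1e6
    return training_cost_usd / total_million_tokens

high_volume = amortized_usd_per_million_tokens(1e9, 21, 1e12)  # 1T tokens/day
low_volume = amortized_usd_per_million_tokens(1e9, 21, 1e11)   # 100B tokens/day
print(f"${high_volume:.2f} vs ${low_volume:.2f} per million tokens")
```

At the assumed lower volume the amortization charge alone exceeds early-2026 flagship prices of $1.75–5 per million input tokens, which is the margin drag the finding describes.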

6. Jevons paradox is the central analytical challenge for price trajectory projections.

Cheaper tokens produce proportionally more demand, absorbing efficiency gains and sustaining energy cost pressure. Epoch AI data shows inference revenue at major labs growing 3× per year even as per-token prices decline, because volume more than compensates. The a16z/OpenRouter study covering 100 trillion real tokens found average coding prompts exceeding 20,000 tokens, with agentic inference as the fastest-growing workload. Meta raised 2025 AI capex by 50% after DeepSeek's efficiency announcement rather than cutting it. Enterprise AI inference spend reached $37 billion in 2025, up 320% year-on-year, despite thousandfold price declines.

Sources: Andreessen Horowitz (a16z) (2026); AI Proem (Substack) (2025); The Substrate (Substack) (2025)
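
The Epoch figures above pin down the implied demand growth directly: if revenue is the product of price and volume, the two quoted rates fix the volume trend.

```python
# Implied token-volume growth from the rates above: prices falling ~10x/year
# at fixed performance while inference revenue grows ~3x/year.
price_decline_per_year = 10     # tokens become 10x cheaper each year
revenue_growth_per_year = 3     # revenue still grows 3x each year
implied_volume_growth = revenue_growth_per_year * price_decline_per_year  # 30x
print(f"implied volume growth: {implied_volume_growth}x per year")
```

Volume growing roughly 30× per year is the Jevons dynamic in a single number: efficiency gains are being consumed, not banked.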

7. Price decline rates vary dramatically by capability tier.

Epoch AI's empirical tracking shows inference prices for a given performance level falling between 9× and 900× per year depending on the benchmark. Frontier reasoning models have not followed commodity curves; their pricing has remained substantially above the collapsing commodity baseline. This creates a two-tier market: cheap commodity inference for routine tasks and premium pricing for frontier reasoning capability. The spread between tiers has widened, not narrowed, since 2024.

Sources: Epoch AI (2025); Epoch AI (2025)

8. Capability growth compounds the cost-per-useful-work equation.

METR's research documents that AI time horizons (the task-completion duration at which models achieve 50% success) doubled approximately every 7 months through early 2025. Combined with per-token cost reductions, the implied cost-per-unit-of-human-equivalent-work is falling faster than any prior productivity technology. This has a paradoxical effect: agentic deployments become more economically attractive, driving aggregate token demand upward through Jevons dynamics while individual tasks become cheaper.

Sources: METR (Model Evaluation & Threat Research) (2025); METR (Model Evaluation & Threat Research) (2026)
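
Combining the two trends above gives a rough rate for the cost of human-equivalent work. Treating "useful work" as proportional to the METR time horizon is a simplifying assumption, not a claim from either source:

```python
# Rough combination of the two annual trends cited above: token prices falling
# ~10x/year at fixed capability, and METR time horizons doubling every ~7 months.
months = 12
horizon_growth = 2 ** (months / 7)   # ~3.28x longer autonomous tasks per year
price_decline = 10                   # tokens ~10x cheaper per year
cost_per_work_decline = price_decline * horizon_growth  # ~33x per year
print(f"cost per unit of work falls ~{cost_per_work_decline:.0f}x per year")
```

Roughly a 30×-plus annual decline, noticeably faster than the token price trend alone.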

9. The capital stack required to sustain subsidization is historically unprecedented.

Combined Big Tech AI capex is projected to reach $650 billion for 2026 alone. Bloomberg characterized the associated debt financing as a $3 trillion market event. Bain calculates that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand, alongside an $800 billion infrastructure shortfall. Goldman Sachs' 2024 report questioning whether $1 trillion in AI investment would generate adequate returns remains the reference document for financial market skepticism.

Sources: Bloomberg (2026); Bloomberg (2026); Bain & Company (2025); Goldman Sachs (2024)

Evidence & Data

The most precise cost structure data comes from bottom-up hardware analysis. SemiAnalysis established that H100 inference at typical utilization produces a cost floor of roughly $0.50–2.00 per million tokens depending on model size, well above the $0.07–0.30 prices that commodity models fetched by late 2024. NVIDIA's Blackwell architecture results show hardware efficiency gains of up to 15× versus H100. Google's TPU v6e (Trillium) delivered 4× improvement in inference performance per dollar. ARK Invest pegs overall inference cost declines at approximately 95% annually using SemiAnalysis benchmark data.

Sources: SemiAnalysis (newsletter.semianalysis.com, Substack) (2023); NVIDIA (official) (2025); Google Cloud (official) (2024); ARK Investment Management (2025)
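
A minimal bottom-up sketch in the spirit of that analysis, with the hourly GPU cost, throughput, and utilization as illustrative assumptions rather than SemiAnalysis's actual inputs:

```python
# Bottom-up serving-cost floor: dollars per hour divided by million tokens per
# hour actually produced. All three inputs are ILLUSTRATIVE ASSUMPTIONS.
def usd_per_million_tokens(gpu_usd_per_hour, tokens_per_second, utilization):
    million_tokens_per_hour = tokens_per_second * 3600 * utilization / 1e6
    return gpu_usd_per_hour / million_tokens_per_hour

floor = usd_per_million_tokens(2.00, 1500, 0.5)  # ~$0.74 per million tokens
print(f"serving-cost floor: ${floor:.2f} per million tokens")
```

With these inputs the floor lands inside the cited $0.50–2.00 band; halving either throughput or utilization doubles it, which is why operational excellence dominates near-term cost differences.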

On the price trajectory, the historical record is unambiguous: GPT-4 launched at $30 per million input tokens in March 2023; GPT-4 Turbo dropped to $10 per million in November 2023; GPT-4o reached $5 per million in May 2024; commodity models fell under $0.10 per million by late 2024; DeepSeek V3 priced at $0.14 per million in December 2024. By early 2026, flagship models clustered around $1.75–5 per million input tokens. Epoch AI's formal analysis found cost reductions averaging 10× per year for a given performance level, with benchmark-specific variance ranging from 9× to 900× per year.

Sources: Simon Willison's Weblog (2023); Andreessen Horowitz (a16z) (2024); Epoch AI (2025)
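
One nuance worth making explicit: the flagship sticker-price path above annualizes to a much slower decline than Epoch's performance-matched 10× per year, because early-2026 flagships are far more capable than GPT-4 was. Using an assumed $3 midpoint for early 2026:

```python
# Annualized decline of flagship sticker prices: $30/M (March 2023) to an
# ASSUMED ~$3/M midpoint (early 2026), roughly three years apart.
years = 3.0
flagship_annual_factor = (30.0 / 3.0) ** (1 / years)  # ~2.15x cheaper per year
print(f"flagship sticker prices fall ~{flagship_annual_factor:.2f}x per year")
```

Roughly 2.15× per year at the sticker level versus ~10× per year performance-matched: the gap is capability growth being sold rather than priced away.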

Infrastructure spend figures anchor the capital intensity argument. McKinsey projects $3.7–7.9 trillion in global data center capex through 2030 (base case $5.2 trillion). Bloomberg Intelligence forecasts Big Tech 2025 capex may hit $200 billion as generative AI demand booms. The accelerator market is projected to exceed $600 billion by 2033. Gartner forecasts $2.52 trillion in total AI spending for 2026, up 44% year-on-year.

Sources: McKinsey Global Institute (2025); Bloomberg Intelligence (2025); Bloomberg Intelligence (2025); Gartner (2026)

Energy cost data provides the clearest signal of structural cost floors. Academic research documents that inference accounts for over 90% of LLM lifecycle energy consumption, inverting the common framing that training dominates. TokenPowerBench established the first dedicated methodology for measuring joules per token across batch sizes, context lengths, and quantization levels. US DOE projects data center electricity consumption tripling by 2028. AI data centers are projected to consume 9–12% of total US electricity by 2030.

Sources: Bloomberg Intelligence (2025)

The on-premise breakeven analysis offers the most direct bound on subsidization magnitude. Academic research finds on-premise inference can break even against commercial API pricing in 0.3–3 months for moderate workloads, empirically demonstrating how far below true serving cost commercial API prices are set.
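
The breakeven logic reduces to hardware cost over monthly savings. These inputs are assumptions chosen to fall inside the paper's 0.3–3 month range, not the study's own parameters:

```python
# On-premise breakeven: months until hardware capex is repaid by the spread
# between API price and on-prem marginal cost. All inputs are ASSUMPTIONS.
def breakeven_months(hardware_usd, million_tokens_per_month,
                     api_usd_per_m, onprem_usd_per_m):
    monthly_savings = million_tokens_per_month * (api_usd_per_m - onprem_usd_per_m)
    return hardware_usd / monthly_savings

m = breakeven_months(20_000, 2_000, 5.00, 0.50)  # ~2.2 months
print(f"breakeven in ~{m:.1f} months")
```

At a few billion tokens per month, even a modest API-versus-marginal-cost spread repays hardware in about two months, which is the empirical bound on how far above serving cost API prices sit.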

Signals & Tensions

The commodity-frontier price spread is widening, not converging. While commodity models have seen price collapse to sub-$0.10 per million tokens, frontier reasoning models have maintained substantially higher pricing. This suggests a durable two-tier market structure rather than uniform commoditization. Labs appear to be using commodity pricing for developer acquisition while preserving margins on frontier capabilities. The tension is whether open-weight models (DeepSeek, Llama) will eventually compress frontier pricing, or whether capability gaps will sustain premium tiers indefinitely.

Energy cost projections remain poorly integrated into pricing models. Most analyst coverage focuses on GPU and hardware cost trajectories while underweighting energy as a structural input. The April 2026 finding that regional grid concentration creates nonlinear capacity market price spikes is not reflected in standard per-token pricing frameworks. Labs securing long-term power purchase agreements in low-cost grid regions (nuclear-backed Midwest, hydro-backed Pacific Northwest) may gain structural advantages invisible in hardware-only analyses.

Sources: Bloomberg (2026)

The consensus on subsidization duration is weakening. Early analyst framing (2023–2024) assumed pricing would normalize within two to three years, as the AWS playbook predicted. OpenAI's financial disclosures targeting profitability in 2030, combined with $143 billion in projected cash outflow through 2029, suggest the window has extended. Sequoia's 2024 warning of a $600 billion revenue shortfall has not been resolved; the gap has widened. The question of investor patience is now explicit in financial press coverage.

Sources: Sequoia Capital (2024); Fortune (2025)

Regulatory cost overhead is underreported. EU AI Act compliance, NIST AI RMF adoption, and emerging financial sector governance requirements add materially to enterprise TCO. These costs do not decline with token price and may grow through 2028. The compliance and audit burden is largely absent from VC and analyst projections focused on hardware economics.

Open-weight operational variance undermines simple cost-floor arguments. LessWrong documentation shows open-weight model prices varying 10× across providers for identical weights. This implies operational excellence (batching, kernel optimization, memory management) is the primary near-term differentiator, complicating hardware-centric cost-floor projections.

Sources: LessWrong (2024)

Open Questions

When does marginal inference become cash-flow positive at frontier labs? OpenAI targets 2030. Anthropic has not disclosed. The timeline depends on whether training cost growth (2.4× per year) outpaces inference efficiency gains (10× per year for equivalent performance), and whether revenue scales faster than infrastructure commitments.
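
A stylized version of that race, using the growth rates quoted above:

```python
# Stylized tension: training cost compounds at ~2.4x/yr while performance-
# matched per-token prices fall ~10x/yr. To recover training cost from a fixed
# per-token margin, volume must outgrow both trends combined.
years = 3
training_cost_growth = 2.4 ** years   # ~13.8x more expensive runs
price_decline = 10 ** years           # 1000x cheaper per matched token
required_volume_growth = training_cost_growth * price_decline  # ~13,824x
print(f"required volume growth over {years} years: ~{required_volume_growth:,.0f}x")
```

Recovering training cost from a fixed per-token margin at performance-matched prices would require volume to grow by the product of the two trends, which is the quantitative core of the open question.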

What is the true energy cost floor for frontier inference? Academic research establishes that inference dominates lifecycle energy, but the conversion to dollars-per-token varies by geography, power contract structure, and cooling technology. No public framework integrates regional grid economics with per-token pricing.

Will agentic workloads absorb efficiency gains entirely? The Jevons paradox literature documents historical rebounds, but the elasticity of AI token demand remains empirically uncertain. If multi-agent loops and extended reasoning chains scale proportionally with cost reduction, total inference spending may never decline even as unit costs fall.

How durable is the frontier-commodity price spread? Open-weight competition (DeepSeek, Qwen, Llama) has compressed commodity pricing but has not yet reached frontier reasoning capabilities. Whether this spread represents a durable structural feature or a temporary gap depends on training cost dynamics and whether open-weight labs can sustain frontier-scale investment.

What happens when VC patience runs out? The historical AWS parallel suggests markets tolerate six to eight years of subsidization before demanding margin normalization. AI labs are now in year three or four of material inference subsidization. The trigger conditions for pricing normalization remain undefined.

How will geographic grid constraints reshape competitive positioning? April 2026 research shows AI data center clustering creates regional capacity market stress. Labs with early access to low-cost, high-capacity grid regions may gain cost advantages that dwarf hardware efficiency differences. The distribution of these advantages is not well mapped.

What regulatory cost burden will materialize by 2028? AI Act, sectoral governance requirements, and audit obligations are accumulating, but quantified TCO impact estimates do not exist. Enterprise buyers are pricing this risk as uncertainty rather than cost, potentially underestimating deployment economics.


Sources


Financial Press

ID Title Outlet Date Significance
f1 How Much Is Big Tech Spending on AI Computing? A Staggering $650 Billion in 2026 Bloomberg 2026-02 Definitive Bloomberg News quantification of 2026 hyperscaler AI capex at $650B, establishing the scale of infrastructure investment that underpins current token pricing subsidies.
f2 The $3 Trillion AI Data Center Build-Out Becomes All-Consuming For Debt Markets Bloomberg 2026-02 Bloomberg's deep-dive into debt market financing of AI infrastructure, revealing the financial mechanics behind how data-centre construction costs are being funded and how that cost ultimately flows through to inference economics.
f3 OpenAI Pauses Stargate UK Data Center Citing Energy Costs Bloomberg 2026-04 Illustrates that energy cost constraints are already forcing project-level decisions at the frontier lab level, confirming that power is emerging as a binding cost floor for token pricing.
f4 AI Spending Boom Shifts From Training Models to Running Them Bloomberg 2025-04 Pivotal Bloomberg newsletter piece documenting the structural shift in AI capex from model training to inference workloads, the key transition defining the 2025–2026 cost and pricing landscape.
f5 Why AI Bubble Concerns Loom as OpenAI, Microsoft, Meta Ramp Up Spending Bloomberg 2025-11 Bloomberg synthesises mounting analyst concern that AI infrastructure investment is outpacing monetisation, directly relevant to whether current token prices can ever cover true costs.
f6 OpenAI Says Spending to Rise to $115B Through 2029 Bloomberg 2025-09 Bloomberg reporting on OpenAI's internal spending roadmap, confirming that compute cost trajectories are projected to rise sharply even as token prices are cut, widening the subsidy gap.
f7 Watch AI Cost Assumptions Challenged Bloomberg 2025-01 Bloomberg live coverage on the day of DeepSeek's market shock, capturing real-time financial market reaction to a rival model achieving near-frontier performance at a fraction of the cost, directly challenging incumbent pricing assumptions.
f8 OpenAI CFO Thinks Business Users Will Pay Thousands For AI Software Bloomberg 2024-12 Direct executive commentary from OpenAI's CFO on the enterprise pricing strategy, revealing the planned shift toward high-ARPU subscription models as an alternative to per-token revenue to fund infrastructure.
f9 Microsoft Sets Expensive Price Tag for New Corporate AI Products Bloomberg 2023-07 Early Bloomberg benchmarking of enterprise AI product pricing (Microsoft Copilot at $30/user/month), providing a 2023 baseline to measure how enterprise AI pricing models have evolved.
f10 AI Inferencing at Crossroads Bloomberg Intelligence 2025 Bloomberg Intelligence analysis of inference as the critical commercial battleground, detailing how model distillation and quantisation are reducing per-token costs while demand scaling offsets margin improvements.
f11 Big Tech 2025 Capex May Hit $200 Billion as Gen-AI Demand Booms Bloomberg Intelligence 2025 Bloomberg Intelligence capex projection establishing that 2025 hyperscaler infrastructure spend — the cost base that subsidises token pricing — would reach $200B, up sharply from prior years.
f12 AI Accelerator Market Looks Set to Exceed $600 Billion by 2033 Bloomberg Intelligence 2025 Bloomberg Intelligence market-sizing of the AI accelerator chip ecosystem ($116B in 2024 to $604B by 2033), quantifying the hardware cost trajectory underlying all token pricing models.
f13 AI Is a Game Changer for Power Demand Bloomberg Intelligence 2025 Bloomberg Intelligence analysis of how AI data centres are transforming energy markets, with generative AI queries consuming up to 10x the energy of traditional searches, establishing energy as a rising structural cost component.
f14 Gen AI: Too Much Spend, Too Little Benefit? Goldman Sachs 2024-06 Goldman Sachs' most-cited AI sceptic report, with head of global equity research questioning whether $1T in AI infrastructure can generate adequate returns; a key reference point for the 'cost vs. benefit' debate in financial markets.
f15 Will the $1 Trillion of Generative AI Investment Pay Off? Goldman Sachs 2024 Goldman Sachs investment research framing the core financial question around AI infrastructure: whether the capital cycle is commercially justifiable, directly informing how analysts assess the sustainability of below-cost token pricing.
f16 Why AI Companies May Invest More Than $500 Billion in 2026 Goldman Sachs 2026 Goldman Sachs' most current projection on AI infrastructure spending, providing a 2026 financial-market perspective on whether investment momentum is sustainable and what it implies for token cost floors.
f17 The Cost of Compute: A $7 Trillion Race to Scale Data Centers McKinsey & Company 2025 McKinsey's comprehensive bottom-up analysis of data centre cost structure, projecting $5.2T required investment through 2030, and decomposing the build cost into land, power, cooling, and compute components.
f18 The New Economics of Enterprise Technology in an AI World McKinsey & Company 2025 McKinsey's enterprise-facing analysis of how AI shifts IT spending from capex to opex, with FinOps and token-level cost visibility emerging as critical for managing true AI deployment TCO beyond API sticker prices.
f19 LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks Epoch AI 2025 The most rigorous empirical tracking of token price declines across performance tiers, documenting 9x–900x annual price drops depending on task and showing that frontier reasoning models have not followed commodity price trends.
f20 Inference Economics of Language Models Epoch AI 2024 Epoch AI's foundational decomposition of what drives LLM inference costs — hardware utilisation, model size, batch size, memory bandwidth — providing the analytical framework cited by financial analysts evaluating token pricing sustainability.
f21 AI Datacenter Energy Dilemma — Race for AI Datacenter Space SemiAnalysis 2024 SemiAnalysis's deep technical analysis of data-centre power constraints as a structural cost floor for AI inference, widely cited in financial press as the authoritative bottom-up view on infrastructure economics.
f22 Groq Inference Tokenomics: Speed, But At What Cost? SemiAnalysis 2024 SemiAnalysis cost-per-token breakdown for specialised inference hardware, quantifying real economics of serving tokens and demonstrating the gap between cloud-provider pricing and actual hardware cost.
f23 OpenAI Says It Plans to Report Stunning Annual Losses Through 2028 — and Then Turn Wildly Profitable Just Two Years Later Fortune 2025-11 Fortune's reporting on leaked OpenAI financial projections showing $44B cumulative losses before 2029 profitability — the definitive document source for quantifying how much investor capital is subsidising current token prices.
f24 Perspective: AI Demand Is Inflated, and Only Anthropic Is Being Realistic CNBC 2026-04 Most recent (April 2026) financial media critique of AI demand assumptions and token consumption projections, with direct commentary on Anthropic's more conservative pricing and demand forecasting relative to OpenAI and Nvidia.
f25 AI Training Costs Are Improving at 50x the Speed of Moore's Law ARK Invest 2023 ARK Invest's Wright's Law application to AI compute, projecting that AI training and inference costs decline at 50x the pace of Moore's Law — the bullish analytical counterpoint to Goldman Sachs' scepticism on AI cost trajectories.

Frontier Lab & Model News

ID Title Outlet Date Significance
t1 METR's GPT-4.5 Pre-Deployment Evaluations METR (Model Evaluation & Threat Research) 2025-02 Official METR pre-deployment autonomy evaluation of GPT-4.5, finding capabilities between GPT-4o and o1 and assessing risk level relative to existing frontier models.
t2 Details about METR's Preliminary Evaluation of Claude 3.7 METR (Model Evaluation & Threat Research) 2025-04 Pre-deployment autonomy assessment of Claude 3.7 Sonnet, noting impressive AI R&D capabilities on RE-Bench but no evidence of dangerous-level autonomous capabilities.
t3 Details about METR's Evaluation of OpenAI GPT-5 METR (Model Evaluation & Threat Research) 2025-05 METR's autonomy evaluation of OpenAI's flagship GPT-5 model, providing the most current public capability benchmarking for the frontier's leading model.
t4 Details about METR's Preliminary Evaluation of GPT-4o METR (Model Evaluation & Threat Research) 2024-05 Baseline METR autonomy evaluation for GPT-4o, establishing a reference point against which later models' capability escalations are measured.
t5 Task-Completion Time Horizons of Frontier AI Models — Time Horizon 1.1 METR (Model Evaluation & Threat Research) 2026-01 METR's updated time-horizon dataset showing frontier model autonomous task-completion window doubling roughly every 7 months since 2019, with an expanded task suite giving tighter estimates at longer horizons.
t6 Measuring AI Ability to Complete Long Tasks METR (Model Evaluation & Threat Research) 2025-03 Introduces METR's methodology for quantifying how long AI agents can sustain productive autonomous work, directly informing the inference-cost implications of extended agentic deployments.
t7 Details about METR's Preliminary Evaluation of DeepSeek and Qwen Models METR (Model Evaluation & Threat Research) 2025-07 Finds mid-2025 DeepSeek autonomous capability levels comparable to late-2024 frontier models, highlighting how cost-efficient open-weight models are closing the autonomy gap.
t8 Introducing Claude 3.5 Sonnet Anthropic 2024-06 Official launch announcement establishing Claude 3.5 Sonnet as Anthropic's price-performance flagship, priced at $3/$15 per million tokens — significantly undercutting Claude 3 Opus at $15/$75.
t9 Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet Anthropic 2024-10 Official Anthropic model card documenting safety evaluations, capability benchmarks, and technical specifications for the October 2024 Claude 3.5 refresh — a primary technical disclosure.
t10 Google and Anthropic Drop AI Prices and Release New Models PYMNTS 2025-05 Documents the coordinated 2025 pricing cuts by Google (Gemini) and Anthropic (Claude Opus 4.5, price cut by 67%), illustrating competitive subsidisation dynamics between frontier labs.
t11 OpenAI Has Spent $12B on Inference with Microsoft: Report The Register 2025-11 Reports OpenAI's cumulative inference spend of $12B on Azure, exposing the massive infrastructure subsidy underpinning user-facing token prices.
t12 OpenAI Training and Inference Costs Could Reach $7bn for 2024, AI Startup Set to Lose $5bn Data Center Dynamics 2024-09 Key financial disclosure showing OpenAI's 2024 compute cost structure — $7B in training and inference against $3.7B revenue — quantifying the scale of below-cost token pricing.
t13 Exclusive: Here's How Much OpenAI Spends on Inference and Its Revenue Share With Microsoft Where's Your Ed At (Ed Zitron) 2025-05 Detailed breakdown of OpenAI's leaked internal financials, showing inference costs at $8.4B in 2025 — 66% from paying users — with projections rising to $14.1B in 2026.
t14 OpenAI Faces Financial Growing Pains, Spending Double Its Revenue DeepLearning.AI – The Batch 2024-10 Concise summary of OpenAI's loss trajectory ($540M in 2022 → $5B in 2024), contextualising why user-facing token prices remain far below true cost.
t15 The Rising Costs of Training Frontier AI Models arXiv (preprint) 2024-05 Academic analysis quantifying the exponential escalation in frontier model training costs, providing the cost-amortisation context for why labs price tokens below marginal cost.
t16 AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design arXiv (preprint) 2026-03 Proposes a formal framework for AI token pricing as a tradeable commodity, analysing the structural forces — lab subsidisation, demand elasticity, and market power — driving current API pricing.
t17 Photons = Tokens: The Physics of AI and the Economics of Knowledge arXiv (preprint) 2026-03 Formalises the Structural Jevons Paradox in AI: as unit token costs fall, firms redesign agent architectures to consume dramatically more compute via deeper reasoning loops and larger context windows.
t18 InferenceMAX™: Open Source Inference Benchmarking SemiAnalysis 2025-06 SemiAnalysis's open-source benchmark showing NVIDIA Blackwell delivering 15× lower cost per million tokens versus prior generation, setting the hardware efficiency baseline for 2025–2026 pricing floors.
t19 AI Datacenter Energy Dilemma — Race for AI Datacenter Space SemiAnalysis 2024-12 Detailed infrastructure analysis from SemiAnalysis on power constraints, data-centre construction timelines, and energy costs as the principal rising-cost vector offsetting hardware efficiency gains.
t20 Google TPUv7: The 900lb Gorilla In the Room SemiAnalysis 2025-08 Deep technical analysis of Google's latest proprietary TPU, showing how vertical compute integration gives Google a structural cost advantage in Gemini inference pricing versus GPU-dependent rivals.
t21 Introducing Cloud TPU v5p and AI Hypercomputer Google Cloud (official) 2023-12 Google's official announcement of TPU v5p infrastructure powering Gemini training, establishing the proprietary compute stack that underpins Google's inference cost economics.
t22 Trillium TPU Is GA Google Cloud (official) 2024-11 Announces general availability of Trillium (TPU v6e), offering 4× better performance-per-dollar for inference versus v5e and used to train Gemini 2.0 — quantifying Google's hardware efficiency edge.
t23 NVIDIA Blackwell Raises Bar in New InferenceMAX Benchmarks, Delivering Unmatched Performance and Lowest Cost Per Token NVIDIA (official) 2025-07 Official NVIDIA benchmark results showing Blackwell architecture's cost-per-token leadership, directly informing the hardware cost floor for frontier labs running GPU-based inference.
t24 Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters NVIDIA (official) 2025-03 NVIDIA's TCO framework for 'AI factories,' arguing that total cost of ownership — not GPU price — governs real inference economics, encompassing compute, networking, cooling, and utilisation.
t25 The 25× Subscription Trap: Why Frontier Labs Can No Longer Subsidize Your AI Centific 2025-09 Documents the 25× gap between flat subscription fees and actual API cost for heavy users, providing concrete evidence of the scale of cross-subsidisation in frontier lab pricing models.

VC & Analyst Reports

ID Title Outlet Date Significance
v1 State of AI: An Empirical 100 Trillion Token Study with OpenRouter Andreessen Horowitz (a16z) 2026-01 Empirical study of 100T tokens routed via OpenRouter reveals that agentic inference is the fastest-growing use pattern and that developers overwhelmingly optimise for quality over price, with Claude holding ~60% of coding workloads and average prompts exceeding 20K tokens, directly illustrating Jevons paradox at the token level.
v2 AI Is Driving A Shift Towards Outcome-Based Pricing (December 2024 Enterprise Newsletter) Andreessen Horowitz (a16z) 2024-12 Argues that per-token pricing is giving way to outcome-based pricing as AI costs scale, but finds CIOs remain uncomfortable with outcome metrics — a key signal that token-cost opacity is migrating into enterprise contract design.
v3 How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 Andreessen Horowitz (a16z) 2025 Survey of 100 enterprise CIOs finds 80% missed AI infrastructure cost forecasts by more than 25% and 84% report margin erosion tied to AI workloads, establishing that user-facing token prices grossly understate true enterprise TCO.
v4 CFO Roundtable: AI Growth, Pricing, and Forecasting (June 2025 Fintech Newsletter) Andreessen Horowitz (a16z) 2025-06 CFO-level discussion on AI unit economics reveals that every token processed is a direct variable cost, and that the newest reasoning models still command relatively high costs despite commodity-model price compression.
v5 AI's $600B Question Sequoia Capital 2024-06 David Cahn's landmark framework quantifies the annual revenue gap between AI infrastructure investment and actual AI-ecosystem revenue at ~$600B, calculates GPU costs as exactly half of AI data-centre TCO, and explicitly flags rapid GPU depreciation as a structural risk to lab economics.
v6 AI is Now Shovel Ready Sequoia Capital 2024-12 Designates 2025 as the 'Year of the Data Center,' detailing that average AI data-centre construction takes ~2 years, that Amazon committed $50B+ to new builds in H1 2024, and that capital allocation risk from long lead times is a primary structural constraint on AI supply economics.
v7 AI in 2025: Building Blocks Firmly in Place Sequoia Capital 2025-01 Annual outlook positions 2025 as an execution year where infrastructure build-out transitions from deal-signing to physical deployment, with cloud service providers competing on GPU cluster scale and pricing as the primary near-term battleground.
v8 AI in 2026: A Tale of Two AIs Sequoia Capital 2026-01 Identifies a bifurcation between commoditised inference and frontier reasoning models, framing the divergence as a structural price-floor dynamic where frontier capability commands premium pricing while commodity models race toward near-zero marginal cost.
v9 The Cost of Compute: A $7 Trillion Race to Scale Data Centers McKinsey Global Institute 2025 Projects $3.7–$7.9 trillion in global data-centre capex through 2030 across three demand scenarios, with the base case at $5.2 trillion, and allocates ~60% of spend to computing hardware, ~25% to power and cooling, and ~15% to construction — the most comprehensive public cost-stack decomposition available.
v10 Who's Funding the AI Data Center Boom? McKinsey Global Institute 2025 Examines the financing structure behind AI data-centre buildout, clarifying that hyperscaler balance sheets, sovereign wealth funds, and private credit are the three capital pools underwriting infrastructure that token prices must eventually recoup.
v11 Issue Brief: AI Infrastructure McKinsey Global Institute 2025 Frames AI infrastructure as an 'AI factory' model — data and electricity as inputs, tokens and insights as outputs — directly linking compute capex to revenue generation and articulating the economic logic that will ultimately drive token price normalisation.
v12 Beyond Compute: Infrastructure That Powers and Cools AI Data Centers McKinsey Global Institute 2025 Analyses the non-compute TCO components (power, cooling, backup generation, physical plant) that are often invisible in quoted token prices, projecting 200 incremental GW of AI-related capacity required in the accelerated scenario and flagging energy as a rising, not falling, cost component.
v13 Token Economics, Physical AI, and Beyond: McKinsey Previews NVIDIA GTC McKinsey Global Institute 2025-03 McKinsey's Chris Smith explicitly adopts 'token economics' as an analytical unit, signalling that major strategy consultancies have shifted from cloud-hour pricing to per-token unit economics as the primary framework for AI infrastructure ROI analysis.
v14 Technology Report 2025: $2 Trillion in New Revenue Needed to Fund AI's Scaling Trend Bain & Company 2025 Bain's headline finding that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand is the single most cited cost-recovery gap figure in 2025 analyst literature, and directly implies sustained lab subsidisation until that gap closes.
v15 How Can We Meet AI's Insatiable Demand for Compute Power? Bain & Company 2025 Quantifies that AI compute demand is growing at more than twice the rate of Moore's Law and projects a global $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud and AI data centres.
v16 AI's Trillion-Dollar Opportunity (Global Technology Report 2024) Bain & Company 2024 Bain's 2024 baseline report that documents unprecedented GenAI adoption speed despite cost roadblocks, establishing the trajectory against which the 2025 $2T gap estimate is benchmarked.
v17 Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026 Gartner 2026-01 Gartner's official forecast of $2.52 trillion in worldwide AI spending for 2026 — a 44% YoY increase — provides the most widely cited market-size anchor for contextualising token-price economics against total infrastructure outlays.
v18 Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business Gartner 2025-10 Gartner's top strategic predictions for 2026 and beyond cover AI agent proliferation, agentic spending intermediation, and enterprise cost displacement — providing the Technology Radar framing for how token-cost trajectory intersects with enterprise software budgets.
v19 Gartner Survey Finds 54% of Infrastructure & Operations Leaders Are Adopting AI to Cut Costs Gartner 2025-10 Survey evidence that more than half of I&O leaders view AI primarily as a cost-reduction tool, creating a circular dynamic where AI's cost is justified by AI's cost savings — a framing that shapes enterprise willingness to absorb rising token bills.
v20 The State of AI Infrastructure: Demand, Costs, and Custom Silicon ARK Investment Management 2025-12 Using SemiAnalysis InferenceMax benchmarks, ARK calculates that inference costs for capable models are falling at ~95% annually, outpacing the ~75% annual training cost decline, and identifies custom silicon (Trainium, TPU, MTIA) as the next structural cost lever hyperscalers are deploying to reduce Nvidia dependence.
v21 AI Will Determine the Future of Software and Cloud Spending ARK Investment Management 2025 ARK projects global data-centre systems investment growing at 30%+ annually to reach $653B in 2026, with AI infrastructure spend tripling to ~$1.5T by 2030, providing the demand-side framework for understanding why token prices cannot fall indefinitely even with hardware efficiency gains.
v22 Can AI Companies Become Profitable? Epoch AI 2025 Epoch AI's analysis of multiple frontier labs finds compute (R&D plus inference) comprises 54–62% of costs and that spending is currently 2–3× revenue at each lab, with OpenAI alone spending ~$4B serving free users in 2025 — the most rigorous published quantification of frontier-lab subsidisation.
v23 LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks Epoch AI 2025 Tracks state-of-the-art model prices across six benchmarks from 2022 to 2025, finding task-specific price-performance declines ranging from 9× to 900× per year, with the fastest declines coming after January 2024 amid DeepSeek and open-weight model competition.
v24 Inference Economics of Language Models Epoch AI 2025 Deep-dives into the unit economics of LLM inference — compute, memory bandwidth, batching efficiency, and hardware utilisation — establishing that electricity is only 10–15% of GPU TCO while capital costs dominate, which sets a structural cost floor on token prices.
v25 How Persistent Is the Inference Cost Burden? Epoch AI 2025 Examines whether inference cost burdens at frontier labs are structural or transient, finding that rising query complexity (reasoning chains, agentic loops) offsets hardware efficiency gains — directly addressing whether price-per-token declines will continue through 2028.
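Several headline figures in this table are the same quantity written in different notations: Epoch's 9× to 900× per-year declines (v23) are annual cheapening factors, while ARK's ~95% annual fall (v20) is a percentage. A small conversion helper, included only to make the two notations directly comparable:

```python
# Convert between 'N times cheaper per year' factors (v23) and
# 'percent price fall per year' figures (v20). Pure arithmetic.

def annual_percent_decline(factor_per_year: float) -> float:
    """Annual cheapening factor -> percentage price fall per year."""
    return (1.0 - 1.0 / factor_per_year) * 100.0

def factor_from_percent(percent_per_year: float) -> float:
    """Inverse: percentage fall per year -> annual cheapening factor."""
    return 1.0 / (1.0 - percent_per_year / 100.0)

print(round(annual_percent_decline(9.0), 1))    # -> 88.9  (Epoch's slow end)
print(round(annual_percent_decline(900.0), 1))  # -> 99.9  (Epoch's fast end)
print(round(factor_from_percent(95.0), 1))      # -> 20.0  (ARK's ~95%/yr)
```

Read this way, ARK's ~95% annual decline sits comfortably inside Epoch's 9×–900× band: it corresponds to a 20× annual factor.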

Blogs & Independent Thinkers

ID Title Outlet Date Significance
b1 The Unsustainable Economics of LLM APIs: Understanding the Coming Price Realignment ScaleDown (tinyml.substack.com, Substack) 2024 Bottom-up hardware cost analysis concluding that LLM API providers absorb over 90% of true token costs, framing the current market as a VC-funded 'land-grab phase' structurally analogous to Uber's early subsidised pricing.
b2 The Cost of Inference: Running the Models ScaleDown (tinyml.substack.com, Substack) 2024 Practitioner-level breakdown of GPU, energy, networking, cooling, and ops overhead that compose the true cost of serving a token, providing the most granular independent infrastructure accounting framework available publicly.
b3 Tokenomics 101: Navigating the Nuances of LLM Product Pricing ScaleDown (tinyml.substack.com, Substack) 2024 Explains why input/output token price ratios reflect compute and memory bandwidth constraints rather than usage patterns, and quantifies how published API rates relate to underlying unit economics.
b4 The Economics of Building ML Products in the LLM Era ScaleDown (tinyml.substack.com, Substack) 2024 Examines the total cost of ownership for product builders layering on top of frontier APIs, showing how token costs compound through retrieval, context, and agentic chains to produce effective per-query costs far above headline rates.
b5 The Price of Tokenmaxxing Aspiring for Intelligence (Substack) 2025 Analyses how Anthropic's API pricing at scale challenges the foundational startup-layer assumption that foundation model costs would remain negligible, arguing the 'cheap token' era is ending for heavy agentic workloads.
b6 The Price Is Wrong Aspiring for Intelligence (Substack) 2025 Investigates the structural gap between flat-subscription pricing and per-token API rates, arguing Anthropic was cross-subsidising heavy agentic users by more than 5x, a dynamic now forcing explicit pricing architecture decisions.
b7 Groq Inference Tokenomics: Speed, But At What Cost? SemiAnalysis (newsletter.semianalysis.com, Substack) 2024-02 First-principles cost modelling of Groq's LPU architecture against H100 economics, establishing the benchmark methodology for comparing true cost-per-token across inference hardware generations.
b8 Inference Race To The Bottom - Make It Up On Volume? SemiAnalysis (newsletter.semianalysis.com, Substack) 2024 Directly addresses whether commodity token prices can persist below true cost at scale, arguing aggressive price competition is structurally unsustainable without volume offsets that current demand does not yet guarantee.
b9 The Inference Cost Of Search Disruption – Large Language Model Cost Analysis SemiAnalysis (newsletter.semianalysis.com, Substack) 2023 Landmark early analysis estimating what deploying GPT-4-class inference at Google Search scale would cost, establishing a cost-floor analysis that anchored subsequent independent discussion of the scale of lab subsidisation.
b10 DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-01 Forensic reconstruction of DeepSeek's true training compute costs and the implications for Western lab margins, directly testing how much of the apparent cost advantage is real versus accounting artefact.
b11 AMD vs NVIDIA Inference Benchmark: Who Wins? - Performance & Cost Per Million Tokens SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-05 Six-month empirical benchmark comparing hardware cost-per-token across real workloads, revealing a 15x cost reduction from Hopper to Blackwell generation and nuanced workload-specific GPU advantage patterns.
b12 InferenceMAX™: Open Source Inference Benchmarking SemiAnalysis (newsletter.semianalysis.com, Substack) 2025-10 Introduces an independent TCO-per-million-token benchmark — the first to measure total cost of compute across diverse model sizes and real-world scenarios — establishing a replicable methodology for ongoing cost trajectory analysis.
b13 Mythos, Muse, and the Opportunity Cost of Compute Stratechery (Ben Thompson) 2026 Argues that AI has re-introduced meaningful marginal costs into tech after two decades of near-zero marginal cost software, with direct implications for why token prices have a structural floor and why current pricing is strategically rather than economically motivated.
b14 AI Promise and Chip Precariousness Stratechery (Ben Thompson) 2025 Examines how DeepSeek and open-weight models create persistent structural pricing pressure, arguing that sustainable margins require either a hardware cost advantage (Google's TPU edge) or aggregation, not just capability differentiation.
b15 Rapidus, The End of Economic Rationality, AI Disruption Stratechery (Ben Thompson) 2024 Argues that AI capex commitments have entered a regime where strategic imperatives suspend economic rationality, contextualising why labs sustain large operating losses to hold developer market position.
b16 Observations About LLM Inference Pricing LessWrong 2024 Empirical analysis showing 10x price dispersion for identical open-weight models across providers, inferring that software stack optimisation (batching, kernel efficiency, speculative decoding) drives more of the actual cost variance than hardware alone.
b17 Simon Willison on llm-pricing (tag archive) Simon Willison's Weblog 2023 Running empirical record of every major LLM pricing event from GPT-4's launch through 2026, with practitioner cost benchmarks (e.g. captioning 68,000 images for $1.68 with Gemini Flash) documenting the ~150x price drop with concrete real-world examples.
b18 Welcome to LLMflation — LLM inference cost is going down fast Andreessen Horowitz (a16z) 2024-11 Coins 'LLMflation' and quantifies a 10x annual cost decline for equivalent-performance inference over three years — from $60/M tokens in 2021 to $0.06/M by late 2024 — the most-cited single data point in independent discourse on the price-collapse rate.
b19 How persistent is the inference cost burden? Epoch AI (Substack) 2025 Analyses whether inference costs as a share of lab revenues are structural or transitional, estimating OpenAI's 2024 inference compute spend and modelling future cost burden under different algorithmic efficiency trajectories.
b20 How much does it cost to train frontier AI models? Epoch AI 2024 Quantifies that frontier model training costs are growing 2–3x per year and projects the largest runs crossing $1 billion by 2027, directly addressing how training capex amortises into per-token inference pricing and why apparent API prices understate true costs.
b21 LLM inference prices have fallen rapidly but unequally across tasks Epoch AI 2025 Demonstrates that inference price decline rates range from 9x to 900x per year depending on capability tier, with frontier reasoning models holding price stable while commodity models collapsed — the key bifurcation story of 2024–2025.
b22 The Jevons Paradox in AI Infrastructure: DeepSeek Efficiency Breakthroughs to Drive Energy Demand AI Proem (Substack) 2025 Applies Jevons Paradox to argue that DeepSeek-style efficiency gains will expand total AI compute demand and energy consumption rather than reduce them, establishing a rising infrastructure cost floor that will eventually pressure token prices upward.
b23 The Jevons Paradox in AI: Why Efficiency Creates More Demand The Substrate (Substack) 2025 Documents that per-token prices fell a thousandfold in three years yet total enterprise AI spending surged 320% in 2025, with enterprise inference spend reaching $37B — empirically confirming that Jevons effects dominate price reductions at the market level.
b24 AI agents are about to get more expensive Tiny Empires (Substack) 2025 Argues that multi-step agentic workflows break the 'cheap token' assumption because token consumption compounds across steps, making the true total cost of ownership for agentic AI materially higher than per-token sticker prices suggest.
b25 AI Pricing Architecture Is Now Strategy SaaS Intelligence (Substack) 2025 Frames token-based API pricing as a strategic weapon for developer lock-in and market share capture, drawing direct parallels to early AWS subsidised cloud pricing as a land-grab before margin normalisation.
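Two of the concrete numbers in this table can be checked with back-of-envelope arithmetic: a16z's LLMflation trajectory (b18) and Simon Willison's captioning benchmark (b17). A quick sketch; the derived figures are our own arithmetic, not claims made in either post.

```python
# b18: $60/M tokens (2021) -> $0.06/M tokens (late 2024) over ~3 years.
total_drop = 60.0 / 0.06               # 1000x cheaper overall
annual_factor = total_drop ** (1 / 3)  # compounded annual cheapening
print(round(annual_factor, 1))         # -> 10.0, matching the quoted 10x/year

# b17: captioning 68,000 images for $1.68 implies a per-image cost of
# roughly $0.0000247, i.e. about 40,000 images per dollar.
per_image = 1.68 / 68_000
print(f"${per_image:.7f} per image")   # -> $0.0000247 per image
print(round(68_000 / 1.68))            # -> 40476 images per dollar
```

The two checks are consistent with each other: a 1000× fall in three years is exactly a 10× annual decline, and the per-image figure is what makes "commodity" token pricing effectively invisible at the application layer.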
