Research · VC & Analyst Reports
Research sweep · deep · 2023–2026
Token Cost of Ownership
AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.
- financial
- frontier
- academic
- vc
- blogs
Synthesised 2026-04-19
Narrative
The central story from VC and analyst coverage between 2023 and April 2026 is a deepening tension between rapidly falling per-token prices and a structurally widening total cost-of-ownership gap. Sequoia Capital's David Cahn crystallised this in 'AI's $600B Question' (June 2024): the revenue shortfall implied by AI infrastructure investment had grown from ~$125B (September 2023) to ~$600B in a year, with GPUs representing only half of data-centre TCO once energy, construction, cooling, and ops overhead are counted. McKinsey's 'Cost of Compute' report (2025) extended this globally, projecting $3.7–$7.9 trillion in data-centre capex through 2030 (base case $5.2T) and introducing the 'AI factory' metaphor — 'without tokens, there is no revenue' — that has since become standard consulting vocabulary. Bain's 2025 Technology Report raised the ante further, calculating that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand, alongside an $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud.

On the pricing side, Epoch AI's empirical data-insight series documents per-task inference prices falling 9× to 900× per year (2024–2025), while ARK Invest — using SemiAnalysis benchmark data — pegs inference cost declines at ~95% annually, outpacing even the ~75%/year decline in training costs.

The a16z/OpenRouter 'State of AI' joint study (January 2026), drawing on 100 trillion real tokens, confirmed that agentic inference is the fastest-growing workload, with average coding prompts exceeding 20,000 tokens — illustrating Jevons paradox in action: cheaper tokens drive total compute demand upward faster than unit costs decline.
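The Jevons dynamic above can be sketched with simple compounding arithmetic. The ~95%/year unit-cost decline is ARK's figure (v20); the 30× annual demand multiplier, the $10/Mtok starting price, and the 1T-token starting volume are hypothetical placeholders chosen only to illustrate that total spend rises whenever demand grows faster than unit cost falls (here, any multiplier above 20×).

```python
# Illustrative sketch, not from the sources: compound a ~95%/year unit-cost
# decline (ARK, v20) against a hypothetical 30x/year growth in token demand.
# Total spend per year changes by 0.05 * 30 = 1.5x: it RISES despite the
# collapse in price per token -- the Jevons paradox in miniature.

def project(unit_cost, tokens, cost_decline=0.95, demand_growth=30.0, years=3):
    """Yield (year, unit_cost, tokens, total_spend) for each year.

    cost_decline: fraction of unit cost shed each year (0.95 = 95% decline).
    demand_growth: multiplier on annual token demand (hypothetical).
    """
    for year in range(years + 1):
        yield year, unit_cost, tokens, unit_cost * tokens
        unit_cost *= (1 - cost_decline)   # tokens get 95% cheaper each year
        tokens *= demand_growth           # but demand grows faster still

# Start: $10 per million tokens, 1 trillion tokens/year (both illustrative).
for year, cost, tok, spend in project(unit_cost=10e-6, tokens=1e12):
    print(f"year {year}: ${cost * 1e6:.4f}/Mtok, {tok:.1e} tokens, ${spend / 1e6:.1f}M total")
```

Under these assumptions, spend grows from $10M to $33.75M over three years even as the per-token price falls by four orders of magnitude, which is the shape of the a16z/OpenRouter finding.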
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| v1 | State of AI: An Empirical 100 Trillion Token Study with OpenRouter | Andreessen Horowitz (a16z) | 2026-01 | Empirical study of 100T tokens routed via OpenRouter reveals that agentic inference is the fastest-growing use pattern and that developers overwhelmingly optimise for quality over price, with Claude holding ~60% of coding workloads at 20K+ token average prompts — directly illustrating Jevons paradox at the token level. |
| v2 | AI Is Driving A Shift Towards Outcome-Based Pricing (December 2024 Enterprise Newsletter) | Andreessen Horowitz (a16z) | 2024-12 | Argues that per-token pricing is giving way to outcome-based pricing as AI costs scale, but finds CIOs remain uncomfortable with outcome metrics — a key signal that token-cost opacity is migrating into enterprise contract design. |
| v3 | How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz (a16z) | 2025 | Survey of 100 enterprise CIOs finds 80% missed AI infrastructure cost forecasts by more than 25% and 84% report margin erosion tied to AI workloads, establishing that user-facing token prices grossly understate true enterprise TCO. |
| v4 | CFO Roundtable: AI Growth, Pricing, and Forecasting (June 2025 Fintech Newsletter) | Andreessen Horowitz (a16z) | 2025-06 | CFO-level discussion on AI unit economics reveals that every token processed is a direct variable cost, and that the newest reasoning models still command relatively high costs despite commodity-model price compression. |
| v5 | AI's $600B Question | Sequoia Capital | 2024-06 | David Cahn's landmark framework quantifies the annual revenue gap between AI infrastructure investment and actual AI-ecosystem revenue at ~$600B, calculates GPU costs as exactly half of AI data-centre TCO, and explicitly flags rapid GPU depreciation as a structural risk to lab economics. |
| v6 | AI is Now Shovel Ready | Sequoia Capital | 2024-12 | Designates 2025 as the 'Year of the Data Center,' detailing that average AI data-centre construction takes ~2 years, that Amazon committed $50B+ to new builds in H1 2024, and that capital allocation risk from long lead times is a primary structural constraint on AI supply economics. |
| v7 | AI in 2025: Building Blocks Firmly in Place | Sequoia Capital | 2025-01 | Annual outlook positions 2025 as an execution year where infrastructure build-out transitions from deal-signing to physical deployment, with cloud service providers competing on GPU cluster scale and pricing as the primary near-term battleground. |
| v8 | AI in 2026: A Tale of Two AIs | Sequoia Capital | 2026-01 | Identifies a bifurcation between commoditised inference and frontier reasoning models, framing the divergence as a structural price-floor dynamic where frontier capability commands premium pricing while commodity models race toward near-zero marginal cost. |
| v9 | The Cost of Compute: A $7 Trillion Race to Scale Data Centers | McKinsey Global Institute | 2025 | Projects $3.7–$7.9 trillion in global data-centre capex through 2030 across three demand scenarios, with the base case at $5.2 trillion, and allocates ~60% of spend to computing hardware, ~25% to power and cooling, and ~15% to construction — the most comprehensive public cost-stack decomposition available. |
| v10 | Who's Funding the AI Data Center Boom? | McKinsey Global Institute | 2025 | Examines the financing structure behind AI data-centre buildout, clarifying that hyperscaler balance sheets, sovereign wealth funds, and private credit are the three capital pools underwriting infrastructure that token prices must eventually recoup. |
| v11 | Issue Brief: AI Infrastructure | McKinsey Global Institute | 2025 | Frames AI infrastructure as an 'AI factory' model — data and electricity as inputs, tokens and insights as outputs — directly linking compute capex to revenue generation and articulating the economic logic that will ultimately drive token price normalisation. |
| v12 | Beyond Compute: Infrastructure That Powers and Cools AI Data Centers | McKinsey Global Institute | 2025 | Analyses the non-compute TCO components (power, cooling, backup generation, physical plant) that are often invisible in quoted token prices, projecting 200 incremental GW of AI-related capacity required in the accelerated scenario and flagging energy as a rising, not falling, cost component. |
| v13 | Token Economics, Physical AI, and Beyond: McKinsey Previews NVIDIA GTC | McKinsey Global Institute | 2025-03 | McKinsey's Chris Smith explicitly adopts 'token economics' as an analytical unit, signalling that major strategy consultancies have shifted from cloud-hour pricing to per-token unit economics as the primary framework for AI infrastructure ROI analysis. |
| v14 | Technology Report 2025: $2 Trillion in New Revenue Needed to Fund AI's Scaling Trend | Bain & Company | 2025 | Bain's headline finding that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand is the single most cited cost-recovery gap figure in 2025 analyst literature, and directly implies sustained lab subsidisation until that gap closes. |
| v15 | How Can We Meet AI's Insatiable Demand for Compute Power? | Bain & Company | 2025 | Quantifies that AI compute demand is growing at more than twice the rate of Moore's Law and projects a global $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud and AI data centres. |
| v16 | AI's Trillion-Dollar Opportunity (Global Technology Report 2024) | Bain & Company | 2024 | Bain's 2024 baseline report that documents unprecedented GenAI adoption speed despite cost roadblocks, establishing the trajectory against which the 2025 $2T gap estimate is benchmarked. |
| v17 | Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026 | Gartner | 2026-01 | Gartner's official forecast of $2.52 trillion in worldwide AI spending for 2026 — a 44% YoY increase — provides the most widely cited market-size anchor for contextualising token-price economics against total infrastructure outlays. |
| v18 | Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business | Gartner | 2025-10 | Gartner's top strategic predictions for 2026 and beyond cover AI agent proliferation, agentic spending intermediation, and enterprise cost displacement — providing the Technology Radar framing for how token-cost trajectory intersects with enterprise software budgets. |
| v19 | Gartner Survey Finds 54% of Infrastructure & Operations Leaders Are Adopting AI to Cut Costs | Gartner | 2025-10 | Survey evidence that more than half of I&O leaders view AI primarily as a cost-reduction tool, creating a circular dynamic where AI's cost is justified by AI's cost savings — a framing that shapes enterprise willingness to absorb rising token bills. |
| v20 | The State of AI Infrastructure: Demand, Costs, and Custom Silicon | ARK Investment Management | 2025-12 | Using SemiAnalysis InferenceMax benchmarks, ARK calculates that inference costs for capable models are falling at ~95% annually, outpacing the ~75% annual training cost decline, and identifies custom silicon (Trainium, TPU, MTIA) as the next structural cost lever hyperscalers are deploying to reduce Nvidia dependence. |
| v21 | AI Will Determine the Future of Software and Cloud Spending | ARK Investment Management | 2025 | ARK projects global data-centre systems investment growing at 30%+ annually to reach $653B in 2026, with AI infrastructure spend tripling to ~$1.5T by 2030, providing the demand-side framework for understanding why token prices cannot fall indefinitely even with hardware efficiency gains. |
| v22 | Can AI Companies Become Profitable? | Epoch AI | 2025 | Epoch AI's analysis of multiple frontier labs finds compute (R&D plus inference) comprises 54–62% of costs and that spending is currently 2–3× revenue at each lab, with OpenAI alone spending ~$4B serving free users in 2025 — the most rigorous published quantification of frontier-lab subsidisation. |
| v23 | LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks | Epoch AI | 2025 | Tracks state-of-the-art model prices across six benchmarks from 2022–2025, finding task-specific price-performance declines ranging from 9× to 900× per year, with the fastest declines post-January 2024 following DeepSeek and open-weight model competition. |
| v24 | Inference Economics of Language Models | Epoch AI | 2025 | Deep-dives into the unit economics of LLM inference — compute, memory bandwidth, batching efficiency, and hardware utilisation — establishing that electricity is only 10–15% of GPU TCO while capital costs dominate, which sets a structural cost floor on token prices. |
| v25 | How Persistent Is the Inference Cost Burden? | Epoch AI | 2025 | Examines whether inference cost burdens at frontier labs are structural or transient, finding that rising query complexity (reasoning chains, agentic loops) offsets hardware efficiency gains — directly addressing whether price-per-token declines will continue through 2028. |
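Epoch AI's structural cost-floor argument (v24: electricity is only 10–15% of GPU TCO, capital costs dominate) can be checked with a back-of-envelope model. Every hardware figure below (a $30k accelerator amortised over 4 years, 1 kW draw, $0.10/kWh, 2,000 tokens/s at 50% utilisation) is a hypothetical placeholder, not a number from the sources; the point is only that with plausible inputs the electricity share lands in Epoch's 10–15% band while amortised capital sets the floor.

```python
# Back-of-envelope sketch of Epoch AI's (v24) claim that GPU capital cost,
# not electricity, sets the structural floor on token prices.
# All hardware figures passed in below are hypothetical placeholders.

def token_cost_floor(gpu_capex, amort_years, power_kw, price_per_kwh,
                     tokens_per_sec, utilisation=0.5):
    """Return ($ per 1M tokens, electricity share of TCO) for one GPU."""
    secs_per_year = 365 * 24 * 3600
    capital_per_year = gpu_capex / amort_years                      # amortised capex
    energy_per_year = power_kw * (secs_per_year / 3600) * price_per_kwh
    tco_per_year = capital_per_year + energy_per_year
    tokens_per_year = tokens_per_sec * utilisation * secs_per_year
    return tco_per_year / tokens_per_year * 1e6, energy_per_year / tco_per_year

# Hypothetical inputs: $30k GPU over 4 years, 1 kW, $0.10/kWh, 2,000 tok/s.
cost, energy_share = token_cost_floor(30_000, 4, 1.0, 0.10, 2_000)
print(f"floor ≈ ${cost:.2f}/Mtok, electricity ≈ {energy_share:.0%} of TCO")
```

With these inputs the electricity share comes out near 10%, consistent with v24's range, and the floor is dominated by the amortised capital term, which is why hardware price and utilisation, not power prices, govern how far token prices can fall.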