
Research sweep · deep · 2023 – 2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

Narrative

The central story in VC and analyst coverage between 2023 and April 2026 is a deepening tension between rapidly falling per-token prices and a structurally widening total cost-of-ownership gap. Sequoia Capital's David Cahn crystallised this in 'AI's $600B Question' (June 2024): the revenue shortfall implied by AI infrastructure investment had grown from ~$125B (September 2023) to ~$600B in a year, with GPUs representing only half of data-centre TCO once energy, construction, cooling, and ops overhead are counted. McKinsey's 'Cost of Compute' report (2025) extended this globally, projecting $3.7–$7.9 trillion in data-centre capex through 2030 (base case $5.2T) and introducing the 'AI factory' metaphor — 'without tokens, there is no revenue' — that has since become standard consulting vocabulary.

Bain's 2025 Technology Report raised the ante further, calculating that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand, alongside an $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud.

On the pricing side, Epoch AI's empirical data-insight series documents per-task inference prices falling by 9× to 900× per year (2024–2025), while ARK Invest — using SemiAnalysis benchmark data — pegs inference cost declines at ~95% annually, outpacing even the ~75%/year decline in training costs. The a16z/OpenRouter 'State of AI' joint study (January 2026), drawing on 100 trillion real tokens, confirmed that agentic inference is the fastest-growing workload, with average coding prompts exceeding 20,000 tokens. This illustrates Jevons paradox in action: cheaper tokens drive total compute demand upward faster than the unit-cost decline.
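The Jevons-paradox dynamic above can be sketched with simple compounding arithmetic. The 95%/year price decline is ARK's figure from the narrative; the 30×/year demand multiplier is a purely hypothetical assumption chosen for illustration, not a figure from any cited source.

```python
# Illustrative sketch only: per-token price falls ~95%/year (ARK's
# inference-cost decline rate), while token demand is ASSUMED to grow
# 30x/year (hypothetical). Net spend still rises despite cheaper tokens.

PRICE_DECLINE = 0.95   # price falls 95% per year -> 5% of prior price
DEMAND_GROWTH = 30.0   # assumed annual multiplier on tokens consumed

def total_spend(base_price: float, base_tokens: float, years: int) -> float:
    """Total spend after `years` of compounding price decline and demand growth."""
    price = base_price * (1 - PRICE_DECLINE) ** years
    tokens = base_tokens * DEMAND_GROWTH ** years
    return price * tokens

# Normalised to 1.0 of spend today:
spend_now = total_spend(1.0, 1.0, 0)    # 1.0
spend_in_two = total_spend(1.0, 1.0, 2) # (0.05**2) * (30**2) = 2.25
```

Under these assumptions, per-token price drops 400× over two years, yet total spend more than doubles — the shape of the a16z/OpenRouter finding, not its actual magnitudes.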


Sources

ID · Title · Outlet · Date · Significance

v1 · State of AI: An Empirical 100 Trillion Token Study with OpenRouter · Andreessen Horowitz (a16z) · 2026-01 · Empirical study of 100T tokens routed via OpenRouter reveals that agentic inference is the fastest-growing use pattern and that developers overwhelmingly optimise for quality over price, with Claude holding ~60% of coding workloads at 20K+ token average prompts — directly illustrating Jevons paradox at the token level.

v2 · AI Is Driving A Shift Towards Outcome-Based Pricing (December 2024 Enterprise Newsletter) · Andreessen Horowitz (a16z) · 2024-12 · Argues that per-token pricing is giving way to outcome-based pricing as AI costs scale, but finds CIOs remain uncomfortable with outcome metrics — a key signal that token-cost opacity is migrating into enterprise contract design.

v3 · How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 · Andreessen Horowitz (a16z) · 2025 · Survey of 100 enterprise CIOs finds 80% missed AI infrastructure cost forecasts by more than 25% and 84% report margin erosion tied to AI workloads, establishing that user-facing token prices grossly understate true enterprise TCO.

v4 · CFO Roundtable: AI Growth, Pricing, and Forecasting (June 2025 Fintech Newsletter) · Andreessen Horowitz (a16z) · 2025-06 · CFO-level discussion on AI unit economics reveals that every token processed is a direct variable cost, and that the newest reasoning models still command relatively high costs despite commodity-model price compression.

v5 · AI's $600B Question · Sequoia Capital · 2024-06 · David Cahn's landmark framework quantifies the annual revenue gap between AI infrastructure investment and actual AI-ecosystem revenue at ~$600B, calculates GPU costs as half of AI data-centre TCO, and explicitly flags rapid GPU depreciation as a structural risk to lab economics.

v6 · AI is Now Shovel Ready · Sequoia Capital · 2024-12 · Designates 2025 as the 'Year of the Data Center,' detailing that average AI data-centre construction takes ~2 years, that Amazon committed $50B+ to new builds in H1 2024, and that capital allocation risk from long lead times is a primary structural constraint on AI supply economics.

v7 · AI in 2025: Building Blocks Firmly in Place · Sequoia Capital · 2025-01 · Annual outlook positions 2025 as an execution year where infrastructure build-out transitions from deal-signing to physical deployment, with cloud service providers competing on GPU cluster scale and pricing as the primary near-term battleground.

v8 · AI in 2026: A Tale of Two AIs · Sequoia Capital · 2026-01 · Identifies a bifurcation between commoditised inference and frontier reasoning models, framing the divergence as a structural price-floor dynamic where frontier capability commands premium pricing while commodity models race toward near-zero marginal cost.

v9 · The Cost of Compute: A $7 Trillion Race to Scale Data Centers · McKinsey Global Institute · 2025 · Projects $3.7–$7.9 trillion in global data-centre capex through 2030 across three demand scenarios, with the base case at $5.2 trillion, and allocates ~60% of spend to computing hardware, ~25% to power and cooling, and ~15% to construction — the most comprehensive public cost-stack decomposition available.

v10 · Who's Funding the AI Data Center Boom? · McKinsey Global Institute · 2025 · Examines the financing structure behind AI data-centre buildout, clarifying that hyperscaler balance sheets, sovereign wealth funds, and private credit are the three capital pools underwriting infrastructure that token prices must eventually recoup.

v11 · Issue Brief: AI Infrastructure · McKinsey Global Institute · 2025 · Frames AI infrastructure as an 'AI factory' model — data and electricity as inputs, tokens and insights as outputs — directly linking compute capex to revenue generation and articulating the economic logic that will ultimately drive token price normalisation.

v12 · Beyond Compute: Infrastructure That Powers and Cools AI Data Centers · McKinsey Global Institute · 2025 · Analyses the non-compute TCO components (power, cooling, backup generation, physical plant) that are often invisible in quoted token prices, projecting 200 incremental GW of AI-related capacity required in the accelerated scenario and flagging energy as a rising, not falling, cost component.

v13 · Token Economics, Physical AI, and Beyond: McKinsey Previews NVIDIA GTC · McKinsey Global Institute · 2025-03 · McKinsey's Chris Smith explicitly adopts 'token economics' as an analytical unit, signalling that major strategy consultancies have shifted from cloud-hour pricing to per-token unit economics as the primary framework for AI infrastructure ROI analysis.

v14 · Technology Report 2025: $2 Trillion in New Revenue Needed to Fund AI's Scaling Trend · Bain & Company · 2025 · Bain's headline finding that $2 trillion in new annual revenue must be generated by 2030 to profitably absorb AI compute demand is the single most cited cost-recovery gap figure in 2025 analyst literature, and directly implies sustained lab subsidisation until that gap closes.

v15 · How Can We Meet AI's Insatiable Demand for Compute Power? · Bain & Company · 2025 · Quantifies that AI compute demand is growing at more than twice the rate of Moore's Law and projects a global $800B infrastructure shortfall even if all enterprise on-premise IT budgets were redirected to cloud and AI data centres.

v16 · AI's Trillion-Dollar Opportunity (Global Technology Report 2024) · Bain & Company · 2024 · Bain's 2024 baseline report documents unprecedented GenAI adoption speed despite cost roadblocks, establishing the trajectory against which the 2025 $2T gap estimate is benchmarked.

v17 · Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026 · Gartner · 2026-01 · Gartner's official forecast of $2.52 trillion in worldwide AI spending for 2026 — a 44% YoY increase — provides the most widely cited market-size anchor for contextualising token-price economics against total infrastructure outlays.

v18 · Strategic Predictions for 2026: How AI's Underestimated Influence Is Reshaping Business · Gartner · 2025-10 · Gartner's top strategic predictions for 2026 and beyond cover AI agent proliferation, agentic spending intermediation, and enterprise cost displacement — providing the Technology Radar framing for how token-cost trajectory intersects with enterprise software budgets.

v19 · Gartner Survey Finds 54% of Infrastructure & Operations Leaders Are Adopting AI to Cut Costs · Gartner · 2025-10 · Survey evidence that more than half of I&O leaders view AI primarily as a cost-reduction tool, creating a circular dynamic where AI's cost is justified by AI's cost savings — a framing that shapes enterprise willingness to absorb rising token bills.

v20 · The State of AI Infrastructure: Demand, Costs, and Custom Silicon · ARK Investment Management · 2025-12 · Using SemiAnalysis InferenceMax benchmarks, ARK calculates that inference costs for capable models are falling at ~95% annually, outpacing the ~75% annual training cost decline, and identifies custom silicon (Trainium, TPU, MTIA) as the next structural cost lever hyperscalers are deploying to reduce Nvidia dependence.

v21 · AI Will Determine the Future of Software and Cloud Spending · ARK Investment Management · 2025 · ARK projects global data-centre systems investment growing at 30%+ annually to reach $653B in 2026, with AI infrastructure spend tripling to ~$1.5T by 2030, providing the demand-side framework for understanding why token prices cannot fall indefinitely even with hardware efficiency gains.

v22 · Can AI Companies Become Profitable? · Epoch AI · 2025 · Epoch AI's analysis of multiple frontier labs finds compute (R&D plus inference) comprises 54–62% of costs and that spending is currently 2–3× revenue at each lab, with OpenAI alone spending ~$4B serving free users in 2025 — the most rigorous published quantification of frontier-lab subsidisation.

v23 · LLM Inference Prices Have Fallen Rapidly but Unequally Across Tasks · Epoch AI · 2025 · Tracks state-of-the-art model prices across six benchmarks from 2022–2025, finding task-specific price-performance declines ranging from 9× to 900× per year, with the fastest declines post-January 2024 following DeepSeek and open-weight model competition.

v24 · Inference Economics of Language Models · Epoch AI · 2025 · Deep-dives into the unit economics of LLM inference — compute, memory bandwidth, batching efficiency, and hardware utilisation — establishing that electricity is only 10–15% of GPU TCO while capital costs dominate, which sets a structural cost floor on token prices.

v25 · How Persistent Is the Inference Cost Burden? · Epoch AI · 2025 · Examines whether inference cost burdens at frontier labs are structural or transient, finding that rising query complexity (reasoning chains, agentic loops) offsets hardware efficiency gains — directly addressing whether price-per-token declines will continue through 2028.
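McKinsey's cost-stack decomposition (v9) can be turned into concrete dollar figures with straightforward arithmetic. The ~60/25/15 shares and the $5.2T base case are from the source table above; the per-component dollar splits below are simple multiplication on those shares, not figures McKinsey itself publishes.

```python
# Sketch: split McKinsey's $5.2T base-case data-centre capex projection
# (through 2030) across its approximate cost-stack shares. The shares are
# from the report summary; the dollar splits are derived arithmetic.

BASE_CASE_CAPEX_T = 5.2  # $ trillions, 2025-2030 base case (v9)

COST_STACK = {
    "computing hardware": 0.60,  # ~60% of spend
    "power and cooling":  0.25,  # ~25%
    "construction":       0.15,  # ~15%
}

def decompose(total_t: float, stack: dict = COST_STACK) -> dict:
    """Split a total capex figure ($T) across the cost-stack shares."""
    return {component: total_t * share for component, share in stack.items()}

splits = decompose(BASE_CASE_CAPEX_T)
# hardware ~$3.12T, power/cooling ~$1.30T, construction ~$0.78T
```

Applying the same shares to the scenario bounds ($3.7T and $7.9T) gives the plausible range per component, which is useful when comparing against the Bain and Gartner figures elsewhere in the table.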
