Research · Frontier Lab & Model News


Research sweep · deep · 2023 – 2026

Token Cost of Ownership

AI token pricing vs true total cost of ownership from January 2023 to 19 April 2026, with emphasis on 2025–2026 signals: lab subsidisation strategies, infrastructure economics (compute, energy, data centres, hardware, security, ops), how user-facing prices have evolved, and analyst and researcher projections for token cost trajectories through 2028.

  • financial
  • frontier
  • academic
  • vc
  • blogs

Synthesised 2026-04-19

Narrative

The central story from frontier lab coverage is a dramatic and continuing collapse in published API token prices. GPT-4 launched in March 2023 at roughly $30–$36 per million input tokens; by mid-2024 GPT-4o was available at ~$5/M, and by early 2026 flagship models cluster around $1.75–$5/M input, an 85–95% reduction in under three years. Anthropic cut Claude Opus pricing by 67% in a single 2025 announcement, while Google's Trillium (TPU v6e) infrastructure delivered a 4× improvement in inference performance-per-dollar. Yet financial disclosures and investigative reporting (The Register, Data Center Dynamics, Ed Zitron's newsletter) reveal that these user-facing prices sit far below actual cost: OpenAI spent an estimated $8.4B on inference in 2025 alone against roughly $20B in revenue, and its annual loss reached $5B in 2024, up from $540M in 2022. The 25× gap between flat subscription pricing and actual API cost for heavy users, documented by Centific, epitomises the subsidisation dynamic. METR's pre-deployment evaluations, covering GPT-4o (2024), GPT-4.5 (February 2025), Claude 3.7 (April 2025), GPT-5 (May 2025), and a January 2026 time-horizon update, add a critical safety and capability dimension: autonomous task-completion horizons are doubling roughly every 7 months, meaning that as tokens become cheaper, the agentic compute consumed per task is simultaneously expanding, driving total inference spending upward even as per-token unit costs fall.
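The interaction of the two trends above can be made concrete with a small arithmetic sketch. All starting values below (a $5/M starting price, a 60% annual price decline, 1M tokens per task at month zero, and the assumption that tokens per task scale with METR's 7-month horizon doubling) are illustrative placeholders, not figures from the sources:

```python
# Toy model: falling per-token prices vs. agentic tokens-per-task growth.
# Every constant here is an illustrative assumption, not a sourced figure.

def price_per_mtok(months: float, start: float = 5.0, annual_decline: float = 0.60) -> float:
    """Assumed input price ($ per million tokens), falling 60%/year from $5/M."""
    return start * (1 - annual_decline) ** (months / 12)

def tokens_per_task(months: float, start_mtok: float = 1.0, doubling_months: float = 7) -> float:
    """Millions of tokens per autonomous task, doubling every 7 months
    (proxy: METR's task-horizon doubling, assuming tokens scale with horizon)."""
    return start_mtok * 2 ** (months / doubling_months)

for months in (0, 12, 24, 36):
    cost = price_per_mtok(months) * tokens_per_task(months)
    print(f"month {months:2d}: ${price_per_mtok(months):.2f}/M tok, "
          f"{tokens_per_task(months):.1f}M tok/task, ${cost:.2f}/task")
```

Under these assumptions the cost per task rises over the three-year window even though the unit price falls by more than 90%, which is the mechanism the narrative describes: cheaper tokens do not imply cheaper total inference once agentic consumption compounds faster than prices decline.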

On the infrastructure side, SemiAnalysis's InferenceMAX benchmarking (2025) and NVIDIA's Blackwell architecture results show hardware efficiency gains of up to 15× versus the prior H100 generation, the primary supply-side mechanism behind the price declines. SemiAnalysis's TPUv7 analysis underscores a structural cost-advantage dynamic: labs with proprietary silicon (Google with TPUs, and potentially Anthropic through long-term AWS partnerships) face different cost floors than those dependent on GPU spot markets. However, two arXiv preprints from March 2026 formalise why these efficiency gains will not produce sustained price declines. The first describes a Structural Jevons Paradox: as token costs drop, enterprise architectures are redesigned to consume exponentially more compute via multi-agent loops, extended context, and deeper reasoning chains. The second, a token-futures paper, models how lab subsidisation strategies will eventually give way to commodity market pricing. The consensus across these sources is that hyperscaler capex for 2026 (~$602B, roughly 75% AI-tied) and rising energy demand form a cost floor that will increasingly resist further price cuts, even as hardware efficiency continues to improve.
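The cost-floor argument can be sketched as back-of-envelope arithmetic: a serving provider's marginal cost per million tokens is roughly the all-in hourly cost of a node (capex amortisation, energy, ops) divided by its token throughput. The node cost and throughput figures below are illustrative assumptions, not measured values from the sources:

```python
# Toy cost-floor sketch. The $50/hr node cost and 20k tokens/s throughput
# are placeholder assumptions chosen only to show the shape of the arithmetic.

def cost_floor_per_mtok(node_cost_per_hour: float, tokens_per_second: float) -> float:
    """$ per million output tokens for a fully utilised serving node."""
    mtok_per_hour = tokens_per_second * 3600 / 1e6
    return node_cost_per_hour / mtok_per_hour

# Assumed: a multi-GPU node at $50/hr all-in, serving 20k tokens/s aggregate.
floor = cost_floor_per_mtok(node_cost_per_hour=50.0, tokens_per_second=20_000)
print(f"illustrative floor: ${floor:.3f} per million tokens")

# A 15x throughput gain (the InferenceMAX-scale improvement) lowers the floor
# proportionally -- but only while node cost (capex + energy) holds steady,
# which is exactly the variable the 2026 capex and energy trends push upward.
print(f"with 15x throughput: ${cost_floor_per_mtok(50.0, 300_000):.4f} per million tokens")
```

The point of the sketch is structural rather than numerical: efficiency gains divide the floor, while capex amortisation and energy multiply it, so published prices cannot fall indefinitely unless the numerator stops growing.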


Sources

ID · Title · Outlet · Date · Significance

t1 · METR's GPT-4.5 Pre-Deployment Evaluations · METR (Model Evaluation & Threat Research) · 2025-02 · Official METR pre-deployment autonomy evaluation of GPT-4.5, finding capabilities between GPT-4o and o1 and assessing risk level relative to existing frontier models.
t2 · Details about METR's Preliminary Evaluation of Claude 3.7 · METR (Model Evaluation & Threat Research) · 2025-04 · Pre-deployment autonomy assessment of Claude 3.7 Sonnet, noting impressive AI R&D capabilities on RE-Bench but no evidence of dangerous-level autonomous capabilities.
t3 · Details about METR's Evaluation of OpenAI GPT-5 · METR (Model Evaluation & Threat Research) · 2025-05 · METR's autonomy evaluation of OpenAI's flagship GPT-5 model, providing the most current public capability benchmarking for the frontier's leading model.
t4 · Details about METR's Preliminary Evaluation of GPT-4o · METR (Model Evaluation & Threat Research) · 2024-05 · Baseline METR autonomy evaluation for GPT-4o, establishing a reference point against which later models' capability escalations are measured.
t5 · Task-Completion Time Horizons of Frontier AI Models — Time Horizon 1.1 · METR (Model Evaluation & Threat Research) · 2026-01 · METR's updated time-horizon dataset showing frontier model autonomous task-completion window doubling roughly every 7 months since 2019, with an expanded task suite giving tighter estimates at longer horizons.
t6 · Measuring AI Ability to Complete Long Tasks · METR (Model Evaluation & Threat Research) · 2025-03 · Introduces METR's methodology for quantifying how long AI agents can sustain productive autonomous work, directly informing the inference-cost implications of extended agentic deployments.
t7 · Details about METR's Preliminary Evaluation of DeepSeek and Qwen Models · METR (Model Evaluation & Threat Research) · 2025-07 · Finds mid-2025 DeepSeek autonomous capability levels comparable to late-2024 frontier models, highlighting how cost-efficient open-weight models are closing the autonomy gap.
t8 · Introducing Claude 3.5 Sonnet · Anthropic · 2024-06 · Official launch announcement establishing Claude 3.5 Sonnet as Anthropic's price-performance flagship, priced at $3/$15 per million tokens, significantly undercutting Claude 3 Opus at $15/$75.
t9 · Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet · Anthropic · 2024-10 · Official Anthropic model card documenting safety evaluations, capability benchmarks, and technical specifications for the October 2024 Claude 3.5 refresh, a primary technical disclosure.
t10 · Google and Anthropic Drop AI Prices and Release New Models · PYMNTS · 2025-05 · Documents the coordinated 2025 pricing cuts by Google (Gemini) and Anthropic (Claude Opus 4.5, price cut by 67%), illustrating competitive subsidisation dynamics between frontier labs.
t11 · OpenAI Has Spent $12B on Inference with Microsoft: Report · The Register · 2025-11 · Reports OpenAI's cumulative inference spend of $12B on Azure, exposing the massive infrastructure subsidy underpinning user-facing token prices.
t12 · OpenAI Training and Inference Costs Could Reach $7bn for 2024, AI Startup Set to Lose $5bn · Data Center Dynamics · 2024-09 · Key financial disclosure showing OpenAI's 2024 compute cost structure, $7B in training and inference against $3.7B revenue, quantifying the scale of below-cost token pricing.
t13 · Exclusive: Here's How Much OpenAI Spends on Inference and Its Revenue Share With Microsoft · Where's Your Ed At (Ed Zitron) · 2025-05 · Detailed breakdown of OpenAI's leaked internal financials, showing inference costs at $8.4B in 2025, 66% from paying users, with projections rising to $14.1B in 2026.
t14 · OpenAI Faces Financial Growing Pains, Spending Double Its Revenue · DeepLearning.AI – The Batch · 2024-10 · Concise summary of OpenAI's loss trajectory ($540M in 2022 → $5B in 2024), contextualising why user-facing token prices remain far below true cost.
t15 · The Rising Costs of Training Frontier AI Models · arXiv (preprint) · 2024-05 · Academic analysis quantifying the exponential escalation in frontier model training costs, providing the cost-amortisation context for why labs price tokens below marginal cost.
t16 · AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design · arXiv (preprint) · 2026-03 · Proposes a formal framework for AI token pricing as a tradeable commodity, analysing the structural forces (lab subsidisation, demand elasticity, and market power) driving current API pricing.
t17 · Photons = Tokens: The Physics of AI and the Economics of Knowledge · arXiv (preprint) · 2026-03 · Formalises the Structural Jevons Paradox in AI: as unit token costs fall, firms redesign agent architectures to consume dramatically more compute via deeper reasoning loops and larger context windows.
t18 · InferenceMAX™: Open Source Inference Benchmarking · SemiAnalysis · 2025-06 · SemiAnalysis's open-source benchmark showing NVIDIA Blackwell delivering 15× lower cost per million tokens versus the prior generation, setting the hardware efficiency baseline for 2025–2026 pricing floors.
t19 · AI Datacenter Energy Dilemma — Race for AI Datacenter Space · SemiAnalysis · 2024-12 · Detailed infrastructure analysis from SemiAnalysis on power constraints, data-centre construction timelines, and energy costs as the principal rising-cost vector offsetting hardware efficiency gains.
t20 · Google TPUv7: The 900lb Gorilla In the Room · SemiAnalysis · 2025-08 · Deep technical analysis of Google's latest proprietary TPU, showing how vertical compute integration gives Google a structural cost advantage in Gemini inference pricing versus GPU-dependent rivals.
t21 · Introducing Cloud TPU v5p and AI Hypercomputer · Google Cloud (official) · 2023-12 · Google's official announcement of TPU v5p infrastructure powering Gemini training, establishing the proprietary compute stack that underpins Google's inference cost economics.
t22 · Trillium TPU Is GA · Google Cloud (official) · 2024-11 · Announces general availability of Trillium (TPU v6e), offering 4× better performance-per-dollar for inference versus v5e and used to train Gemini 2.0, quantifying Google's hardware efficiency edge.
t23 · NVIDIA Blackwell Raises Bar in New InferenceMAX Benchmarks, Delivering Unmatched Performance and Lowest Cost Per Token · NVIDIA (official) · 2025-07 · Official NVIDIA benchmark results showing Blackwell architecture's cost-per-token leadership, directly informing the hardware cost floor for frontier labs running GPU-based inference.
t24 · Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters · NVIDIA (official) · 2025-03 · NVIDIA's TCO framework for 'AI factories,' arguing that total cost of ownership, not GPU price, governs real inference economics, encompassing compute, networking, cooling, and utilisation.
t25 · The 25× Subscription Trap: Why Frontier Labs Can No Longer Subsidize Your AI · Centific · 2025-09 · Documents the 25× gap between flat subscription fees and actual API cost for heavy users, providing concrete evidence of the scale of cross-subsidisation in frontier lab pricing models.
