Explainer Collection

Token Cost Of Ownership


Xing (2026)

AI inference tokens are becoming a commodity, and someone has designed the futures contract

This paper argues that the tokens consumed by large language models share the economic properties of electricity and carbon credits, then proposes a complete standardized futures contract to let enterprises hedge their compute costs.

62–78%
reduction in enterprise compute cost volatility from token futures hedging across all simulated scenarios
40×
decline in GPT-4-level inference prices from early 2023 to early 2025 (from $60 to under $1.50 per million output tokens)
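The hedge works the same way as an electricity futures hedge: a long futures position pays out when spot prices rise, fixing the net cost of tokens bought at spot. A toy sketch, with illustrative spot prices, futures price, and volumes that are assumptions rather than figures from the paper:

```python
# Toy sketch (assumed numbers, not from the paper) of how a long
# token-futures position fixes an enterprise's net compute cost.

spot_prices = [1.50, 2.10, 0.90, 1.20]   # $/M output tokens in four future months
futures_price = 1.40                      # $/M tokens locked in today
monthly_tokens_m = 500                    # millions of tokens consumed per month

# Unhedged: pay the spot price each month.
unhedged = [p * monthly_tokens_m for p in spot_prices]

# Hedged: the futures leg pays (spot - futures_price) per token at
# settlement, which cancels the spot movement, so the net cost is
# futures_price * volume every month.
hedged = [p * monthly_tokens_m - (p - futures_price) * monthly_tokens_m
          for p in spot_prices]

print(unhedged)  # varies with spot: [750.0, 1050.0, 450.0, 600.0]
print(hedged)    # constant at 700.0 each month
```

The volatility reduction the paper simulates comes from exactly this cancellation; it is partial rather than total in practice because hedge ratios, basis risk, and contract granularity keep the offset imperfect.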

Chen (2025)

Nightly GPU benchmarks reveal no single vendor wins everywhere, but cost per token tells the real story

SemiAnalysis's InferenceMAX is an open-source, nightly benchmark that tracks throughput, latency, TCO per million tokens, and tokens per megawatt across NVIDIA and AMD GPUs, exposing how fast inference software improves and where each chip actually leads.

~3×
generational power efficiency gain: CDNA3 to CDNA4 and H100 to B200 on GPT-OSS 120B reasoning workloads
better TCO per million tokens on GB200 NVL72 vs single-node servers for DeepSeek 670B FP8 at 35 tok/s/user
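The two cost metrics InferenceMAX tracks, TCO per million tokens and tokens per megawatt, reduce to simple ratios of amortized hardware cost, power cost, and sustained throughput. A back-of-the-envelope sketch with assumed numbers (not InferenceMAX data):

```python
# Back-of-the-envelope sketch of the benchmark's two cost metrics.
# All inputs are illustrative assumptions, not measured values.

gpu_capex_usd = 30_000          # assumed purchase price per GPU
amortization_years = 4
power_draw_kw = 1.0             # assumed average draw per GPU
electricity_usd_per_kwh = 0.10
throughput_tok_per_s = 10_000   # assumed sustained tokens/s per GPU

hours = amortization_years * 365 * 24
capex_per_hour = gpu_capex_usd / hours
power_per_hour = power_draw_kw * electricity_usd_per_kwh
tokens_per_hour = throughput_tok_per_s * 3600

# TCO per million tokens: total hourly cost spread over hourly output.
tco_per_m_tokens = (capex_per_hour + power_per_hour) / tokens_per_hour * 1e6

# Tokens per megawatt: throughput normalized by power, the efficiency
# metric behind the generational CDNA3->CDNA4 and H100->B200 comparisons.
tokens_per_s_per_mw = throughput_tok_per_s / (power_draw_kw / 1000)

print(round(tco_per_m_tokens, 4))   # ~0.0266 $/M tokens under these assumptions
print(tokens_per_s_per_mw)          # 10,000,000 tokens/s per MW
```

A generational efficiency gain shows up here directly: halving power draw at equal throughput doubles tokens per megawatt and cuts the power term of TCO in half.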
