Research Explainer · Xing (2026)
AI inference tokens are becoming a commodity, and someone has designed the futures contract
This paper argues that the tokens consumed by large language models share the economic properties of electricity and carbon credits, then proposes a complete standardised futures contract to let enterprises hedge their compute costs.
Published March 2026
62%–78% reduction in enterprise compute cost volatility from token futures hedging across all simulated scenarios
40× decline in GPT-4-level inference prices from early 2023 to early 2025 (from $60 to under $1.50 per million output tokens)
>$10B annual global AI inference API market as of 2024, growing at over 100% per year, meeting the scale threshold for a commodity market
87% variance reduction in 12-month procurement costs under baseline-scenario optimal hedging (hedge ratio h* = 0.85)
Simulated token price paths over 36 months (mean and confidence intervals)
Reproduced from Xing (2026), Figure 6. 10,000-path Monte Carlo simulation of Standard Inference Token (SIT) prices under a mean-reverting jump-diffusion model. The U-shaped mean path captures the transition from supply-driven decline to demand-driven volatility.
Tokens as a commodity
The central provocation of this paper is simple: the AI inference token is not a service but a raw material. Xing argues that tokens satisfy the classical economic criteria for a tradeable commodity. They are fungible (you don't care which GPU made the token, only that the output meets a quality bar). They are standardised (the industry already quotes in "million tokens"). And the market is large enough, with global AI inference API revenue exceeding $10 billion in 2024 and growing at more than 100% annually.
The closest analogue is electricity. Both are non-storable, both have rigid short-term supply, and both exhibit time-varying demand. Electricity futures have traded successfully since Nord Pool launched in 1993, proving that non-storability is not a barrier to derivatives markets. Carbon emission allowances offer another parallel: an entirely artificial commodity that went from concept to mature futures market within a decade.
Tokens also have a distinctive dual nature. Right now, most people think of them as a finished product, the output of a chatbot. But as AI embeds into manufacturing, logistics, autonomous driving, and healthcare, tokens become an intermediate input, a raw material consumed inside larger production processes. Xing draws an explicit parallel to electricity's historical arc: novelty product in the 1890s, infrastructure commodity by the 1950s.
GPT-4-level inference token price trajectory and projected scenarios (2023–2030)
Reproduced from Xing (2026), Figure 3. Historical prices on a log scale show a 40× decline from 2023 to 2025. Two diverging projections illustrate continued decline vs. a demand-explosion reversal driven by VLA and embodied AI adoption.
Token demand structure by tier (current share of total demand)
Based on Xing (2026), Section 3.2. The four tiers differ sharply in price elasticity. As Tiers 3 and 4 grow, overall market demand becomes more inelastic, amplifying future price volatility.
Why current cheap prices won't last
GPT-4-level inference costs dropped more than 40-fold in two years, from roughly $60 per million output tokens in early 2023 to under $1.50 in early 2025. Three factors drove the collapse simultaneously: architecture improvements such as Mixture-of-Experts, hardware upgrades from A100 to H100 to B200, and aggressive price wars in which providers priced below marginal cost to grab market share.
Xing proposes a formal three-factor supply model: token supply equals the product of hardware efficiency, algorithm efficiency, and capital investment, divided by energy cost. When all three factors improve at once, the combined effect is multiplicative. That is the structural explanation for the price freefall.
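Stated as an equation, with my own labels for the four factors (the paper's relationship is described here in prose):

```latex
% Three-factor token supply model (variable labels are illustrative):
% H = hardware efficiency, A = algorithm efficiency,
% K = capital investment, C = energy cost
S_{\text{tokens}} = \frac{H \cdot A \cdot K}{C}
```

Because the factors enter multiplicatively, simultaneous gains compound: a 3× improvement in each of hardware, algorithms, and capital yields roughly a 27× supply expansion at fixed energy cost.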
But the paper's key prediction is that this won't continue. Token demand is shifting from elastic developer experimentation toward inelastic enterprise SaaS and VLA/embodied AI workloads. When a fleet of autonomous vehicles needs continuous real-time inference, price sensitivity drops close to zero. Meanwhile, supply expansion is physically slow: new data centres take 18 to 36 months, TSMC wafer capacity takes around 24 months to expand, and power infrastructure takes years. This asymmetry between instant demand spikes and sluggish supply response is the same dynamic that produces electricity price spikes. Xing projects that post-2027, token prices will exhibit volatile peak-valley patterns rather than continuing to fall.
Designing the futures contract
The paper's most concrete contribution is a complete contract specification for Standard Inference Token (SIT) futures. The SIT is defined as one inference token from a model meeting specific benchmark thresholds (MMLU ≥ 86%, HumanEval ≥ 67%, GSM8K ≥ 92%), anchored to GPT-4-Turbo's January 2024 performance. Think of it like API gravity standards for crude oil: different models can deliver tokens of varying quality, but SIT gives you a single tradeable grade.
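As a purely illustrative sketch (the function and dictionary names are hypothetical, not from the paper), the quality gate reduces to a conjunction of benchmark floors:

```python
# Hypothetical sketch of the SIT quality gate. The benchmark floors are the
# paper's published thresholds; the code structure itself is illustrative.
SIT_THRESHOLDS = {"MMLU": 0.86, "HumanEval": 0.67, "GSM8K": 0.92}

def qualifies_as_sit(scores: dict) -> bool:
    """A model's tokens count as SIT only if it clears every benchmark floor."""
    return all(scores.get(bench, 0.0) >= floor
               for bench, floor in SIT_THRESHOLDS.items())

# GPT-4-Turbo's January 2024 performance anchors the grade, so a model at
# exactly those levels qualifies by construction:
print(qualifies_as_sit({"MMLU": 0.86, "HumanEval": 0.67, "GSM8K": 0.92}))  # True
```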
Each contract covers one million SIT, quoted in dollars per million tokens. Settlement is cash-based against a Token Price Index (TPI), a volume-weighted average across qualifying providers, with a 30% single-provider weight cap to prevent any one company from dictating the price. The margin system uses dynamic initial margins of 8% to 12% of contract value, adjusted for recent volatility. Circuit breakers trigger at ±15% and ±25% price moves.
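To make the index construction concrete, here is a minimal sketch of a capped volume-weighted index, assuming clipped excess weight is redistributed pro rata among the remaining providers (the paper specifies the 30% cap; the redistribution rule is my assumption):

```python
def token_price_index(prices, volumes, cap=0.30):
    """Volume-weighted average price with a per-provider weight cap.

    Weights above `cap` are clipped and the excess redistributed pro rata
    across uncapped providers (an assumed rule; the paper specifies only
    the cap). Needs at least ceil(1/cap) providers to be feasible.
    """
    total = sum(volumes)
    weights = [v / total for v in volumes]
    for _ in range(len(weights)):  # iterate until no weight exceeds the cap
        excess = sum(w - cap for w in weights if w > cap)
        if excess <= 1e-12:
            break
        free = sum(w for w in weights if w < cap)
        weights = [cap if w >= cap else w + excess * w / free for w in weights]
    return sum(w * p for w, p in zip(weights, prices))

# A provider with 60% of volume is clipped to a 30% index weight:
print(token_price_index([1.20, 1.40, 1.55, 1.60], [600, 200, 100, 100]))  # ≈ 1.41
```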
Xing also specifies a market-maker regime requiring $50 million minimum net capital, continuous two-sided quotes for at least 80% of trading hours, and bid-ask spreads within 2% for front-month contracts. The overall framework borrows heavily from electricity futures design, particularly Nord Pool and PJM, adapted for the specific properties of compute.
The Monte Carlo evidence for hedging
To test whether these futures actually help, Xing builds a mean-reverting jump-diffusion model for token prices and runs 10,000 Monte Carlo paths over a three-year horizon (2026 to 2028). The model features a negative trend coefficient (reflecting ongoing technology-driven price decline), upward-biased jumps (representing demand shocks), seasonal variation, and a mean-reversion speed implying a roughly 2.8-month half-life.
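A minimal simulation in this spirit, using placeholder parameters chosen only to mimic the stated features (negative drift, upward-biased jumps, a 2.8-month half-life), not Xing's actual calibration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters: not the paper's calibration.
months, n_paths = 36, 10_000
kappa = np.log(2) / 2.8        # mean-reversion speed for a 2.8-month half-life
sigma = 0.15                   # monthly diffusion volatility of log price
jump_prob, jump_mu, jump_sd = 0.05, 0.25, 0.10  # upward-biased demand jumps
p0 = 1.50                      # starting spot price, $/million SIT

t = np.arange(1, months + 1)
# Long-run log level: technology-driven decline plus a seasonal component.
theta = np.log(p0) - 0.02 * t + 0.05 * np.sin(2 * np.pi * t / 12)

x = np.full(n_paths, np.log(p0))
paths = np.empty((n_paths, months))
for i in range(months):
    jumps = (rng.random(n_paths) < jump_prob) * rng.normal(jump_mu, jump_sd, n_paths)
    x = x + kappa * (theta[i] - x) + sigma * rng.standard_normal(n_paths) + jumps
    paths[:, i] = np.exp(x)

# Path statistics fall straight out, e.g. the share of paths that double the
# starting price at some point (compare the ~15% the paper reports):
print((paths.max(axis=1) >= 2 * p0).mean())
```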
The results tell a clear story. Under the baseline scenario, optimal-ratio hedging at h* = 0.85 cuts the 12-month procurement cost standard deviation from $1.80 to $0.65 per million SIT, an 87% variance reduction. In the optimistic demand-explosion scenario, hedging efficiency rises to 91%. Even in the pessimistic case (faster-than-expected technology progress making locked-in futures prices look expensive), the efficiency is still 78%. Across all scenarios, futures reduce compute cost volatility by 62% to 78% measured by standard deviation.
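The paper's derivation of h* isn't reproduced here, but the form is presumably the textbook minimum-variance hedge ratio:

```latex
% Minimum-variance hedge ratio and its variance reduction (textbook form;
% S = spot procurement cost, F = futures price, rho = their correlation)
h^{*} = \frac{\operatorname{Cov}(\Delta S, \Delta F)}{\operatorname{Var}(\Delta F)},
\qquad
1 - \frac{\operatorname{Var}(\Delta S - h^{*}\Delta F)}{\operatorname{Var}(\Delta S)} = \rho^{2}
```

The reported baseline numbers are internally consistent: dropping the standard deviation from $1.80 to $0.65 is a 64% reduction, and 1 − (0.65/1.80)² ≈ 0.87 recovers the 87% variance figure, implying a spot-futures correlation of roughly ρ ≈ 0.93.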
The simulation also reveals a striking asymmetry: about 15% of paths experience at least one price doubling within 36 months, while 3% see peak-to-trough swings exceeding 5×. Implied volatility peaks at the 6-to-12-month horizon (50% to 60%), reflecting the concentration of uncertainty around when the application-layer explosion might hit. This is precisely the risk profile that makes a hedging instrument valuable.
What's missing and what comes next
The paper is honest about timing. Of Black's five conditions for a successful futures market, one is not yet met: sufficient two-directional price volatility. Token prices have only gone down so far. Xing estimates the optimal launch window at 2027 to 2028, once VLA and embodied AI demand begins producing the supply-demand friction that makes hedging worthwhile. The 2025 to 2026 period is framed as preparation time for building the Token Price Index infrastructure, refining contract design, and securing regulatory approval (likely CFTC, since token futures are commodity-based, not securities).
GPU futures get a separate analysis and come out less favourably. Physical GPU futures suffer from rapid hardware obsolescence (18-to-24-month cycles), multi-dimensional performance that resists standardisation, and extreme supply concentration in NVIDIA. Xing proposes a workaround: futures on GPU compute time rather than physical chips, using a Standard Compute Unit benchmarked to the H100. Token futures and GPU compute futures would then form an upstream-downstream pair, with the spread between them capturing the algorithm efficiency premium.
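One way to make the spread claim concrete (this formalisation is mine, not the paper's): if one Standard Compute Unit of GPU time yields A_t standard tokens, then absent frictions the two futures prices should be linked by algorithm efficiency,

```latex
% P_SCU = price per Standard Compute Unit; P_SIT = price per token;
% A_t = tokens produced per SCU (algorithm efficiency). Notation mine.
P^{\mathrm{SCU}}_{t} \approx A_{t} \cdot P^{\mathrm{SIT}}_{t}
\quad\Longrightarrow\quad
A_{t} \approx \frac{P^{\mathrm{SCU}}_{t}}{P^{\mathrm{SIT}}_{t}}
```

so the ratio of compute futures to token futures would drift upward as algorithms squeeze more tokens out of each unit of hardware time.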
The strongest part of the paper is the commodity analogy framework and contract design specificity. The weakest is the simulation, which necessarily calibrates a model on hypothetical parameters since no real token spot market data with the expected volatility structure exists yet. If the demand explosion never materialises, or materialises at a different timescale, the hedging value proposition changes substantially. Still, as a blueprint for treating compute like a commodity, the paper sets out the fullest version of the argument yet published.
BOTTOM LINE
Xing makes a detailed, structurally sound case that AI inference tokens are on the path from service output to infrastructure commodity, much as electricity was a century ago. The proposed SIT futures contract, with its benchmark-anchored quality grade, volume-weighted price index, and electricity-inspired margin and circuit-breaker design, is the most complete specification yet for a compute derivatives market. The Monte Carlo simulations show 62% to 78% cost volatility reduction for enterprises. The main caveat is timing: the market needs two-directional price volatility to justify itself, and that depends on an application-layer demand explosion that hasn't happened yet.
Reference
Xing, Y. (2026). AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design. arXiv preprint arXiv:2603.21690v1. https://arxiv.org/abs/2603.21690