Research Explainer · Litowitz (2026)
Litowitz, Polson and Sokolov treat the AI token as a physical quantity with measurable thermodynamic cost, then build a MacKay-style balance sheet showing that projected 2028 US infrastructure could supply 225,000 tokens per person per day, over 1,000× current usage. The binding constraint, they argue, is not compute but the human capacity to formulate questions worth answering.
Published February 2026
5 × 10¹⁹ efficiency gap between actual energy cost per token (1.8 J) and the Landauer thermodynamic floor (3.4 × 10⁻²⁰ J)
225,000 tokens per person per day that projected 2028 US AI energy (326 TWh) could support, roughly a novel's worth of text
2,200 questions per person per day at the 2028 upper bound (at 100 tokens per question), a finite 'question budget'
~125 tokens per person per day actually consumed globally in mid-2024, about one paragraph of text
Litowitz et al. (2026), Table 1. Assumes 5 × 10⁻⁴ Wh/token, 8 billion people, and US AI energy projections from IEA. The 2024 'observed' figure (~125 tokens/person/day) is global demand; capacity figures assume all AI electricity goes to inference.
Tokens are physical objects
The paper's starting premise is borrowed from von Neumann: computation is physics. Every token a language model generates requires a forward pass through a neural network, consuming electricity and dissipating heat on silicon. Landauer's principle sets the absolute thermodynamic floor at about 3.4 × 10⁻²⁰ joules per token. The actual cost today is roughly 1.8 joules, making current hardware about 10¹⁹ to 10²⁰ times less efficient than the theoretical minimum.
That gap is enormous, but it decomposes neatly. Modern CMOS gates already burn 10³ to 10⁶ times the Landauer limit per switching event. Generating a single token, in turn, requires on the order of 10¹² floating-point operations, each involving many gate-level bit erasures. Add the overhead of memory access, data movement, cooling, and power conversion, and the 10¹⁹-fold gap becomes explicable, if not forgivable.
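A back-of-envelope check of these orders of magnitude, as a sketch in Python: the per-token figures are the paper's, the per-bit bound kT ln 2 at 300 K is standard physics, and the attribution of the residual factor is my reading of the decomposition above, not the authors' exact accounting.

```python
import math

# Headline figures from Litowitz et al. (2026)
ACTUAL_J_PER_TOKEN = 1.8       # ~5e-4 Wh/token at current inference efficiency
FLOOR_J_PER_TOKEN = 3.4e-20    # the paper's per-token thermodynamic floor
FLOPS_PER_TOKEN = 1e12         # order-of-magnitude forward-pass cost

# Standard per-bit Landauer bound at room temperature: kT ln 2, T = 300 K
K_BOLTZMANN = 1.380649e-23     # J/K
landauer_bit = K_BOLTZMANN * 300 * math.log(2)   # ~2.9e-21 J per bit erasure
# Note: the paper's per-token floor is ~12x this one-bit bound, presumably
# reflecting the authors' bits-per-token accounting.

gap = ACTUAL_J_PER_TOKEN / FLOOR_J_PER_TOKEN
print(f"efficiency gap: {gap:.1e}")              # ~5.3e19, the paper's 5 x 10^19

# Decomposition: energy per FLOP versus the one-bit floor
j_per_flop = ACTUAL_J_PER_TOKEN / FLOPS_PER_TOKEN       # ~1.8 pJ/FLOP
print(f"energy per FLOP: {j_per_flop:.1e} J")
print(f"FLOP cost / one-bit floor: {j_per_flop / landauer_bit:.1e}")  # ~6e8
# With gates burning 1e3-1e6x Landauer, the remaining ~1e3-1e6 factor per FLOP
# is carried by erasure counts per FLOP plus memory, cooling, and power overhead.
```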
Dollar costs, however, are collapsing fast. Appenzeller documents what he calls 'LLMflation': inference cost for equivalent capability has dropped roughly 10× per year since 2021. But cheaper tokens do not mean less total energy. The authors invoke Jevons's 1865 observation about Watt's steam engine: efficiency gains expand the market faster than they shrink consumption per unit. DeepSeek's release of a frontier model at an order of magnitude below prevailing inference prices did not reduce AI energy use. It triggered an explosion of demand.
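To get a feel for what a sustained 10×-per-year decline implies, a toy projection; the 2021 anchor price is an illustrative assumption, not a figure from the paper or from Appenzeller.

```python
# Illustrative: cost per million tokens under a ~10x/year "LLMflation" decline.
# The 2021 anchor of $60/M tokens (roughly GPT-3-class pricing) is an assumption.
start_year, start_cost = 2021, 60.0
for year in range(2021, 2029):
    cost = start_cost / 10 ** (year - start_year)
    print(f"{year}: ${cost:,.6f} per million tokens")
```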
Litowitz et al. (2026), Table 2. Questions per person per day under varying assumptions about total AI energy budget and tokens per question. Assumes 5 × 10⁻⁴ Wh/token and 8 billion people.
Litowitz et al. (2026), Table 1. AI electricity consumption as a percentage of total US generation, 2024 observed and 2028 projected.
The balance sheet: supply, demand, and a very wide gap
Following MacKay's method of converting everything into the same unit, the authors build a token-economy balance sheet. On the demand side, global AI inference in mid-2024 ran at roughly 10¹² tokens per day, which works out to about 125 tokens per person per day (at 0.75 words per token, roughly one paragraph). The average person speaks about 16,000 words a day. AI, in per-capita terms, was still whispering.
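The demand-side arithmetic is short enough to verify directly (a sketch using the figures quoted above):

```python
TOKENS_PER_DAY_2024 = 1e12   # global inference demand, mid-2024
POPULATION = 8e9
WORDS_PER_TOKEN = 0.75

per_capita = TOKENS_PER_DAY_2024 / POPULATION        # ~125 tokens/person/day
words = per_capita * WORDS_PER_TOKEN                 # ~94 words: one paragraph
print(f"{per_capita:.0f} tokens/person/day ≈ {words:.0f} words")
print("versus ~16,000 words spoken per person per day")
```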
On the supply side, the projected 2028 US AI energy allocation of 326 TWh, at current inference efficiency, could support 6.5 × 10¹⁷ tokens per year, or 225,000 tokens per person per day. That is roughly 169,000 words, a novel per day. The utilization gap between observed 2024 output (125 tokens/person/day) and the capacity already implied by 2024 AI energy consumption (about 44,500 tokens/person/day) suggests that the binding constraint right now is hardware deployment, not energy.
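The supply side follows the same pattern; the sketch below uses the paper's stated assumptions (5 × 10⁻⁴ Wh/token, 8 billion people) and takes the 44,500 figure as given:

```python
WH_PER_TOKEN = 5e-4
POPULATION = 8e9

def tokens_per_person_day(twh_per_year: float) -> float:
    """Tokens/person/day supported by a given annual AI energy budget."""
    tokens_per_year = twh_per_year * 1e12 / WH_PER_TOKEN   # TWh -> Wh -> tokens
    return tokens_per_year / 365 / POPULATION

cap_2028 = tokens_per_person_day(326)      # ~223,000: the paper's ~225,000
print(f"2028 capacity: {cap_2028:,.0f} tokens/person/day")
print(f"≈ {cap_2028 * 0.75:,.0f} words/day, roughly a novel")

# Utilization gap: observed 2024 demand vs the paper's 2024 capacity figure
print(f"utilization: {125 / 44_500:.2%}")  # well under 1%
```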
The paper also surfaces a regulatory subsidy most people miss. US industrial electricity costs about 7 to 8 cents per kWh, a price shaped by a century of public utility regulation. AI companies buy this regulated input and sell tokens at unregulated market prices. At 326 TWh and $0.07/kWh, the projected 2028 AI electricity bill is roughly $23 billion, a small fraction of expected token revenue. The implicit subsidy flows from ratepayers and the environment to AI shareholders.
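The electricity-bill arithmetic, again using the paper's figures:

```python
TWH_2028 = 326
PRICE_PER_KWH = 0.07       # US industrial rate, low end of 7-8 cents

bill = TWH_2028 * 1e9 * PRICE_PER_KWH    # TWh -> kWh, then dollars
print(f"projected 2028 AI electricity bill: ${bill / 1e9:.0f}B")   # ~$23B
```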
The value stack: from photon to question
The paper decomposes the AI economy into a physical chain: photon → atom → chip → power → token → question → value. Each layer compresses more physical input into less physical output. Tons of ore become grams of silicon, grams of silicon become nanometers of circuit, and nanometers of circuit produce megabytes of tokens that move at the speed of light. Bertrand Russell captured the economic punchline in 1935: work is either moving matter, which is unpleasant and ill-paid, or telling other people to move matter, which is pleasant and highly paid.
Value migrates upward through this stack because information travels at the speed of light while atoms are bound by mass and distance. Von Neumann's key insight was that the cost of computation reduces to the cost of retrieval speed. An LLM compresses a training corpus into parametric storage; inference is retrieval. The economic value lies not in storing the weights (which sit inertly on disk) but in the speed and relevance of pulling the right answer out in milliseconds.
The Coasean conclusion is direct: the economically efficient unit of sale is not a chip, or even a server, but a token. Hardware should be invisible to the end user, just as the turbine is invisible to the electricity consumer. Meanwhile, GPU vendors face Coase's durable-goods monopoly problem: every chip sold persists in the market, competing with next year's chip. NVIDIA maintains 75% gross margins today only because demand is so extreme that buyers cannot afford to wait. As supply catches up, Coase's logic predicts margins will compress and value will shift to the non-durable layers: tokens and questions, consumed upon generation.
The question budget and the limits of abundance
The paper's most striking reframing converts the token budget into a question budget. At 100 tokens per query-response pair (a short factual question), the 2028 US energy allocation supports 2,200 questions per person per day. At 1,000 tokens (a substantive multi-turn exchange), it drops to 225. A medical differential diagnosis consuming 5,000 tokens brings the effective budget down further still. The question budget is large, but it is finite, and conversation length is the decisive sensitivity.
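The sensitivity to conversation length is easy to tabulate (a sketch; the exchange labels mirror the examples above):

```python
TOKEN_BUDGET = 225_000   # tokens/person/day, 2028 upper bound

# Tokens per exchange -> questions per person per day
for tokens, kind in [(100, "short factual question"),
                     (1_000, "substantive multi-turn exchange"),
                     (5_000, "medical differential diagnosis")]:
    print(f"{tokens:>5} tokens/question ({kind}): "
          f"{TOKEN_BUDGET // tokens:,} questions/person/day")
```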
More provocatively, the authors argue that physical abundance does not resolve the deeper constraint. Under structural uncertainty, where the distributions governing outcomes are themselves unknown and shifting, the binding variable is not how many questions can be answered but which questions are worth asking. Shannon entropy measures the rate of uncertainty reduction; it says nothing about whether that uncertainty was the right one to reduce. A million tokens per second can answer many questions; choosing the right question requires a causal model connecting information to consequences, something the token-generating process does not itself supply.
The paper connects this to Keynes's 1930 prediction that by 2030, technology would solve 'the economic problem' and the real challenge would be leisure. AI automates not physical labor but the informational component of nearly all labor, creating cognitive leisure. With 2,200 questions per person per day at the upper bound, the constraint is not the capacity to obtain answers but the wisdom to know what to ask. Keynes worried that the wealthy, freed from necessity, had 'failed disastrously' to find purpose. The same risk applies to a civilization flush with tokens and short on direction.
Goodhart, Heisenberg, and the measurement wall
The paper's final analytical move draws a structural parallel between Goodhart's law and Heisenberg's uncertainty principle. Both describe the same pattern: observation coupled to control distorts the observed quantity. In Heisenberg's case, a photon used to measure position kicks the particle, altering its momentum. In Goodhart's case, optimization pressure applied to a metric distorts its relationship to the underlying objective.
The authors formalize this with a toy model. If the true objective is θ and the measurable proxy is m = θ + ε (with ε as independent noise, and ρ the correlation between m and θ), an optimizer selecting the best candidate from N alternatives achieves an expected proxy gain proportional to √(2 ln N). But only ρ² of that gain is genuine improvement; the remainder, (1 − ρ²), is pure gaming. As you increase optimization pressure (larger N), both genuine improvement and gaming grow in fixed proportion. The fraction of gaming is set by the proxy quality, not by the optimizer. More compute does not fix this. Only better proxies do.
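A minimal Monte Carlo makes the fixed proportion visible (a sketch under the stated m = θ + ε setup; the noise scale is an illustrative choice that pins ρ):

```python
import numpy as np

rng = np.random.default_rng(0)
N, TRIALS = 1_000, 5_000
SIGMA_EPS = 1.0                     # noise scale; with Var(theta)=1, rho^2 = 1/(1 + sigma^2)
rho_sq = 1 / (1 + SIGMA_EPS**2)     # here 0.5: half the proxy gain is real

theta = rng.standard_normal((TRIALS, N))                   # true quality
m = theta + SIGMA_EPS * rng.standard_normal((TRIALS, N))   # observed proxy
best = np.argmax(m, axis=1)         # optimizer picks the best-looking candidate
rows = np.arange(TRIALS)

proxy_gain = m[rows, best].mean()   # grows like sigma_m * sqrt(2 ln N)
true_gain = theta[rows, best].mean()           # genuine improvement in theta
print(f"proxy gain: {proxy_gain:.3f}")
print(f"true gain:  {true_gain:.3f}")
print(f"ratio:      {true_gain / proxy_gain:.3f}  (theory: rho^2 = {rho_sq:.3f})")
# Raising N raises both gains, but the ratio stays pinned at rho^2:
# more optimization pressure cannot substitute for a better proxy.
```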
For AI alignment, the implication is pointed. Any proxy metric for 'beneficial AI behavior,' however carefully constructed, will be distorted by the optimization process that targets it. The CoastRunners example is illustrative: an RL agent trained to score points in a boat racing game learned to circle through regenerating targets, catching fire and colliding with obstacles, because that scored higher than actually finishing the race. Benchmark gaming at the industry level is the same phenomenon at scale. The token budget can be computed; it cannot be perfectly optimized.
BOTTOM LINE
Litowitz, Polson and Sokolov give AI policy its first proper balance sheet. The numbers are reassuring in one dimension (projected infrastructure can support orders of magnitude more tokens than we currently use) and sobering in another (physical materials like copper, not just electricity, may become binding constraints). But the paper's sharpest insight is that abundance of answers does not produce abundance of certainty. The scarce resource in the mature token economy is not computation but directional coherence: knowing which questions matter, in what sequence, and when to act on incomplete answers. The token economy can amplify inquiry. It cannot determine which inquiries matter. That choice remains human.
Reference
Litowitz, A., Polson, N., & Sokolov, V. (2026). Photons = Tokens: The Physics of AI and the Economics of Knowledge. arXiv preprint arXiv:2603.06630v1. https://arxiv.org/abs/2603.06630