Designing AI Operating Models Around Humans

This brief examines how humans are adapting to AI between June 2024 and June 2026, weighing measured benefits against harms, and how organisations should design operating models around human cognitive load and behavioural patterns rather than forcing adoption. It covers four themes: cognitive overload from supervising multiple agents at machine speed (context switching, automation complacency, vigilance fatigue); the poor budget and value outcomes of top-down AI mandates and token-maximising usage; the gap between model welfare functions (such as Anthropic's) and any equivalent human or worker welfare function; and how much good human outcomes depend on model training versus orchestration and deployment design.

GPT-5.5

Synthesised 2026-06-16

Narrative

The strongest independent writing in this lane converges on a simple point: measured gains from AI are real, but they depend on how work is structured around it. Ethan Mollick's "Management as AI superpower" argues that the scarce resource is no longer raw execution but the ability to specify, delegate, and judge outputs. Simon Willison makes the same point from software practice, writing that agentic engineering makes code cheap and shifts the bottleneck to testing, verification, and deep domain understanding.

Several sources also describe a human-attention problem that worsens as agents speed up. In "Claude Dispatch and the Power of Interfaces", Mollick cites a valuation-task study in which chatbot interaction imposed a mental tax, especially on less experienced workers, because users had to sift through walls of text and branching suggestions. Willison's April 2026 account is blunter: running four coding agents in parallel was enough to leave a 25-year software veteran wiped out by 11 a.m. That is consistent with the June 2026 arXiv paper, which reports that an unconstrained multi-agent baseline failed in 72 percent of runs, while a harness with deterministic computation and three human decision gates cut failures to 16 percent.
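The contrast between an unconstrained workflow and one with deterministic steps and explicit human gates can be pictured in a few lines of Python. This is a minimal sketch under stated assumptions, not the harness from the arXiv paper: `Step`, `run_pipeline`, and the `approve` callback are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One stage of a workflow (hypothetical type, for illustration only)."""
    name: str
    run: Callable[[dict], dict]      # deterministic transformation of the state
    needs_human_gate: bool = False   # pause for a human decision after this step


def run_pipeline(
    steps: List[Step],
    state: dict,
    approve: Callable[[str, dict], bool],
) -> Tuple[dict, List[str], Optional[str]]:
    """Run steps in order and stop at the first gate a human rejects.

    Returns the final state, the names of the steps that ran, and the name
    of the step where the run was halted (None if it ran to completion).
    """
    log: List[str] = []
    for step in steps:
        state = step.run(state)
        log.append(step.name)
        if step.needs_human_gate and not approve(step.name, state):
            return state, log, step.name
    return state, log, None


# Example: the reviewer rejects any total above 5, so "publish" never runs.
steps = [
    Step("load", lambda s: {**s, "rows": [1, 2, 3]}),
    Step("summarise", lambda s: {**s, "total": sum(s["rows"])}, needs_human_gate=True),
    Step("publish", lambda s: {**s, "published": True}),
]
final, log, halted_at = run_pipeline(steps, {}, lambda name, s: s["total"] <= 5)
print(log, halted_at)
```

The design choice worth noticing is that the human is asked for a decision only at a small number of fixed points, rather than being expected to watch every agent action as it happens.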

On organisational design, the sources lean against top-down adoption theatre. June 2026 Business Insider reporting says executives at Replit, BNP Paribas CIB, La Banque Postale, TCS, and NTT DATA are rejecting token counts and leaderboards as ROI metrics in favour of outcome measures. The April 2026 arXiv token-consumption paper supports them: agentic coding runs can consume far more tokens than simpler code interactions, with high variance, and higher token use does not reliably buy higher accuracy. Narayanan and Kapoor's "AI as Normal Technology" and Evans's "Predicting AI job exposure" push in the same direction: value comes slowly, through redesign of workflows and institutions, not from mandates or role-level forecasts. Jonathon Ready's essay on Anthropic's silent restrictions adds a different warning: lab-side welfare and policy choices can diverge from user welfare unless deployment makes those trade-offs visible and contestable.
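The difference between an activity metric and an outcome metric is easy to show with a little arithmetic. The sketch below is a hypothetical illustration: the `Run` record, its fields, and the sample figures are invented for the example and do not come from the Business Insider reports or the arXiv paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Run:
    """One agent run (hypothetical record, for illustration only)."""
    team: str
    tokens: int      # tokens consumed by the run
    accepted: bool   # whether a human reviewer accepted the result


def total_tokens(runs: Iterable[Run]) -> Dict[str, int]:
    """Activity metric: raw token consumption per team."""
    totals: Dict[str, int] = {}
    for r in runs:
        totals[r.team] = totals.get(r.team, 0) + r.tokens
    return totals


def tokens_per_accepted(runs: Iterable[Run]) -> Dict[str, float]:
    """Outcome metric: tokens spent per accepted result, per team."""
    runs = list(runs)
    totals = total_tokens(runs)
    accepted: Dict[str, int] = {}
    for r in runs:
        accepted[r.team] = accepted.get(r.team, 0) + int(r.accepted)
    return {
        team: (totals[team] / accepted[team]) if accepted[team] else float("inf")
        for team in totals
    }


# Made-up sample data: team "a" burns more tokens but ships fewer accepted results.
runs = [
    Run("a", 900_000, True),
    Run("a", 1_100_000, False),
    Run("b", 200_000, True),
    Run("b", 300_000, True),
]
activity = total_tokens(runs)
outcome = tokens_per_accepted(runs)
print(activity)
print(outcome)
```

On this invented data a token leaderboard would rank team "a" first, while the outcome metric shows team "b" spending far fewer tokens per accepted result.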

Sources

ID	Title	Outlet	Date	Significance
b1	Management as AI superpower	One Useful Thing	2026-01	Ethan Mollick argues that agentic work shifts value from execution to delegation, evaluation, and specification, grounding the organisational case for treating management skill and subject matter expertise as the scarce resource.
b2	Claude Dispatch and the Power of Interfaces	One Useful Thing	2026-03	Mollick uses a valuation-task paper to show that chatbot UX can erase AI productivity gains by overloading users with sprawling outputs, making interface design a first-order operating-model issue.
b3	Choosing to Stay Human	One Useful Thing	2026-05	This essay frames AI adoption as a choice about where to preserve human skill formation, warning that convenience can erode writing judgement and flood attention with low-meaning synthetic content.
b4	The lethal trifecta for AI agents: private data, untrusted content, and external communication	Simon Willison’s Weblog	2025-06	Willison reduces a broad agent-safety debate to a concrete deployment rule: combining private-data access, untrusted input, and external communication creates a prompt-injection exfiltration hazard.
b5	Writing about Agentic Engineering Patterns	Simon Willison’s Weblog	2026-02	Willison argues that coding agents lower the cost of producing working code to near zero, shifting the bottleneck to verification and changing how teams should structure engineering work.
b6	Highlights from my conversation about agentic engineering on Lenny’s Podcast	Simon Willison’s Weblog	2026-04	Willison gives a practitioner account of the cognitive cost of supervising multiple agents in parallel, describing four concurrent coding agents as enough to wipe out an experienced engineer by late morning.
b7	If Claude Fable stops helping you, you'll never know	Jonathon Ready	2026-06	Ready identifies a direct conflict between a lab's hidden model-side restrictions and user welfare, arguing that silent degradation creates supply-chain risk for ordinary software teams building AI features.
b8	AI as Normal Technology	AI as Normal Technology	2025-04	Arvind Narayanan and Sayash Kapoor argue against technological determinism, separating model progress from application design and adoption, which is central to judging whether harms are fixed in training or in deployment.
b9	New Paper: Towards a science of AI agent reliability	AI as Normal Technology	2026-02	Kapoor, Narayanan, and Stephan Rabanser argue that capability gains have outpaced reliability gains, offering a framework for why impressive demos do not automatically translate into dependable organisational use.
b10	Why AI hasn’t replaced software engineers, and won’t	AI as Normal Technology	2026-06	This essay rejects the threshold story of sudden labour replacement, arguing instead that AI compresses execution while decision-making, verification, and accountability remain stubbornly human.
b11	Predicting AI job exposure	Benedict Evans	2026-05	Evans argues that job-exposure charts miss how automation changes business models, regulation, and task composition, which makes simple role-level forecasts unreliable for planning.
b12	(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable	arXiv	2026-06	This paper reports that an unconstrained multi-agent baseline failed in 72 percent of runs, while a harness with deterministic computation and three human decision gates cut failures to 16 percent.
b13	Oversight Structures for Agentic AI in Public-Sector Organizations	arXiv	2025-06	This paper argues that agentic AI requires continuous oversight, tighter integration of governance with operations, and cross-departmental coordination rather than episodic compliance review.
b14	As companies rethink AI ROI, Replit's AI chief calls token leaderboards 'very dystopian'	Business Insider	2026-06	The report captures an emerging backlash against token-maximising mandates, with Replit's Michele Catasta arguing that raw token burn is a misleading proxy for productivity or value.
b15	I asked 4 executives how they measure AI ROI. None started with AI tokens.	Business Insider	2026-06	This report shows large organisations moving from activity metrics to outcome metrics, with BNP Paribas CIB, La Banque Postale, TCS, and NTT DATA all rejecting token counts as the main ROI measure.
b16	How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks	arXiv	2026-04	The paper finds that agentic coding runs can consume 1000 times more tokens than simpler code interactions, with high variance and no reliable link between higher token use and better accuracy.