Designing AI Operating Models Around Humans

How humans are adapting to AI between June 2024 and June 2026, weighing measured benefits and harms, and how organisations should design operating models around human cognitive load and behavioural patterns rather than forcing adoption. Scope: cognitive overload from supervising multiple agents at machine speed (context switching, automation complacency, vigilance fatigue); the poor budget and value outcomes of top-down AI mandates and token-maximising usage; the gap between model welfare functions (such as Anthropic's) and any equivalent human or worker welfare function; and how far good human outcomes depend on model training versus orchestration and deployment design.

GPT-5.5
financial
frontier
academic
vc
blogs
tech

Synthesised 2026-06-16

Narrative

Evidence from 2024 to 2026 points to a mixed picture. Google's enterprise RCT found about a 21% reduction in time spent on a complex internal task, and IBM's internal deployment study reports productivity gains for many, though not all, developers. DORA's 2024 report, however, found that rising AI adoption was associated with lower delivery throughput and stability and with less time spent on valuable work. RedMonk's Rachel Stephens reads this as a sign that firms may be optimising code generation when it is not the real bottleneck in the delivery system.

The strongest cautionary evidence comes from settings where people must supervise AI inside mature systems rather than on greenfield tasks. In METR's 2025 field experiment, experienced open-source developers working on mature projects took 19% longer with early-2025 frontier tools, despite expecting a 24% speed-up. Another 2025 study found that AI-assisted output can shift maintenance burden onto core developers, increasing their review load while reducing their own original-code productivity. A separate study of open-source projects found project-level productivity gains alongside a 41.6% rise in integration time, which measures coordination cost rather than coding speed.
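
To make the size of METR's perception gap concrete, the following Python sketch converts the two headline figures into minutes for a task. The sketch treats both figures as percentage changes in completion time against a no-AI baseline. That reading is a simplifying assumption. The 60-minute baseline and the function name `perceived_vs_observed` are hypothetical and do not come from METR's analysis.

```python
def perceived_vs_observed(baseline_minutes: float,
                          expected_reduction: float,
                          observed_increase: float) -> dict:
    """Compare expected and observed task time.

    Assumption (illustrative): both figures are read as percentage
    changes in completion time relative to a no-AI baseline.
    """
    expected = baseline_minutes * (1 - expected_reduction)  # what developers predicted
    observed = baseline_minutes * (1 + observed_increase)   # what was measured
    return {
        "expected_minutes": round(expected, 1),
        "observed_minutes": round(observed, 1),
        "gap_minutes": round(observed - expected, 1),
        # Ratio of measured time to anticipated time; independent of the baseline.
        "observed_over_expected": round(observed / expected, 2),
    }

# Hypothetical 60-minute task; 0.24 and 0.19 are METR's headline figures.
result = perceived_vs_observed(60, 0.24, 0.19)
print(result)
```

Under these assumptions, a task that developers expected to finish in about 46 minutes took about 71. That is roughly 57% longer than anticipated. Because the ratio does not depend on the baseline, the same gap applies to a task of any length, so self-reported speed-ups are a weak guide to measured time.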

Workplace reports now name this hidden human tax. Harvard Business Review and BetterUp describe "workslop": polished AI output that looks plausible but transfers effort to colleagues. Glean's 2026 findings describe "botsitting", with workers spending 6.4 hours a week, nearly a full working day, supplying context, checking outputs, and fixing errors. Developer surveys fit the same pattern. Stack Overflow's 2025 results show that AI use has become mainstream, with 84% of developers using it, while distrust has risen and nearly half say they do not trust its accuracy. A JetBrains survey of UK developers shows caution about code quality and privacy and a wish to keep humans in control of code review and testing.
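
The 6.4-hour botsitting figure is easier to weigh as a share of working time. This sketch assumes a 40-hour week made up of 8-hour days. Those defaults and the helper name `supervision_share` are illustrative assumptions and are not part of Glean's reporting.

```python
from __future__ import annotations


def supervision_share(hours_per_week: float,
                      week_hours: float = 40.0,  # assumed standard working week
                      day_hours: float = 8.0     # assumed standard working day
                      ) -> tuple[float, float]:
    """Express weekly AI-supervision time as a share of the working week
    and as a number of working days."""
    return hours_per_week / week_hours, hours_per_week / day_hours


# 6.4 hours a week is the Glean figure reported by ITPro.
share, days = supervision_share(6.4)
print(f"{share:.0%} of a 40-hour week, about {days:.1f} working days")
```

On these assumptions, botsitting absorbs about 16% of the working week. That is consistent with the "nearly a full day" framing in the ITPro coverage.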

These sources converge on the same operating-model implication. DORA's newer framing around burnout, friction, and perceived value, the SPACE study's emphasis on peer learning and organisational support, Upwork's 2024 survey in which 3 in 4 workers said AI had reduced their productivity and increased their workload, and the Stack Overflow and JetBrains survey results all point away from top-down usage mandates. They point instead towards selective deployment, explicit review boundaries, and workflow redesign. Read together, the evidence most strongly supports organisations that treat AI as a system component that must fit human attention, judgement, and collaboration, not as a token-maximising substitute for them.

Sources

ID	Title	Outlet	Date	Significance
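p1	[DORA | Accelerate State of DevOps Report 2024](https://dora.dev/research/2024/dora-report/)	DORA, Google Cloud	2024	The baseline DORA report for this period, finding that rising AI adoption was associated with lower delivery throughput and stability and with less time spent on valuable work.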
p2	2025 DORA State of AI Assisted Software Development	Google Cloud	2025	This follow-on DORA report matters because it shifts assessment from raw output to team archetypes and human factors such as burnout, friction, and perceived value.
p3	DORA Report 2024 – A Look at Throughput and Stability	RedMonk	2024-11	Rachel Stephens translates the DORA findings into an operating-model critique, arguing that code generation may not be the bottleneck and that organisations can optimise the wrong constraint.
p4	How much does AI impact development speed? An enterprise-based randomized controlled trial	arXiv	2024-10	Google's RCT with full-time engineers provides one of the cleaner measured-benefit studies, finding about a 21% reduction in time on a complex enterprise task under controlled conditions.
p5	Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise	arXiv	2024-12	IBM's internal deployment study shows that enterprise gains are uneven, with productivity benefits present but not universal, and with responsibility and ownership of generated code becoming central issues.
p6	Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity	arXiv	2025-07	METR's field experiment is a strong counterweight to vendor claims, finding experienced open-source developers were 19% slower with frontier AI tools despite expecting to be faster.
p7	The SPACE of AI: Real-World Lessons on AI's Impact on Developers	arXiv	2025-07	This mixed-methods study uses the SPACE framework to show that benefits cluster around routine work and depend heavily on task complexity, peer learning, and organisational support.
p8	The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot	arXiv	2024-10	This study finds project-level productivity gains in open source but also a 41.6% increase in integration time, making coordination cost a first-order part of the AI productivity story.
p9	AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden	arXiv	2025-10	This paper matters because it shows productivity gains can shift maintenance and review burden onto core developers, worsening outcomes for the people who carry system knowledge.
p10	Developer Productivity with GenAI	arXiv	2025-10	Using the SPACE lens with 415 practitioners, this paper argues that faster output does not reliably translate into better software or better wellbeing, which is central to judging AI adoption programmes.
p11	What do professional software developers need to know to succeed in an age of Artificial Intelligence?	arXiv	2025-05	This practitioner-focused study reframes the adaptation problem around workflow judgement, adjacent engineering skills, and non-technical skills rather than prompt fluency alone.
p12	What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow	arXiv	2025-10	By mining Stack Overflow, this paper identifies recurring implementation pain around orchestration complexity, evaluation reliability, and runtime integration in agent systems.
p13	AI-Generated “Workslop” Is Destroying Productivity	Harvard Business Review	2025-09	This study-backed HBR piece gives a concrete mechanism for failed ROI, namely low-effort AI output that looks plausible but pushes cognitive and rework costs onto colleagues.
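p14	[Workslop: The Hidden Cost of AI-Generated Busywork | BetterUp Labs](https://www.betterup.com/workslop)	BetterUp Labs	2025	BetterUp Labs' own write-up of the workslop research reported in HBR, defining workslop as polished AI output that transfers effort to colleagues.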
p15	Researchers Asked LLMs for Strategic Advice. They Got “Trendslop” in Return.	Harvard Business Review	2026-03	This HBR article extends the critique beyond code into managerial judgement, showing how models can produce fashionable but shallow recommendations that reward buzzwords over reasoning.
p16	'Botsitting' is destroying productivity as workers spend nearly a full day each week making AI 'usable'	ITPro	2026-06	This report on Glean's Work AI Institute findings captures the hidden supervision tax, 6.4 hours a week spent feeding context, checking outputs, and correcting errors.
p17	3 in 4 workers say AI reduced productivity and increased workloads, survey finds	Business Insider	2024-08	Upwork's survey is useful as an early warning that mandate-led adoption can raise review load and learning overhead faster than it creates value.
p18	84% of software developers are now using AI, but nearly half 'don't trust' the technology over accuracy concerns	ITPro	2025-08	This Stack Overflow survey coverage anchors the adoption-trust gap, with broad usage rising while distrust, debugging effort, and security concern remain high.
p19	UK software developers are still cautious about AI, and for good reason	ITPro	2025-10	JetBrains' ecosystem survey adds a regional practitioner view showing that caution concentrates around code quality, privacy, and retaining human control over reviews and testing.
p20	No AI overload just yet? Google's new survey reveals how developers are really using AI at work	TechRadar	2025-10	This report on Google's survey is valuable because it pairs very high developer adoption with low strong trust, supporting the claim that supervision, not surrender, remains the norm.