Research sweep · deep · 2024 – 2026

Designing AI Operating Models Around Humans

This sweep examines how humans are adapting to AI over the period June 2024 to June 2026, weighing measured benefits against harms, and how organisations should design operating models around human cognitive load and behavioural patterns rather than forcing adoption. It covers four themes: cognitive overload from supervising multiple agents at machine speed (context switching, automation complacency, vigilance fatigue); the poor budget and value outcomes of top-down AI mandates and token-maximising usage; the gap between model welfare functions (such as Anthropic's) and any equivalent human or worker welfare function; and how much good human outcomes depend on model training versus orchestration and deployment design.

  • GPT-5.5
  • financial
  • frontier
  • academic
  • vc
  • blogs
  • tech

Synthesised 2026-06-15

Narrative

Practitioner evidence from 2024 to 2026 points to a split picture. Google's enterprise RCT found about a 21% reduction in time on a complex internal task, and IBM's internal deployment study reports productivity gains for many, though not all, developers. But DORA's 2024 report found that rising AI adoption was associated with lower delivery stability, lower throughput, and less time spent on valuable work. RedMonk argues that this is what happens when firms optimise code generation rather than the real bottleneck in the delivery system.

The strongest cautionary evidence comes from settings where people must supervise AI inside mature systems rather than work on greenfield tasks. METR's 2025 field experiment found that experienced open-source developers were 19% slower with frontier tools, despite expecting a 24% speed-up. Another 2025 study found that AI-assisted output can shift maintenance burden onto core developers, increasing their review load while reducing their own original-code productivity. A separate open-source study found project-level productivity gains but also a 41.6% rise in integration time, which measures coordination cost rather than coding speed.

Workplace reports now give names to the hidden human tax. Harvard Business Review and BetterUp describe "workslop", polished AI output that transfers effort to colleagues, while Glean's 2026 findings describe "botsitting", with workers spending 6.4 hours a week supplying context, checking outputs, and fixing errors. Developer surveys line up with that pattern: Stack Overflow's 2025 results show AI usage has become mainstream, yet distrust has risen, and JetBrains' ecosystem data shows many teams still want humans to keep hold of code review and testing.
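The supervision tax can be made concrete with a little arithmetic. The sketch below is illustrative only: the 6.4 hours a week is the Glean figure cited above, while the 40-hour week, the gross-savings values, and both function names are assumptions introduced here, not numbers or methods from any of the sources. It also assumes, for the sake of the example, that supervision time can be subtracted directly from gross time saved.

```python
def net_weekly_hours(gross_hours_saved: float, supervision_hours: float) -> float:
    """Net hours gained per worker per week once supervision time is subtracted."""
    return gross_hours_saved - supervision_hours


def supervision_share(supervision_hours: float, work_week_hours: float = 40.0) -> float:
    """Fraction of the working week spent supervising AI output."""
    return supervision_hours / work_week_hours


# Weekly "botsitting" figure reported in the Glean findings cited above.
BOTSITTING_HOURS = 6.4

# Hypothetical gross savings, chosen only to show where break-even sits.
for gross in (4.0, 6.4, 10.0):
    net = net_weekly_hours(gross, BOTSITTING_HOURS)
    print(f"gross saved {gross:4.1f} h -> net {net:+.1f} h")

# Share of an assumed 40-hour week spent on supervision.
print(f"supervision share: {supervision_share(BOTSITTING_HOURS):.0%}")
```

Under these assumptions, a deployment has to save each worker more than 6.4 hours a week in gross terms before it yields any net time, which is why measuring output volume alone can overstate value.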

The operating-model implication is consistent across these sources. DORA's newer framing around burnout, friction, and perceived value, the SPACE study's emphasis on team support, and the Stack Overflow and JetBrains survey results all point away from top-down usage mandates and towards selective deployment, explicit review boundaries, and workflow redesign. The evidence is strongest where organisations treat AI as a system component that must fit human attention, judgement, and collaboration, not as a token-maximising substitute for them.


Sources

| ID | Title | Outlet | Date | Significance |
| --- | --- | --- | --- | --- |
| p1 | [DORA Accelerate State of DevOps Report 2024](https://dora.dev/research/2024/dora-report/) | DORA, Google Cloud | 2024 | |
| p2 | 2025 DORA State of AI Assisted Software Development | Google Cloud | 2025 | This follow-on DORA report matters because it shifts assessment from raw output to team archetypes and human factors such as burnout, friction, and perceived value. |
| p3 | DORA Report 2024 – A Look at Throughput and Stability | RedMonk | 2024-11 | Rachel Stephens translates the DORA findings into an operating-model critique, arguing that code generation may not be the bottleneck and that organisations can optimise the wrong constraint. |
| p4 | How much does AI impact development speed? An enterprise-based randomized controlled trial | arXiv | 2024-10 | Google's RCT with full-time engineers provides one of the cleaner measured-benefit studies, finding about a 21% reduction in time on a complex enterprise task under controlled conditions. |
| p5 | Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise | arXiv | 2024-12 | IBM's internal deployment study shows that enterprise gains are uneven, with productivity benefits present but not universal, and with responsibility and ownership of generated code becoming central issues. |
| p6 | Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity | arXiv | 2025-07 | METR's field experiment is a strong counterweight to vendor claims, finding experienced open-source developers were 19% slower with frontier AI tools despite expecting to be faster. |
| p7 | The SPACE of AI: Real-World Lessons on AI's Impact on Developers | arXiv | 2025-07 | This mixed-methods study uses the SPACE framework to show that benefits cluster around routine work and depend heavily on task complexity, peer learning, and organisational support. |
| p8 | The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot | arXiv | 2024-10 | This study finds project-level productivity gains in open source but also a 41.6% increase in integration time, making coordination cost a first-order part of the AI productivity story. |
| p9 | AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden | arXiv | 2025-10 | This paper matters because it shows productivity gains can shift maintenance and review burden onto core developers, worsening outcomes for the people who carry system knowledge. |
| p10 | Developer Productivity with GenAI | arXiv | 2025-10 | Using the SPACE lens with 415 practitioners, this paper argues that faster output does not reliably translate into better software or better wellbeing, which is central to judging AI adoption programmes. |
| p11 | What do professional software developers need to know to succeed in an age of Artificial Intelligence? | arXiv | 2025-05 | This practitioner-focused study reframes the adaptation problem around workflow judgement, adjacent engineering skills, and non-technical skills rather than prompt fluency alone. |
| p12 | What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow | arXiv | 2025-10 | By mining Stack Overflow, this paper identifies recurring implementation pain around orchestration complexity, evaluation reliability, and runtime integration in agent systems. |
| p13 | AI-Generated “Workslop” Is Destroying Productivity | Harvard Business Review | 2025-09 | This study-backed HBR piece gives a concrete mechanism for failed ROI, namely low-effort AI output that looks plausible but pushes cognitive and rework costs onto colleagues. |
| p14 | [Workslop: The Hidden Cost of AI-Generated Busywork](https://www.betterup.com/workslop) | BetterUp Labs | 2025 | |
| p15 | Researchers Asked LLMs for Strategic Advice. They Got “Trendslop” in Return. | Harvard Business Review | 2026-03 | This HBR article extends the critique beyond code into managerial judgement, showing how models can produce fashionable but shallow recommendations that reward buzzwords over reasoning. |
| p16 | 'Botsitting' is destroying productivity as workers spend nearly a full day each week making AI 'usable' | ITPro | 2026-06 | This report on Glean's Work AI Institute findings captures the hidden supervision tax, 6.4 hours a week spent feeding context, checking outputs, and correcting errors. |
| p17 | 3 in 4 workers say AI reduced productivity and increased workloads, survey finds | Business Insider | 2024-08 | Upwork's survey is useful as an early warning that mandate-led adoption can raise review load and learning overhead faster than it creates value. |
| p18 | 84% of software developers are now using AI, but nearly half 'don't trust' the technology over accuracy concerns | ITPro | 2025-08 | This Stack Overflow survey coverage anchors the adoption-trust gap, with broad usage rising while distrust, debugging effort, and security concern remain high. |
| p19 | UK software developers are still cautious about AI, and for good reason | ITPro | 2025-10 | JetBrains' ecosystem survey adds a regional practitioner view showing that caution concentrates around code quality, privacy, and retaining human control over reviews and testing. |
| p20 | No AI overload just yet? Google's new survey reveals how developers are really using AI at work | TechRadar | 2025-10 | This report on Google's survey is valuable because it pairs very high developer adoption with low strong trust, supporting the claim that supervision, not surrender, remains the norm. |