Research · Tech Industry & Practitioner
Research sweep · deep · 2024 – 2026
Designing AI Operating Models Around Humans
This sweep examines how humans are adapting to AI between June 2024 and June 2026, weighing measured benefits against harms, and how organisations should design operating models around human cognitive load and behavioural patterns rather than forcing adoption. It covers four themes: cognitive overload from supervising multiple agents at machine speed (context switching, automation complacency, vigilance fatigue); the poor budget and value outcomes of top-down AI mandates and token-maximising usage; the gap between model welfare functions (such as Anthropic's) and any equivalent human or worker welfare function; and how much good human outcomes depend on model training versus orchestration and deployment design.
- GPT-5.5
- financial
- frontier
- academic
- vc
- blogs
- tech
Synthesised 2026-06-15
Narrative
Practitioner evidence from 2024 to 2026 points to a split picture. Google's enterprise RCT found about a 21% reduction in task time for a complex internal task, and IBM's internal study also reports net productivity gains for many developers. But DORA's 2024 report found that rising AI adoption was associated with lower delivery stability, lower throughput, and less time spent on valuable work, which RedMonk argues is what happens when firms optimise code generation rather than the real bottleneck in the delivery system.
The strongest cautionary evidence comes from settings where people must supervise AI inside mature systems rather than work on greenfield tasks. METR's 2025 field experiment found experienced open-source developers were 19% slower with frontier tools, despite expecting a 24% speed-up. Another 2025 study found that AI-assisted output can shift maintenance burden onto core developers, increasing their review load while reducing their own original-code productivity. A separate open-source study found project-level productivity gains, but also a 41.6% rise in integration time, which measures coordination cost directly rather than coding speed.
Workplace reports now give names to the hidden human tax. Harvard Business Review and BetterUp describe "workslop", polished AI output that transfers effort to colleagues, while Glean's 2026 findings describe "botsitting", with workers spending 6.4 hours a week supplying context, checking outputs, and fixing errors. Developer surveys line up with that pattern: Stack Overflow's 2025 results show AI usage has become mainstream, yet distrust has risen, and JetBrains' ecosystem data shows many teams still want humans to keep hold of code review and testing.
The operating-model implication is consistent across these sources. DORA's newer framing around burnout, friction, and perceived value, the SPACE study's emphasis on team support, and the Stack Overflow and JetBrains survey results all point away from top-down usage mandates and towards selective deployment, explicit review boundaries, and workflow redesign. The evidence is strongest where organisations treat AI as a system component that must fit human attention, judgement, and collaboration, not as a token-maximising substitute for them.
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| p1 | [DORA \| Accelerate State of DevOps Report 2024](https://dora.dev/research/2024/dora-report/) | DORA, Google Cloud | 2024 | |
| p2 | 2025 DORA State of AI Assisted Software Development | Google Cloud | 2025 | This follow-on DORA report matters because it shifts assessment from raw output to team archetypes and human factors such as burnout, friction, and perceived value. |
| p3 | DORA Report 2024 – A Look at Throughput and Stability | RedMonk | 2024-11 | Rachel Stephens translates the DORA findings into an operating-model critique, arguing that code generation may not be the bottleneck and that organisations can optimise the wrong constraint. |
| p4 | How much does AI impact development speed? An enterprise-based randomized controlled trial | arXiv | 2024-10 | Google's RCT with full-time engineers provides one of the cleaner measured-benefit studies, finding about a 21% reduction in time on a complex enterprise task under controlled conditions. |
| p5 | Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise | arXiv | 2024-12 | IBM's internal deployment study shows that enterprise gains are uneven, with productivity benefits present but not universal, and with responsibility and ownership of generated code becoming central issues. |
| p6 | Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity | arXiv | 2025-07 | METR's field experiment is a strong counterweight to vendor claims, finding experienced open-source developers were 19% slower with frontier AI tools despite expecting to be faster. |
| p7 | The SPACE of AI: Real-World Lessons on AI's Impact on Developers | arXiv | 2025-07 | This mixed-methods study uses the SPACE framework to show that benefits cluster around routine work and depend heavily on task complexity, peer learning, and organisational support. |
| p8 | The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot | arXiv | 2024-10 | This study finds project-level productivity gains in open source but also a 41.6% increase in integration time, making coordination cost a first-order part of the AI productivity story. |
| p9 | AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden | arXiv | 2025-10 | This paper matters because it shows productivity gains can shift maintenance and review burden onto core developers, worsening outcomes for the people who carry system knowledge. |
| p10 | Developer Productivity with GenAI | arXiv | 2025-10 | Using the SPACE lens with 415 practitioners, this paper argues that faster output does not reliably translate into better software or better wellbeing, which is central to judging AI adoption programmes. |
| p11 | What do professional software developers need to know to succeed in an age of Artificial Intelligence? | arXiv | 2025-05 | This practitioner-focused study reframes the adaptation problem around workflow judgement, adjacent engineering skills, and non-technical skills rather than prompt fluency alone. |
| p12 | What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow | arXiv | 2025-10 | By mining Stack Overflow, this paper identifies recurring implementation pain around orchestration complexity, evaluation reliability, and runtime integration in agent systems. |
| p13 | AI-Generated “Workslop” Is Destroying Productivity | Harvard Business Review | 2025-09 | This study-backed HBR piece gives a concrete mechanism for failed ROI, namely low-effort AI output that looks plausible but pushes cognitive and rework costs onto colleagues. |
| p14 | [Workslop: The Hidden Cost of AI-Generated Busywork \| BetterUp Labs](https://www.betterup.com/workslop) | BetterUp Labs | 2025 | |
| p15 | Researchers Asked LLMs for Strategic Advice. They Got “Trendslop” in Return. | Harvard Business Review | 2026-03 | This HBR article extends the critique beyond code into managerial judgement, showing how models can produce fashionable but shallow recommendations that reward buzzwords over reasoning. |
| p16 | 'Botsitting' is destroying productivity as workers spend nearly a full day each week making AI 'usable' | ITPro | 2026-06 | This report on Glean's Work AI Institute findings captures the hidden supervision tax, 6.4 hours a week spent feeding context, checking outputs, and correcting errors. |
| p17 | 3 in 4 workers say AI reduced productivity and increased workloads, survey finds | Business Insider | 2024-08 | Upwork's survey is useful as an early warning that mandate-led adoption can raise review load and learning overhead faster than it creates value. |
| p18 | 84% of software developers are now using AI, but nearly half 'don't trust' the technology over accuracy concerns | ITPro | 2025-08 | This Stack Overflow survey coverage anchors the adoption-trust gap, with broad usage rising while distrust, debugging effort, and security concern remain high. |
| p19 | UK software developers are still cautious about AI, and for good reason | ITPro | 2025-10 | JetBrains' ecosystem survey adds a regional practitioner view showing that caution concentrates around code quality, privacy, and retaining human control over reviews and testing. |
| p20 | No AI overload just yet? Google's new survey reveals how developers are really using AI at work | TechRadar | 2025-10 | This report on Google's survey is valuable because it pairs very high developer adoption with low strong trust, supporting the claim that supervision, not surrender, remains the norm. |