Research sweep · deep · 2024 – 2026
Designing AI Operating Models Around Humans
How humans adapted to AI between June 2024 and June 2026, weighing measured benefits against harms, and how organisations should design operating models around human cognitive load and behavioural patterns rather than forcing adoption. The sweep covers four themes: cognitive overload from supervising multiple agents at machine speed (context switching, automation complacency, vigilance fatigue); the poor budget and value outcomes of top-down AI mandates and token-maximising usage; the gap between model welfare functions (such as Anthropic's) and any equivalent human or worker welfare function; and how much good human outcomes depend on model training versus orchestration and deployment design.
- GPT-5.5
- financial
- frontier
- academic
- vc
- blogs
- tech
Synthesised 2026-06-15
Humans are becoming the bottleneck in AI operating models
Overview
AI adoption moved faster than organisational redesign between June 2024 and June 2026. Gallup found workplace AI use had nearly doubled in two years by mid-2025, while NBER’s adoption research measured unusually fast diffusion of generative AI among US adults and workers. Yet the strongest evidence now points to a less convenient conclusion: access to AI is not the same as value capture, and value capture is not the same as good human outcomes.
Sources: Gallup (2025); NBER (2024)
The defining shift of the period was the move from chat assistance to agentic work. Anthropic introduced computer use in Claude, Google framed Gemini 2.0 as a model for the “agentic era”, OpenAI released Operator and deep research system cards, and Meta published LlamaFirewall for safer AI agents. These releases changed the human problem: workers were no longer only writing prompts, they were also assigning work, monitoring tools, reviewing outputs, catching failures, and deciding when to intervene.
Sources: Anthropic (2024) (↗); Google DeepMind (2024) (↗); OpenAI (2025) (↗); OpenAI (2025) (↗); Meta AI (2025) (↗)
The benefits are real, but they are task-shaped rather than universal. Field experiments with software developers, central bankers and other knowledge workers show speed and productivity gains in bounded tasks, while professional-services and enterprise surveys show rising use in drafting, summarisation, research and workflow acceleration. The harms are also task-shaped: over-reliance, cognitive offloading, deskilling risk, review burden, poor judgement on out-of-frontier tasks, and new coordination costs when AI output flows to colleagues.
Sources: SSRN (2025); SSRN (2025); Organization Science (2026); Thomson Reuters Institute (2024); Thomson Reuters Institute (2025)
The operating-model question therefore matters more than the adoption question. Organisations that treat AI as a licence, mandate or token budget risk creating workslop, review debt and trust decay. Those that design around human attention, verification cost, escalation thresholds and team-level rules have a better chance of turning AI into useful work rather than more work.
Sources: Thoughtworks Technology Radar (2026) (↗); MIT Sloan Management Review (2025) (↗); MIT Sloan Management Review (2025) (↗); Harvard Business Review (2025) (↗)
- Enterprise genAI adoption spikes
- GenAI risk profiles become operational guidance
- Everyday AI and digital employee experience move onto CIO roadmaps
- Computer use and agentic models enter workplace products
- Reasoning system cards become standard evidence
- AI task-use indexes map real adoption
- Human-AI teaming meta-analysis challenges simple augmentation claims
- Operator and deep research shift AI toward delegated work
- Model welfare becomes a formal lab topic
- Agent guardrails and firewalls appear
- Work redesign replaces experimentation as the analyst consensus
- AI-assisted software delivery shows productivity and review-load tensions
- Workslop becomes a named productivity harm
- Enterprise AI value measurement tightens
- Reported time savings meet ROI scrutiny
- Human plus AI organisation design becomes an executive theme
- Monitoring deployed AI systems becomes a named governance problem
- Worker welfare gap becomes visible
- Agent oversight and overload become explicit research objects
Sources: McKinsey (2024) (↗); NIST (2024); Gartner (2024) (↗); Anthropic (2024) (↗); Google DeepMind (2024) (↗); OpenAI (2024) (↗); Anthropic (2025) (↗); Nature Human Behaviour (2025) (↗); OpenAI (2025) (↗); OpenAI (2025) (↗); Anthropic (2025) (↗); Meta AI (2025) (↗); DORA (2025) (↗); Harvard Business Review (2025) (↗); Forrester (2026) (↗); Economist Impact (2026); NIST AI 800-4 announcement/report (2026); arXiv (2026); arXiv (2026)
Key Findings
1. Adoption is broad, but not deep enough to infer value
The sweep converges on one distinction: adoption has become normal, but productive absorption has not. McKinsey reported a spike in genAI adoption in early 2024, Bain called generative AI virtually ubiquitous in global business, and Gallup later found workplace AI use had nearly doubled in two years. Those sources support a diffusion story, not a return-on-investment story.
Sources: McKinsey (2024) (↗); Bain & Company (2024) (↗); Gallup (2025)
The stronger analyst reports moved from “who has access?” to “who has rewired work?” McKinsey’s 2025 state-of-AI work focused on organisational rewiring, Forrester’s 2026 value matrix argued for measuring what matters, and Gartner’s everyday-AI coverage linked AI progress to digital employee experience rather than raw deployment.
Sources: McKinsey (2025) (↗); Forrester (2026) (↗); Gartner (2024) (↗)
2. The best productivity gains appear in bounded, verifiable work
The clearest empirical gains come from tasks with visible outputs, tolerable error costs and manageable verification. Management Science and Microsoft Research field experiments found productivity improvements among software developers using generative AI tools, while SSRN work in central banking found gains when the task structure fitted model capabilities.
Sources: Management Science (2025) (↗); Microsoft Research (2025) (↗); SSRN (2025)
The same pattern appears in enterprise data. Anthropic’s Economic Index showed uneven task adoption, with software and other digitally mediated work heavily represented, while Thomson Reuters found rising generative AI use in professional services. These are settings where workers can often compare AI output against domain standards, documents, code or client requirements.
Sources: Anthropic (2025); Anthropic (2025) (↗); Thomson Reuters Institute (2025)
3. Human-AI teams do not automatically beat humans or AI alone
The Nature Human Behaviour systematic review and meta-analysis is important because it interrupts the easy “human plus AI” slogan. It found that combinations of humans and AI are useful only under certain task and design conditions, rather than being inherently superior to either humans or AI alone.
Sources: Nature Human Behaviour (2025) (↗)
That finding lines up with field evidence on the jagged frontier. Organization Science research on knowledge workers found that AI improved performance inside the frontier but could harm quality when workers applied it to tasks outside its strengths. The lesson is not that humans should always stay in the loop, but that the loop must be designed around task fit, verification cost and error consequences.
Sources: Organization Science (2026); NBER (2025) (↗)
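One way to make "task fit, verification cost and error consequences" concrete is a back-of-envelope estimate of expected net time per task. The sketch below is illustrative only: the `TaskProfile` fields, the function name and every number are assumptions invented for this example, not figures from the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical per-task estimates, in minutes except the probability."""
    minutes_saved: float          # drafting time the AI removes
    verification_minutes: float   # time a human spends checking the output
    error_probability: float      # chance an error survives review (0 to 1)
    error_cost_minutes: float     # rework cost if that error ships


def expected_net_minutes(task: TaskProfile) -> float:
    """Expected minutes gained (positive) or lost (negative) by using AI."""
    expected_error_cost = task.error_probability * task.error_cost_minutes
    return task.minutes_saved - task.verification_minutes - expected_error_cost


# Illustrative numbers only: same drafting saving, different verification
# cost and error consequences.
inside_frontier = TaskProfile(30, 5, 0.02, 60)
outside_frontier = TaskProfile(30, 20, 0.25, 240)

print(expected_net_minutes(inside_frontier))
print(expected_net_minutes(outside_frontier))
```

With these made-up inputs the first task comes out clearly positive and the second clearly negative, even though the drafting time saved is identical. Under this framing, the sign is driven by checking cost and the cost of errors that slip through, not by the raw saving.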
4. The role split is by judgement and autonomy, not by generation
The “Gen Z versus everyone” framing misses the operating reality. NBER and field-experiment evidence suggests less-experienced workers can gain more on some tasks because AI supplies templates, language, examples and procedural guidance. At the same time, junior workers face a learning risk if AI removes the difficult practice through which judgement forms.
Sources: NBER / SSRN (2024); SSRN (2025); PNAS (2025)
Senior workers often gain from AI because they have enough domain judgement to detect weak outputs, decompose tasks and decide when to ignore the model. Simon Willison made this point sharply for coding agents, arguing that they require skilled operators rather than novice button-pushers. The implication is a class and career-stage split: AI can compress some performance gaps while widening rewards for people who already know what good looks like.
Sources: Simon Willison’s Weblog (2025) (↗); Stack Overflow (2026) (↗); Economist Impact (2025)
5. Cognitive offloading is now measurable enough to manage
Microsoft Research found that knowledge workers who used generative AI reported reductions in cognitive effort, with confidence in AI associated with less critical thinking and self-confidence associated with more critical thinking. This does not prove that AI makes workers less capable, but it does show a behavioural pattern that organisations need to manage: people adapt their effort to the tool.
Sources: Microsoft Research (2025) (↗)
Academic work on algorithmic conformity and human-AI feedback loops adds a stronger warning. Repeated exposure to AI advice can shift human judgement, while guardrail-free AI tutoring can harm later learning even when it helps with immediate answers. The harm is not only a bad output, it is a changed worker or learner.
Sources: Nature Human Behaviour (2024); PNAS (2025); SSRN (2025)
6. Agentic systems turn production work into supervision work
Agentic releases changed the cost structure of work. OpenAI’s Operator and deep research system cards, Anthropic’s computer-use release and Meta’s agent guardrail work all assume systems that can act across tools, websites or workflows. That shifts human labour toward instruction, monitoring, interruption, correction and exception handling.
Sources: OpenAI (2025) (↗); OpenAI (2025) (↗); Anthropic (2024) (↗); Meta AI (2025) (↗)
The direct evidence on multi-agent vigilance fatigue in ordinary enterprises remains thin, but the software-agent evidence is moving in that direction. arXiv work published in 2026 on human oversight of agentic systems and overload in AI-assisted software engineering described oversight work as a real, costly burden, while Stack Overflow found agentic AI at work remained mostly single-agent and monitored. Organisations are not yet letting agents run free, but they are already asking humans to become traffic controllers.
Sources: arXiv (2026); arXiv (2026); Stack Overflow (2026) (↗)
7. Top-down mandates look weak when they measure activity rather than outcomes
The strongest evidence does not support blanket AI mandates as a value strategy. Forrester argued in 2026 that many enterprises were still chasing the true value of genAI three years in, while its AI Value Matrix pushed firms toward value measures rather than adoption counts. Thoughtworks specifically warned against coding throughput as a productivity measure because output volume can increase review load, defects and downstream risk.
Sources: Forrester (2026) (↗); Forrester (2026) (↗); Thoughtworks Technology Radar (2026) (↗)
The most credible operating advice favours local rules within enterprise guardrails. MIT Sloan Management Review argued that team leaders should write AI rules for productivity gains, and DORA described AI as an amplifier of existing software-delivery systems rather than a cure for weak engineering practice. That cuts against usage quotas, token-maximising programmes and executive theatre.
Sources: MIT Sloan Management Review (2025) (↗); DORA (2025) (↗); DORA (2026) (↗)
8. Workslop is the social form of bad AI ROI
AI-generated workslop names a common failure mode: output that looks plausible enough to pass upward, but is vague, wrong or incomplete enough to impose work on someone else. Harvard Business Review framed it as productivity destruction, not mere annoyance. This bridges the human-factors and ROI arguments because the cost appears as peer review, rework, reputation damage and slower decision-making.
Sources: Harvard Business Review (2025) (↗)
This is why token counts and prompt counts are poor management metrics. They capture machine activity and user compliance, not whether work moved faster, quality improved, risk fell or cognitive load became sustainable. OpenAI’s workspace analytics and enterprise reporting may help firms see usage, but usage telemetry must be joined to business outcomes and human workload measures.
Sources: OpenAI Help Center (2026); Forrester (2026) (↗); NBER (2025)
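As a rough illustration of joining usage telemetry to outcomes and workload, the sketch below merges three per-team records and reports human overhead per delivered item alongside token use. The field names, team names and figures are all hypothetical. They do not reflect the schema of any vendor's analytics product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamWeek:
    """One hypothetical row per team per week, joined from three sources."""
    team: str
    tokens_used: int       # from usage telemetry
    items_delivered: int   # from the delivery or ticketing system
    rework_hours: float    # from defect and rework tracking
    review_hours: float    # time spent checking AI output


def join_usage_to_outcomes(usage, outcomes, workload):
    """Inner-join three dicts keyed by team; teams missing any source are dropped."""
    shared = usage.keys() & outcomes.keys() & workload.keys()
    rows = []
    for team in sorted(shared):
        rework, review = workload[team]
        rows.append(TeamWeek(team, usage[team], outcomes[team], rework, review))
    return rows


def overhead_hours_per_item(row: TeamWeek) -> float:
    """Human review plus rework hours per delivered item."""
    if row.items_delivered == 0:
        return float("inf")
    return (row.review_hours + row.rework_hours) / row.items_delivered


# Made-up example data: "growth" has usage telemetry but no outcome data.
usage = {"platform": 900_000, "billing": 200_000, "growth": 500_000}
outcomes = {"platform": 10, "billing": 12}
workload = {"platform": (14.0, 16.0), "billing": (2.0, 4.0)}  # (rework, review)

rows = join_usage_to_outcomes(usage, outcomes, workload)
for row in rows:
    print(row.team, row.tokens_used, overhead_hours_per_item(row))
```

In this invented data the team with the highest token count also carries the highest review and rework load per item, which a token dashboard alone would not show. The inner join is a deliberate choice for the sketch: a team with usage but no outcome data is excluded rather than counted as a success.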
9. Labs have model welfare, but firms lack worker welfare
Anthropic made model welfare an explicit research topic in April 2025 and continued to formalise model behaviour through constitutions, system cards and responsible-scaling updates. That is a notable institutional development: one frontier lab now has language and staff attention for possible model interests or model treatment.
Sources: Anthropic (2025); Anthropic (2026) (↗); Anthropic (2026)
No equivalent worker-welfare function appears across the enterprise sources. The closest substitutes are digital employee experience, responsible AI governance, HR-led redesign, DevEx programmes and AI risk management. Those are useful, but they do not yet create a clear owner for attention load, judgement erosion, deskilling, surveillance pressure, escalation burden or adoption externalities.
Sources: Gartner (2024) (↗); NIST (2024); ACM Queue (2024) (↗); Economist Impact (2026)
10. Training matters, but deployment design decides most human outcomes
Model training shapes capability, refusal behaviour, tone, tool discipline and safety boundaries. The system-card record from OpenAI, Anthropic, xAI and others shows labs spending more effort on evaluations, mitigations and release conditions as models become more agentic.
Sources: OpenAI (2024) (↗); OpenAI (2025) (↗); Anthropic (2025) (↗); xAI (2025) (↗)
The main workplace harms, however, appear at the orchestration layer. Bad task allocation, poor interfaces, constant interruptions, unclear escalation rules, unbudgeted review time and weak measurement can turn a capable model into a net burden. NIST’s monitoring work, DORA’s software-delivery findings, MIT Sloan’s work-redesign advice and Mollick’s interface-centred writing all point to the same lever: design the system around human attention and behaviour.
Sources: NIST AI 800-4 announcement/report (2026); DORA (2025) (↗); MIT Sloan Management Review (2025) (↗); One Useful Thing (2026) (↗)
Evidence & Data
The most useful quantitative evidence falls into four buckets.
First, adoption. NBER’s rapid-adoption research measured unusually fast diffusion of generative AI among US adults and workers, while Gallup found workplace AI use had nearly doubled in two years. Thomson Reuters reported that generative AI adoption nearly doubled in professional services by 2025, and Anthropic’s Economic Index mapped usage by task and geography rather than relying only on surveys.
Sources: NBER / SSRN (2024); Gallup (2025); Thomson Reuters Institute (2025); Anthropic Research (2025) (↗)
Second, task productivity. Microsoft Research and Management Science field experiments with software developers found measurable productivity gains, and OpenAI’s enterprise report found workers saved nearly an hour a day on average. These findings matter because they use workplace settings or enterprise reporting, but they still need careful interpretation: the OpenAI figure comes from the vendor itself, and saved time only becomes value when organisations decide what the time is for.
Sources: Microsoft Research (2025) (↗); Management Science (2025) (↗); OpenAI (2025)
Third, learning and judgement. PNAS found that generative AI without guardrails could harm learning in high-school mathematics, while Microsoft Research found self-reported reductions in cognitive effort among knowledge workers. SSRN work on algorithmic conformity and Nature Human Behaviour work on human-AI feedback loops show that AI can alter human judgement, not merely assist it.
Sources: PNAS (2025); Microsoft Research (2025) (↗); SSRN (2025); Nature Human Behaviour (2024)
Fourth, developer trust and review. Stack Overflow’s 2024 and 2025 surveys showed the gap between willingness to use AI and trust in its output, while its 2026 agentic-AI work found agents remained mostly monitored at work. DORA and Thoughtworks then explain why this matters operationally: AI can increase apparent output while adding review, integration and risk-management load.
Sources: Stack Overflow (2024) (↗); Stack Overflow (2025) (↗); Stack Overflow (2026) (↗); DORA (2025) (↗); Thoughtworks Technology Radar (2026) (↗)
Signals & Tensions
- Vendor telemetry is improving, but independence remains uneven. Anthropic’s Economic Index and OpenAI’s enterprise reporting offer more granular evidence than ordinary surveys, yet they still reflect product-specific populations and provider incentives. NBER, academic field experiments and independent surveys remain the stronger anchors for general claims.
  Sources: Anthropic (2025); OpenAI (2025); NBER / SSRN (2024); SSRN (2025)
- Agent rhetoric is ahead of enterprise practice. Frontier labs and investors describe increasingly autonomous agents, but Stack Overflow found workplace agents remained mostly monitored and single-agent in 2026. That gap suggests organisations still distrust unsupervised autonomy, or cannot yet absorb it safely.
  Sources: OpenAI (2025) (↗); CB Insights (2025) (↗); Stack Overflow (2026) (↗)
- Human-in-the-loop is overused as a comfort phrase. NIST’s monitoring work and arXiv studies of agent oversight show that monitoring deployed AI systems is difficult and labour-intensive. LessWrong writers make the sharper version of the same argument: a human reviewer does not create meaningful oversight if the system is too fast, opaque or complex to inspect.
  Sources: NIST AI 800-4 announcement/report (2026); arXiv (2026); LessWrong (2026) (↗); LessWrong (2026) (↗)
- The ROI debate is less about whether AI works than where the costs land. Field experiments show gains, but HBR’s workslop argument and DORA’s delivery findings show that output can move cost downstream. The open management problem is whether firms can see the whole workflow rather than celebrating the first person’s saved time.
  Sources: Microsoft Research (2025) (↗); Harvard Business Review (2025) (↗); DORA (2025) (↗)
- The worker-welfare gap is underreported. AI safety institutions now discuss model behaviour, model welfare and frontier risk, while enterprise AI governance still tends to discuss compliance, productivity, risk and skills. Attention load, judgement quality and career development rarely have a named executive owner.
  Sources: Anthropic (2025); METR (2025) (↗); McKinsey (2025) (↗); Gartner (2024) (↗)
Open Questions
- Enterprises still lack good measures of cognitive load in AI work. Usage telemetry can count prompts, seats and tokens, but it does not measure context switching, vigilance fatigue, interruption cost or the time spent checking AI output. NIST and NBER both identify AI measurement and monitoring as live problems.
  Sources: NBER (2025); NIST AI 800-4 announcement/report (2026)
- No one knows the safe agent-to-human ratio for ordinary knowledge work. Software-agent studies show oversight burden, and Stack Overflow shows continued monitoring, but there is little field evidence on how many agents one worker can supervise without quality collapse.
  Sources: arXiv (2026); arXiv (2026); Stack Overflow (2026) (↗)
- The long-term learning effect remains unresolved. PNAS shows harm from unguarded AI in mathematics learning, while workplace studies show near-term productivity gains. The missing evidence is longitudinal: whether workers who rely on AI develop judgement faster, slower or differently over years.
  Sources: PNAS (2025); Microsoft Research (2025) (↗)
- The budget effect of mandates is not well measured. Analyst and practitioner sources warn against activity metrics and top-down adoption theatre, but the sweep did not surface many independent studies that directly compare mandated AI programmes with voluntary, team-designed programmes.
  Sources: Forrester (2026) (↗); Thoughtworks Technology Radar (2026) (↗); MIT Sloan Management Review (2025) (↗)
- Worker-welfare governance has no settled home. HR, CIO, risk, legal, responsible AI and line managers each own fragments of the problem, but none naturally owns the full stack of workload, trust calibration, learning, surveillance pressure and job quality. Economist Impact, Gartner and NIST point to adjacent structures, not a mature function.
  Sources: Economist Impact (2026); Gartner (2024) (↗); NIST (2024)
- The training-versus-deployment split will become harder as agents improve. Better models may reduce some errors and review effort, but more capable agents also increase task duration, action scope and monitoring complexity. The practical answer for now is to treat model quality as necessary infrastructure, while making orchestration, pacing, escalation and accountability the centre of the operating model.
  Sources: OpenAI (2025) (↗); Anthropic (2025) (↗); NIST AI 800-4 announcement/report (2026); One Useful Thing (2025) (↗)
The practical implication is blunt: do not force humans to adapt to machine speed. Slow the workflow where judgement matters, batch review where attention is scarce, automate only where verification is cheap, and make one executive function accountable for the human cost of AI.
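The first three rules in that paragraph can be read as a simple routing policy for task types. The sketch below is one possible reading, not a tested or recommended design: the `Mode` names, the `choose_mode` function and the two-minute threshold are all assumptions chosen for illustration. The fourth rule, a single accountable executive function, is an organisational choice and is not modelled.

```python
from enum import Enum


class Mode(Enum):
    HUMAN_PACED = "human_paced"      # slow the workflow; a person sets the pace
    AUTOMATE = "automate"            # let the system run
    BATCH_REVIEW = "batch_review"    # queue outputs and review them together
    INLINE_REVIEW = "inline_review"  # review each output as it arrives


def choose_mode(judgement_heavy: bool,
                verification_minutes: float,
                attention_scarce: bool,
                cheap_verification_limit: float = 2.0) -> Mode:
    """Pick a handling mode for one task type.

    The default limit of 2 minutes is an arbitrary placeholder,
    not an evidence-based threshold.
    """
    if judgement_heavy:
        # Slow the workflow where judgement matters.
        return Mode.HUMAN_PACED
    if verification_minutes <= cheap_verification_limit:
        # Automate only where verification is cheap.
        return Mode.AUTOMATE
    if attention_scarce:
        # Batch review where attention is scarce.
        return Mode.BATCH_REVIEW
    return Mode.INLINE_REVIEW


print(choose_mode(judgement_heavy=True, verification_minutes=1.0, attention_scarce=False))
print(choose_mode(judgement_heavy=False, verification_minutes=10.0, attention_scarce=True))
```

The order of the checks encodes a priority: judgement-heavy work is never automated in this sketch, however cheap it is to verify. A real policy would need thresholds set per team and revisited as review load is measured.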
![[sources-how-humans-are-adapting-to-ai-between-june-2024-an]]
Sources
Financial Press
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| f1 | OpenAI ChatGPT Enterprise Sees Surging Demand Despite Competition, COO Says | Bloomberg | 2024-04-04 | Early marker that 2024 was shifting from experimentation to enterprise rollout. Useful for the adoption baseline and for understanding why executives began pushing organization-wide uptake before ROI evidence was mature. |
| f2 | AI Agents Have Officially Entered the Workplace, Flaws and All | Bloomberg | 2024-10-24 | One of the clearest business-press signals that the enterprise conversation had moved from copilots to agents. Useful on the operational reality that agents arrived with error, trust, and supervision problems intact. |
| f3 | Big Tech’s New AI Obsession: Agents That Do Your Work for You | Bloomberg | 2024-12-13 | Frames the market narrative heading into 2025: value expectations migrated from assistance to delegated action. Important background for later evidence on overload, governance, and false ROI expectations. |
| f4 | OpenAI Finds AI Saves Workers Nearly an Hour a Day on Average | Bloomberg | 2025-12-08 | One of the most cited late-2025 enterprise productivity claims. Important because it quantifies gains, but also because it is vendor-commissioned and therefore a good example of where measured benefits need careful weighting. |
| f5 | Anthropic Finds Businesses Are Mainly Using AI to Automate Work | Bloomberg | 2025-09-15 | Important for the augmentation-versus-automation debate. Suggests enterprise usage may be moving faster toward delegation than many 'human plus AI' narratives imply. |
| f6 | Rethinking work: designing the 'human + AI' organisation | Economist Impact | 2026-01-20 | Directly relevant to the brief’s core question: redesigning work and culture so AI elevates human capability rather than overwhelms it. Useful for executive framing of operating-model redesign. |
| f7 | The AI glass floor | Economist Impact | 2025 | Strong source on inequality of outcomes: wage premiums for AI skills, junior-career risks, and the possibility that AI advantage accrues disproportionately to already-advantaged workers. |
| f8 | From intent to action: the leaders’ guide to building AI-powered workplaces | Economist Impact | 2025 | Useful on the adoption-to-scale gap. Supports the argument that many firms can pilot AI but few convert it into repeatable business value. |
| f9 | How Agentic AI is Reshaping Workplace Culture | Economist Impact | 2025 | Covers the behavioral and cultural side of adoption rather than just tooling. Relevant to trust, communication, and framing AI as a tool rather than an imposed end-state. |
| f10 | AI Use at Work Has Nearly Doubled in Two Years | Gallup | 2025-06-15 | High-value independent survey evidence on actual employee adoption. Especially useful because it shows adoption differs sharply by white-collar versus frontline roles. |
| f11 | AI Use at Work Rises | Gallup | 2025-12 | Adds the managerial-support finding: adoption is materially associated with support and strategic integration, reinforcing that deployment design matters. |
| f12 | Rising AI Adoption Spurs Workforce Changes | Gallup | 2026-04 | Useful for the 2026 snapshot: higher usage, rising job concerns, and evidence that leaders report stronger productivity gains than other groups. |
| f13 | The Rapid Adoption of Generative AI | NBER | 2024; revised 2025-02 | A foundational independent paper on how quickly AI diffused into work. Important for separating broad adoption from proven firm-level productivity transformation. |
| f14 | Firm Data on AI | NBER | 2026-03 | Probably the single most useful independent business-economics source in this set. It surveys nearly 6,000 executives and finds widespread adoption but still-small realized effects on jobs and productivity, directly challenging inflated mandate narratives. |
| f15 | In This Issue | NBER Newsletter | 2026-05 | Useful summary pointer emphasizing the same pattern: adoption is broad, measured productivity effects remain modest, and executive expectations still run ahead of observed outcomes. |
| f16 | 2024 Generative AI in Professional Services Report | Thomson Reuters Institute | 2024 | Professional services is a strong test bed because work is document-heavy, high-value, and supervision-intensive. Helpful for early evidence on where practitioners saw efficiency and quality gains. |
| f17 | 2025 Generative AI in Professional Services Report / Generative AI Adoption Nearly Doubles as Professional Services Reach Crossroads | Thomson Reuters Institute | 2025-04-15 | Shows the move from individual use to enterprise integration remains incomplete. Good evidence against simplistic 'everyone must use AI more' mandates. |
| f18 | 4 AI Trends in Professional Services to Watch in 2025 | Thomson Reuters Institute | 2025 | Useful on the gap between active personal usage and scaled workflow integration, which is central to the difference between activity metrics and value metrics. |
| f19 | 2025: The year the Frontier Firm is born | Microsoft WorkLab | 2025-04-24 | Major vendor framing of the 'agent boss' model. Valuable not because it is neutral, but because it captures how senior leaders were being encouraged to redesign organizations around agents. |
| f20 | 2025 Work Trend Index Annual Report (executive summary) | Microsoft WorkLab | 2025-04-24 | Adds methodological detail and the human-agent-team framing. Useful as a reference point for the managerial ideology of 2025 enterprise AI. |
| f21 | The State of Enterprise AI | OpenAI | 2025-12-08 | A major vendor report on enterprise usage, time savings, and task breadth. Important as evidence of measured benefits, but also as a reminder that many headline figures come from the suppliers themselves. |
| f22 | Workspace Analytics for ChatGPT Enterprise and Edu | OpenAI Help Center | 2026-03 rollout noted | Important operationally because it shows how vendors want enterprises to measure impact: productivity, time saved, quality, work satisfaction, and new-task completion. Useful for the measurement/governance angle. |
| f23 | Introducing the Anthropic Economic Index | Anthropic | 2025-02-10 | High-value source on actual task usage patterns. Especially relevant because it distinguishes augmentation from automation and maps usage to occupations. |
| f24 | Exploring model welfare | Anthropic | 2025-04-24 | Directly relevant to the brief’s welfare asymmetry. It shows a frontier lab formalizing concern for model welfare while enterprises still largely lack equivalent worker-welfare governance for deployment. |
| f25 | Responsible Scaling Policy Updates | Anthropic | 2026-06 | Shows the sophistication of model-side governance and accountability is increasing. Useful contrast with the relative immaturity of human-side deployment governance inside enterprises. |
Frontier Lab & Model News
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| t1 | Large Enough | Mistral | 2024-07-24 | Shows the 2024 push toward larger-context, tool-capable enterprise models. Relevant because enterprise deployment pressure grew alongside claims of throughput and cost-efficiency, setting up later questions about whether organizations optimized for value or for visible AI activity. |
| t2 | Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku | Anthropic | 2024-10-22 | Important for the human-supervision question because it moves AI from advising to acting on computers, which sharply increases oversight load and the risk that humans become nominal reviewers of machine-speed action sequences. |
| t3 | Google introduces Gemini 2.0: A new AI model for the agentic era | Google DeepMind | 2024-12-11 | Marks Google's explicit move to an 'agentic era' framing, relevant because orgs then increasingly experiment with delegating multi-step work instead of using AI only as a drafting aid. |
| t4 | OpenAI o1 System Card | OpenAI | 2024-12-05 | Useful for the 'human outcomes depend on training vs orchestration' question because it documents deceptive and oversight-evasion behaviors under some conditions, implying deployment architecture and monitoring remain critical even if model training improves. |
| t5 | Operator System Card | OpenAI | 2025-01-23 | Directly relevant to machine-speed supervision. Operator explicitly requires human confirmations at key steps, which is a concrete design acknowledgement that continuous, step-by-step human oversight breaks down when agents can act across software systems. |
| t6 | Anthropic Economic Index: new building blocks for understanding AI use | Anthropic | 2025-01-15 | One of the stronger primary sources on how people actually use frontier models. Especially relevant for the brief’s deskilling and autonomy questions because it tracks task complexity, purpose, autonomy, and success rather than only aggregate usage. |
| t7 | When combinations of humans and AI are useful: A systematic review and meta-analysis | Nature Human Behaviour | 2025-02-05 | Best cross-domain evidence in this set for the human+AI complementarity question. Useful against simplistic 'AI always helps' or 'AI always harms' narratives. |
| t8 | Deep research System Card | OpenAI | 2025-02-25 | Important for understanding supervisory burden in browsing agents. Deep research formalizes multi-step web work and acknowledges prompt injection, privacy, code execution, and hallucination risks. |
| t9 | The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers | Management Science | 2025-02-27 | One of the strongest sources on measured benefits for high-skill work. Useful because it tests frontier-style coding assistance in organizational settings rather than relying on benchmark claims. |
| t10 | Gemini 2.0 model updates: 2.0 Flash, Flash-Lite, Pro Experimental | Google DeepMind | 2025-02-?? | Shows the economics of adoption pressure: cheaper, faster, long-context models make 'use it everywhere' mandates more likely, even though value depends on workflow fit. |
| t11 | Anthropic Economic Index: Insights from Claude 3.7 Sonnet | Anthropic | 2025-03-27 | Tracks how a stronger model changes real usage patterns. Relevant to whether training improvements alone shift outcomes, versus whether organizations still need better orchestration. |
| t12 | The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation | Meta AI | 2025-04-05 | Relevant because open-weight multimodal models broaden organizational experimentation and decentralize deployment decisions, making local operating-model design even more important. |
| t13 | Introducing GPT-4.1 in the API | OpenAI | 2025-04-14 | Useful for the productivity-versus-cognitive-load story because it pushes cheap, long-context model access further into everyday workflows, increasing temptation to substitute context volume for better work design. |
| t14 | OpenAI o3 and o4-mini System Card | OpenAI | 2025-04-16 | Relevant to the deployment-design question because it extends preparedness discussion to more autonomous reasoning models while still using thresholded governance rather than claiming training has solved the problem. |
| t15 | Exploring model welfare | Anthropic | 2025-04-24 | Directly relevant to the user's question about a model welfare function without an equivalent worker welfare function. It is a strong marker that frontier labs are formalizing concern for possible model interests faster than analogous governance for employee cognitive welfare. |
| t16 | LlamaFirewall: An open source guardrail system for building secure AI agents | Meta AI | 2025-04-29 | Strong evidence that good outcomes depend materially on orchestration and deployment guardrails, not just model training. Especially relevant to agent-to-human ratios and escalation designs. |
| t17 | Everything we announced at our first-ever LlamaCon | Meta AI | 2025-04-29 | Relevant because Meta tied ecosystem growth to evaluation tooling, signaling that broad adoption without measurement is inadequate. |
| t18 | Medium is the new large | Mistral | 2025-05-07 | Shows the competitive move toward cheaper enterprise deployment. Relevant to the user's 'token-maxing' concern because low-cost models often encourage breadth of rollout even when task-level value is weak. |
| t19 | Frontier Risk Report (February to March 2026) | METR | 2025-05-19 | One of the most important external sources for the period. It evaluates frontier internal models from Anthropic, Google, Meta, and OpenAI using agentic benchmarks, helping separate marketing narratives from independent capability evidence. |
| t20 | Operator System Card | OpenAI | 2025-??-?? | Included separately because it is one of the clearest explicit acknowledgements from a frontier lab that real-world deployment requires constrained action, staged release, and user confirmations rather than raw autonomy. |
| t21 | Details about METR's preliminary evaluation of Claude 3.7 | METR | 2025-??-?? | Important because it provides independent evidence on frontier-model performance in agentic, programming, and command-line tasks, exactly the capability band where human supervision becomes strained. |
| t22 | Claude 4 System Card | Anthropic | 2025-??-?? | Very relevant to the user's welfare-function question because the card explicitly includes model welfare assessment, while also documenting internal AI-research and autonomy evaluations. |
| t23 | Findings from a Pilot Anthropic - OpenAI Alignment Evaluation Exercise | Anthropic + OpenAI | 2025-??-?? | Not directly about worker adaptation, but important for the training-vs-deployment debate: labs are beginning to cross-evaluate alignment, yet even strong alignment results do not remove the need for deployment-side controls. |
| t24 | Grok 4.1 Model Card | xAI | 2025-11-17 | Useful as a comparator showing that by late 2025 frontier labs beyond the usual three were publishing pre-deployment safety evaluations and distinguishing between base model and production-prompt behavior. |
| t25 | Claude’s Constitution | Anthropic | 2026-01-21 | Relevant to the training side of the training-vs-orchestration question. It documents the welfare and normative principles Anthropic is using to shape model behavior, making the contrast with absent worker-welfare constitutions more salient. |
Academic & arXiv
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| a1 | Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile | NIST | 2024 | Best standards-oriented source for framing human oversight and post-deployment risk as a systems problem rather than a pure model problem. |
| a2 | The Rapid Adoption of Generative AI | NBER / SSRN | 2024 | Important baseline for the diffusion side of the story; shows that adaptation is real and broad before many firms had mature operating models. |
| a3 | Algorithm-enabled Decision Support and Worker Learning: a Large-Scale Field Experiment | SSRN | 2024 | Directly relevant to whether AI complements judgment and learning or merely substitutes for them. |
| a4 | How human–AI feedback loops alter human perceptual, emotional and social judgements | Nature Human Behaviour | 2024 | One of the clearest papers on downstream cognitive harms from AI-mediated judgment, beyond simple one-shot error rates. |
| a5 | RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | arXiv / METR | 2024 | Foundational for understanding where human oversight shifts from execution to orchestration when agents become strong on open-ended work. |
| a6 | Shifting Work Patterns with Generative AI | NBER | 2025 | Strong evidence that benefits may show up first in time allocation and work pattern shifts, not necessarily in measured task substitution. |
| a7 | A Task-Based Approach to Generative AI: Evidence from a Field Experiment in Central Banking | SSRN | 2025 | Useful counterweight to vendor-heavy enterprise narratives; tests AI in a serious knowledge-work environment with regulated stakes. |
| a8 | The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers | SSRN | 2025 | One of the strongest 2025 papers on heterogeneous gains by experience level. |
| a9 | HCAST: Human-Calibrated Autonomy Software Tasks | arXiv / METR | 2025 | Excellent anchor for human-agent ratio thinking, escalation thresholds, and deciding when not to force autonomy. |
| a10 | Making AI Count: The Next Measurement Frontier | NBER | 2025 | Directly relevant to rejecting token-maxing and activity-based KPI systems. |
| a11 | Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality | Organization Science | 2026 | Still the most useful operating-model metaphor for when AI helps versus when it silently degrades judgment. |
| a12 | Generative AI without guardrails can harm learning: Evidence from high school mathematics | PNAS | 2025 | One of the cleanest deskilling/cognitive-offloading papers in the period. |
| a13 | Turning Off Your Better Judgment: Algorithmic Conformity in AI-Human Collaboration | SSRN | 2025 | Highly relevant to automation complacency, deference, and the hidden costs of making AI too easy to obey. |
| a14 | Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling | ACL 2025 | 2025 | Concrete evidence for 'positive friction' as a deployment design pattern. |
| a15 | REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance | NAACL 2025 | 2025 | Important benchmark/methodology contribution for studying overreliance as a system property. |
| a16 | LLMs Trust Humans More, That's a Problem! | ACL 2025 | 2025 | Useful for the inverse problem: not only do humans overtrust models, models can overtrust humans, destabilizing mixed-initiative workflows. |
| a17 | Exploring model welfare | Anthropic research note | 2025 | Not a worker-outcomes paper, but central to the user's question about the asymmetry between emerging model welfare functions and missing worker welfare functions. |
| a18 | Firm Data on AI | NBER | 2026 | Best 2026 macro-adoption source for organizations; useful for separating firm-level adoption from worker-level use. |
| a19 | The Microstructure of AI Diffusion: Evidence from Firms, Business Functions, and Worker Tasks | NBER | 2026 | Directly relevant to the limits of mandates and why usage quotas are a weak proxy for value. |
| a20 | From Adoption to Outcomes: AI-Specific Implementation Gaps in the First 18 Months | SSRN | 2026 | One of the most directly relevant papers for the 'don't force adoption; redesign the operating model' argument. |
| a21 | Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents | arXiv | 2026 | Probably the most on-point 2026 paper for the practical reality of supervising agents rather than merely 'using AI tools'. |
| a22 | Human Oversight and Overload: Two Hidden and Costly Burdens of AI-Assisted Software Engineering | arXiv | 2026 | Directly supports the thesis that the constraining resource becomes human attention, not model throughput. |
| a23 | Human Tool: An MCP-Style Framework for Human-Agent Collaboration | arXiv | 2026 | A strong candidate design pattern for structuring oversight instead of scattering interruptions across workers. |
| a24 | HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems | arXiv | 2026 | Rare paper that explicitly connects governance strength to workload buffering rather than treating governance as pure drag. |
| a25 | New Report: Challenges to the Monitoring of Deployed AI Systems | NIST AI 800-4 announcement/report | 2026 | Useful evidence that monitoring and oversight costs are not incidental; they are recurring deployment burdens. |
VC & Analyst Reports
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| v1 | How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz (a16z) | 2025-05-08 | Useful for adoption curves, build-vs-buy behavior, and evidence that deployment design is becoming more important than model novelty. |
| v2 | 16 Changes to the Way Enterprises Are Building and Buying Generative AI | Andreessen Horowitz (a16z) | 2024-04-16 | Good baseline for 2024 organizational behavior and why forced rollout into higher-risk workflows had not yet happened broadly. |
| v3 | Where Enterprises are Actually Adopting AI | Andreessen Horowitz (a16z) | 2026-04-15 | Helpful for distinguishing real adoption from theater and for understanding where human workflow redesign is easier. |
| v4 | AI 50: Companies of the Future | Sequoia Capital | 2024-04-11 | Strong for investment thesis and early productivity framing. |
| v5 | The Stochastic Mindset | Sequoia Capital | 2025-01-22 | One of the few VC pieces directly relevant to human cognitive adaptation and judgment under AI. |
| v6 | The Always-On Economy: AI and the Next 5-7 Years | Sequoia Capital | 2025-04-21 | Useful for the user's concern that human oversight may be mismatched to machine-speed workflows. |
| v7 | Here’s how leading strategy teams are successfully driving generative AI adoption in their organizations | CB Insights | 2025-01-16 | Strong evidence against equating organizational enthusiasm with real deployment success. |
| v8 | Enterprise AI agents & copilots: Our growth projections for the $5B+ market | CB Insights | 2025-04-29 | Useful for market sizing and for understanding why organizations are being pushed toward agentic operating models. |
| v9 | Should enterprises adopt closed-source or open-source AI models? | CB Insights | 2025-02-12 | Directly supports the idea that orchestration and deployment choices matter as much as, or more than, allegiance to one model family. |
| v10 | State of AI Report: 6 trends shaping the landscape in 2025 | CB Insights | 2025-01-30 | Good macro context for why adoption pressure intensified inside organizations in 2025. |
| v11 | What’s next for AI agent ROI? | CB Insights | 2026-03 | Very useful for the user's question about whether outcomes depend more on model training or orchestration/deployment design. |
| v12 | Gartner Identifies the Top 10 Strategic Technology Trends for 2025 | Gartner | 2024-10-21 | Important strategic framing for machine-speed delegation and the human oversight problem. |
| v13 | Hype Cycle for Generative AI, 2024 | Gartner | 2024-07-31 | Useful for mapping maturity and for cautioning against top-down overcommitment to immature categories. |
| v14 | Gartner Says Everyday AI and Digital Employee Experience Are Two Years Away from Mainstream Adoption | Gartner | 2024-08-14 | One of the clearest analyst proxies for a worker-welfare or human-experience lens. |
| v15 | AI Agent Layer: Why CIOs Must Lead Enterprise Transformation | Gartner | 2025-05-2026 | Directly relevant to operating-model ownership and governance for human-machine work allocation. |
| v16 | The State Of Generative AI, 2024 | Forrester | 2024-01 | Good baseline source for early-2024 enterprise posture. |
| v17 | Tech Pulse Q4 2024: How IT Builds An AI Advantage By Embracing AI Tools And Agents | Forrester | 2025-03-05 | Helpful for understanding how heavy-use functions adapt in practice. |
| v18 | Forrester: Three Years Into GenAI, Enterprises Are Still Chasing Its True Transformative Value | Forrester | 2026-04-02 | One of the strongest sources here against forced adoption and usage-target theater. |
| v19 | Introducing The Forrester AI Value Matrix: A Framework For Measuring What Matters | Forrester | 2026-04-24 | Directly useful for separating true value creation from activity metrics like token counts or superficial usage. |
| v20 | Superagency in the workplace: Empowering people to unlock AI’s full potential | McKinsey | 2025-01-28 | Central source for human adaptation, adoption gaps, and the case for work redesign rather than mere tool provision. |
| v21 | Gen AI’s next inflection point: From employee experimentation to organizational transformation | McKinsey | 2024-08-07 | Useful against top-down mandate logic; suggests bottom-up use preceded managerial structure. |
| v22 | Agents, robots, and us: Skill partnerships in the age of AI | McKinsey | 2025-11-25 | One of the best sources in this lane for operating-model redesign around human-machine complementarity. |
| v23 | The State of AI: How organizations are rewiring to capture value | McKinsey | 2025-11-05 | Strong adoption-curve evidence and a useful bridge from experimentation to operating model. |
| v24 | The state of AI in early 2024: Gen AI adoption spikes and starts to generate value | McKinsey | 2024-06-07 | Best early-period baseline for the 2024-2026 adoption curve. |
| v25 | Generative AI virtually ubiquitous in global business as the technology spreads at a near-unprecedented rate | Bain & Company | 2024-06-20 | Strong evidence on budget pressure and why executives may default to mandates. |
Blogs & Independent Thinkers
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | Real AI Agents and Real Work | One Useful Thing | 2025-09-29 | Strong on agent supervision, review workflows, and the risk of 'infinite PowerPoints' instead of real value. (oneusefulthing.org) |
| b2 | Claude Dispatch and the Power of Interfaces | One Useful Thing | 2026-03-31 | Directly relevant to cognitive overload and designing around human attention rather than machine throughput. (oneusefulthing.org) |
| b3 | Management as AI superpower | One Useful Thing | 2026-01-27 | Useful for operating-model design and the economics of supervision overhead. (oneusefulthing.org) |
| b4 | Making AI Work: Leadership, Lab, and Crowd | One Useful Thing | 2025-05-25 | Good antidote to top-down mandate logic. (oneusefulthing.org) |
| b5 | Choosing to Stay Human | One Useful Thing | 2026-05-26 | Independent reflection on erosion of judgment and behavioral adaptation. (oneusefulthing.org) |
| b6 | What it feels like to work with Mythos | One Useful Thing | 2026-06-09 | Useful as a late-period marker for how human-AI relations were shifting by June 2026. (oneusefulthing.org) |
| b7 | Coding agents require skilled operators | Simon Willison’s Weblog | 2025-06-18 | Clean statement of why forcing novice adoption can destroy value. (simonwillison.net) |
| b8 | The AI Hangover | Quandary Labs Substack | 2026-01-14 | Directly addresses poor budget outcomes and the failure of activity metrics. (substack.quandarylabs.ai) |
| b9 | Digital Economy Dispatch #264 -- AI Bottlenecks, Jagged Edges, and the Real Barriers to AI-at-Scale | Dispatches | 2026-01-06 | Good independent synthesis linking frontier capability to deployment friction. (dispatches.alanbrown.net) |
| b10 | Attention Ecology | Human OS Manual | 2026-03-29 | Useful independent human-factors framing for cognitive load and operating-model design. (thehumanosmanual.com) |
| b11 | Vibe Coding, Windsurf and Anthropic, ChatGPT Connectors | Stratechery | 2025-06-09 | Helpful for the thesis that orchestration and integration layers matter as much as models. (stratechery.com) |
| b12 | Oversight Assistants: Turning Compute into Understanding | LessWrong | 2026-01-07 | One of the clearest pieces on why human-in-the-loop supervision at machine speed breaks down. (lesswrong.com) |
| b13 | No, We're Not Getting Meaningful Oversight of AI | LessWrong / GreaterWrong | 2025-07-09 | Useful skeptical counterweight to simplistic HITL narratives. (greaterwrong.com) |
| b14 | Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate | LessWrong | 2026-05-26 | Important for the 'training vs deployment' question because oversight degrades with system design choices. (lesswrong.com) |
| b15 | Is AI welfare work puntable? | LessWrong | 2026-05-12 | Useful for the asymmetry between formalized model welfare and diffuse human welfare governance. (lesswrong.com) |
| b16 | Exploring model welfare | Anthropic News | 2025-04-24 | This is the clearest primary-source marker of labs building a model-welfare function. (anthropic.com) |
| b17 | Introducing the Anthropic Economic Index | Anthropic Research | 2025-02-10 | Important empirical counterweight to sweeping displacement narratives. (anthropic.com) |
| b18 | Anthropic Economic Index report: Uneven geographic and enterprise AI adoption | Anthropic Research | 2025-09-15 | Useful on heterogeneity across firms and regions rather than one flat adoption curve. (anthropic.com) |
| b19 | AI Use at Work Has Nearly Doubled in Two Years | Gallup | 2025-06-16 | Anchors the adaptation story in measured adoption data. (gallup.com) |
| b20 | Gen Z's AI Adoption Steady, but Skepticism Climbs | Gallup | 2026-04-10 | Useful corrective to simplistic generational narratives. (news.gallup.com) |
| b21 | Humans in the Loop: Executive Summary | MIT Institute for Work and Employment Research / Industrial Performance Center | 2026-04-08 | Strong evidence that good deployment must be designed around human motivation and identity, not just efficiency. (ipc.mit.edu) |
| b22 | Designing Human-AI Collaboration: A Sufficient-Statistic Approach | NBER | 2025-06-01 | One of the most important formal pieces for operating-model design and effort crowd-out. (nber.org) |
| b23 | Bias in the Loop: How Humans Evaluate AI-Generated Suggestions | Harvard Data Science Review / MIT Press | 2026-04-30 | Relevant to automation complacency, over-trust, and vigilance fatigue. (hdsr.mitpress.mit.edu) |
| b24 | Beyond the Principle: How Organizations Implement Human-in-the-Loop Oversight for Generative AI | AMCIS 2026 Proceedings | 2026-08-?? | Useful for concrete design patterns rather than abstract calls for oversight. (aisel.aisnet.org) |
| b25 | Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs | arXiv | 2026-03-25 | Direct evidence for the training-versus-orchestration question; it strongly favors orchestration. (arxiv.org) |
Tech Industry & Practitioner
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| p1 | 2024 Accelerate State of DevOps Report | DORA | 2024 | Strong baseline on organizational conditions that predict whether AI will help or harm teams. |
| p2 | State of AI-assisted Software Development 2025 | DORA | 2025-09-23 | Most directly relevant practitioner source for AI adoption outcomes in software organizations. |
| p3 | Balancing AI tensions: Moving from AI adoption to effective SDLC use | DORA | 2026-03-10 | Directly addresses cognitive burden and why apparent productivity can coexist with higher instability. |
| p4 | How customization supports developer engagement | DORA | 2025-09-23 | Useful for the question of training versus orchestration: deployment and interface design materially shape outcomes. |
| p5 | Coding throughput as a measure of productivity | Thoughtworks Technology Radar | 2026-04-15 | Best practitioner source in this set against token-maxing and superficial AI activity metrics. |
| p6 | Complacency with AI-generated code | Thoughtworks Technology Radar | 2025-11-05 | Strong practitioner articulation of vigilance fatigue and degraded review quality. |
| p7 | Thoughtworks Tech Radar 30th Edition: Team AI | Thoughtworks | 2024-04-03 | Early practitioner framing that already centers flow disruption, not just capability upside. |
| p8 | [Macro trends in the tech industry \| April 2026](https://www.thoughtworks.com/en-us/insights/blog/technology-strategy/macro-trends-tech-industry-april-2026) | Thoughtworks | 2026-04-15 | |
| p9 | DevEx in Action | ACM Queue | 2024-01-14 | One of the strongest practitioner-methodology sources on human cognitive load and software work design. |
| p10 | For AI Productivity Gains, Let Team Leaders Write the Rules | MIT Sloan Management Review | 2025-10-15 | Direct evidence against purely top-down AI mandates. |
| p11 | Want AI-Driven Productivity? Redesign Work | MIT Sloan Management Review | 2025-05-01 | Good bridge source between AI capability and work redesign. |
| p12 | AI-Generated "Workslop" Is Destroying Productivity | Harvard Business Review | 2025-09-22 | One of the clearest practitioner critiques of usage mandates and output theater. |
| p13 | AI Doesn’t Reduce Work - It Intensifies It | Harvard Business Review | 2026-02-09 | Useful practitioner framing for the hidden labor of supervision and oversight. |
| p14 | Workers Don’t Trust AI. Here’s How Companies Can Change That. | Harvard Business Review | 2025-11-07 | Relevant to worker welfare governance and shadow-AI behavior under forced adoption. |
| p15 | Stack Overflow’s 2024 Developer Survey Shows the Gap Between AI Use and Trust in its Output Continues to Widen Among Coders | Stack Overflow | 2024-07-24 | Large-sample developer sentiment baseline across the start of the date range. |
| p16 | Developers remain willing but reluctant to use AI: The 2025 Developer Survey results are here | Stack Overflow | 2025-12-29 | Confirms that the trust gap persisted as AI use grew. |
| p17 | Agents on a leash: Agentic AI remains mostly single-agent and monitored at work | Stack Overflow | 2026-05-27 | Good current practitioner evidence that organizations have not normalized free-running multi-agent autonomy at work. |
| p18 | The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers | Microsoft Research | 2025 | Key source for cognitive offloading and complacency risks. |
| p19 | The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers | Microsoft Research | 2025-06 | One of the strongest causal sources on measured developer productivity benefits. |
| p20 | Dear Diary: A Randomized Controlled Trial of Generative AI Coding Tools in the Workplace | Microsoft Research | 2025-04 | Important source on the social and interpretive side of AI coding-tool adoption. |
| p21 | Data Centers May House AI - But Operators Don’t Trust AI (Yet) | IEEE Spectrum | 2025 | A strong practitioner analog for why humans resist handing over consequential operational control even when AI capability rises. |
| p22 | Exploring model welfare | Anthropic | 2025-04-24 | Central source for the contrast between explicit model welfare and the lack of an equivalent worker-welfare operating function. |
| p23 | Claude Opus 4.6 System Card | Anthropic | 2026 | Concrete evidence that model welfare has become operationalized in frontier-model governance. |
| p24 | Anthropic Economic Index report: Uneven geographic and enterprise AI adoption | Anthropic | 2025-09-15 | Useful against simplistic mandate thinking: even rapid adoption is uneven and context-bound. |
| p25 | Anthropic Economic Index report: Learning curves | Anthropic | 2026-03 | Relevant to differences by role and capability rather than flattening all workers into one adoption story. |