Research Explainer · Shen & Tamkin (2026)

AI makes novice developers faster — but quietly stops them from learning

In a randomised experiment, software engineers who used AI to complete unfamiliar coding tasks scored 17% lower on a comprehension test — without finishing meaningfully faster. The biggest gap was in debugging: the exact skill you need to supervise AI-generated code.

−17% · Quiz score drop for the AI-assisted group
52 · Developers in the randomised trial
6 · Distinct AI interaction patterns observed

The core trade-off: speed vs. understanding

[Figure: task completion time (minutes) and post-task quiz score (%) by experimental condition, AI assistance vs. no AI (control)]

AI didn't deliver a significant speed-up overall (p = 0.391), yet AI users understood significantly less (p = 0.010*).

Based on Shen & Tamkin (2026), Figure 6. Error bars represent 95% CI. n = 52 (26 per group). * p < 0.05. Cohen's d = 0.738 for the quiz score difference.

Where understanding broke down most

[Figure: quiz score by skill area, AI assistance vs. no AI (control)]

Based on Shen & Tamkin (2026), Figure 8. Debugging questions showed the largest gap between groups — the control group's hands-on error encounters built exactly the skills the AI group bypassed.

What the experiment did — in plain English

Shen and Tamkin recruited 52 working software developers and asked them to learn a Python library they had never used before — Trio, a framework for asynchronous programming. Half the participants could chat with an AI coding assistant (GPT-4o) while working through the tasks; the other half had only documentation and web search. Afterwards, everyone sat a comprehension quiz — without AI. The quiz tested three skill areas that matter most for supervising AI-generated code: conceptual understanding, code reading, and debugging.
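
For readers who haven't met Trio, the sketch below shows the style of code the tasks involved: structured concurrency, where a "nursery" starts child tasks and waits for them all to finish. It is an illustrative example only, not one of the paper's actual tasks.

    import trio

    async def fetch(name, delay):
        # Simulate a slow, I/O-bound operation with an async checkpoint.
        await trio.sleep(delay)
        print(f"{name} finished after {delay}s")

    async def main():
        # A nursery runs its child tasks concurrently and only exits
        # once every one of them has completed.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fetch, "task-a", 1.0)
            nursery.start_soon(fetch, "task-b", 0.5)

    trio.run(main)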

The AI group didn't finish notably faster — some participants spent up to 11 minutes just composing queries — but they scored significantly worse on the quiz. The effect was sharpest in debugging, which is precisely the skill humans need to catch mistakes in code they didn't write themselves. The control group, by contrast, encountered roughly three times as many errors during the task. Wrestling with those errors — and resolving them independently — appears to be the mechanism that built durable understanding.

Not all AI use is equal — six patterns, two outcomes

The qualitative analysis revealed something more nuanced than a blanket "AI is bad for learning." By watching every participant's screen recording, the researchers identified six distinct ways people interacted with the AI assistant. These clustered into two groups: three high-scoring patterns where participants stayed cognitively engaged and preserved learning, and three low-scoring patterns where participants offloaded thinking to the AI and learned much less.

▲ High-scoring patterns (65–86% quiz score)

Generation-Then-Comprehension

86% avg quiz · 24 min

Let the AI generate code, then asked follow-up questions to understand why it worked. Looked like delegation on the surface — but the comprehension step made all the difference.

Hybrid Code-Explanation

68% avg quiz · 24 min

Asked the AI for code and explanations in the same query. Reading and understanding the explanations took more time but built real knowledge.

Conceptual Inquiry

65% avg quiz · 22 min

Only asked conceptual questions; wrote all code themselves and resolved errors independently. The fastest high-scoring pattern.

▼ Low-scoring patterns (24–39% quiz score)

AI Delegation

39% avg quiz · 19.5 min

Asked the AI only to generate code and pasted it in as the answer. Fastest completion — but the lowest understanding of what they'd built.

Progressive AI Reliance

35% avg quiz · 22 min

Started by asking questions for task 1, then gave up and delegated everything for task 2. Scored poorly because they never mastered the later concepts.

Iterative AI Debugging

24% avg quiz · 31 min

Repeatedly pasted errors into the AI for fixes (5–15 queries) without trying to understand the underlying cause. Slowest and lowest-scoring.

Why this matters — the supervision paradox

The finding creates an uncomfortable loop. As AI becomes more capable, organisations hand it more code-writing responsibility. But the humans who are supposed to review that code are the same people who learned their craft alongside AI — and who may never have developed the debugging instincts that come from encountering and resolving errors independently. The researchers call this the core tension: AI-enhanced productivity is not a shortcut to competence. In safety-critical domains — medical software, autonomous vehicles, financial systems — this gap between apparent productivity and actual understanding is where failures hide.

Crucially, the paper notes that its experimental setup — a chat-based assistant where users must compose queries — represents a lower bound on the problem. Agentic coding tools that generate entire solutions with minimal prompting would likely produce even larger skill-formation deficits. The time participants spent composing queries was itself a form of cognitive engagement; remove that friction and the learning gap would widen further.

The good news buried in the data is that AI use and learning are not inherently opposed. Three of the six interaction patterns preserved skill formation — the ones where participants stayed cognitively engaged by asking conceptual questions, requesting explanations, or following up on generated code. The implication for tool design and workplace practice is clear: the goal is not to ban AI assistance, but to build workflows that keep the human's brain in the loop.

Reference

Shen, J. H., & Tamkin, A. (2026). How AI impacts skill formation (arXiv:2601.20245). arXiv. https://arxiv.org/abs/2601.20245