Research Explainer · Shen & Tamkin (2026)

AI makes novice developers faster — but quietly stops them from learning

In a randomised experiment, software engineers who used AI to complete unfamiliar coding tasks scored 17% lower on a comprehension test — without finishing meaningfully faster. The biggest gap was in debugging: the exact skill you need to supervise AI-generated code.

−17% · Quiz score drop for the AI-assisted group
52 · Developers in the randomised trial
6 · Distinct AI interaction patterns observed

The core trade-off: speed vs. understanding

[Figure: task completion time (minutes) and post-task quiz score (%) by experimental condition, AI assistance vs. no AI (control)]

AI didn't deliver a significant speed-up overall (p = 0.391), yet AI users understood significantly less (p = 0.010*).

Based on Shen & Tamkin (2026), Figure 6. Error bars represent 95% CI. n = 52 (26 per group). * p < 0.05. Cohen's d = 0.738 for the quiz score difference.

Where understanding broke down most

[Figure: quiz score by skill area, AI assistance vs. no AI (control)]

Based on Shen & Tamkin (2026), Figure 8. Debugging questions showed the largest gap between groups — the control group's hands-on error encounters built exactly the skills the AI group bypassed.

What the experiment did — in plain English

Shen and Tamkin recruited 52 working software developers and asked them to learn a Python library they had never used before — Trio, a framework for asynchronous programming. Half the participants could chat with an AI coding assistant (GPT-4o) while working through the tasks; the other half had only documentation and web search. Afterwards, everyone sat a comprehension quiz — without AI. The quiz tested three skill areas that matter most for supervising AI-generated code: conceptual understanding, code reading, and debugging.
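
For readers who haven't met Trio, the sketch below shows the style of code the tasks involved: structured concurrency, where a "nursery" starts child tasks and waits for them all to finish. It is an illustrative example only, not one of the paper's actual tasks.

    import trio

    async def fetch(name, delay):
        # Simulate a slow, I/O-bound operation with an async checkpoint.
        await trio.sleep(delay)
        print(f"{name} finished after {delay}s")

    async def main():
        # A nursery runs its child tasks concurrently and only exits
        # once every one of them has completed.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fetch, "task-a", 1.0)
            nursery.start_soon(fetch, "task-b", 0.5)

    trio.run(main)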

The AI group didn't finish notably faster — some participants spent up to 11 minutes just composing queries — but they scored significantly worse on the quiz. The effect was sharpest in debugging, which is precisely the skill humans need to catch mistakes in code they didn't write themselves. The control group, by contrast, encountered roughly three times as many errors during the task. Wrestling with those errors — and resolving them independently — appears to be the mechanism that built durable understanding.

Not all AI use is equal — six patterns, two outcomes

The qualitative analysis revealed something more nuanced than a blanket "AI is bad for learning." By watching every participant's screen recording, the researchers identified six distinct ways people interacted with the AI assistant. These clustered into two groups: three high-scoring patterns where participants stayed cognitively engaged and preserved learning, and three low-scoring patterns where participants offloaded thinking to the AI and learned much less.

▲ High-scoring patterns (65–86% quiz score)

Generation-Then-Comprehension

86% avg quiz · 24 min

Let the AI generate code, then asked follow-up questions to understand why it worked. Looked like delegation on the surface — but the comprehension step made all the difference.

Hybrid Code-Explanation

68% avg quiz · 24 min

Asked the AI for code and explanations in the same query. Reading and understanding the explanations took more time but built real knowledge.

Conceptual Inquiry

65% avg quiz · 22 min

Only asked conceptual questions; wrote all code themselves and resolved errors independently. The fastest high-scoring pattern.

▼ Low-scoring patterns (24–39% quiz score)

AI Delegation

39% avg quiz · 19.5 min

Asked the AI only to generate code and pasted it in as the answer. Fastest completion — but the lowest understanding of what they'd built.

Progressive AI Reliance

35% avg quiz · 22 min

Started by asking questions for task 1, then gave up and delegated everything for task 2. Scored poorly because they never mastered the later concepts.

Iterative AI Debugging

24% avg quiz · 31 min

Repeatedly pasted errors into the AI for fixes (5–15 queries) without trying to understand the underlying cause. Slowest and lowest-scoring.

Why this matters — the supervision paradox

The finding creates an uncomfortable loop. As AI becomes more capable, organisations hand it more code-writing responsibility. But the humans who are supposed to review that code are the same people who learned their craft alongside AI — and who may never have developed the debugging instincts that come from encountering and resolving errors independently. The researchers call this the core tension: AI-enhanced productivity is not a shortcut to competence. In safety-critical domains — medical software, autonomous vehicles, financial systems — this gap between apparent productivity and actual understanding is where failures hide.

Crucially, the paper notes that its experimental setup — a chat-based assistant where users must compose queries — represents a lower bound on the problem. Agentic coding tools that generate entire solutions with minimal prompting would likely produce even larger skill-formation deficits. The time participants spent composing queries was itself a form of cognitive engagement; remove that friction and the learning gap would widen further.

The good news buried in the data is that AI use and learning are not inherently opposed. Three of the six interaction patterns preserved skill formation — the ones where participants stayed cognitively engaged by asking conceptual questions, requesting explanations, or following up on generated code. The implication for tool design and workplace practice is clear: the goal is not to ban AI assistance, but to build workflows that keep the human's brain in the loop.

Reference

Shen, J. H., & Tamkin, A. (2026). How AI impacts skill formation (arXiv:2601.20245). arXiv. https://arxiv.org/abs/2601.20245