Explainer Collection

AI Engineering

Shen & Tamkin (2026)

AI doesn't make novice developers meaningfully faster - and it quietly stops them from learning

In a randomised experiment, software engineers who used AI to complete unfamiliar coding tasks scored 17% lower on a comprehension test - without finishing meaningfully faster. The biggest gap was in debugging: the exact skill you need to supervise AI-generated code.

−17%: Quiz score drop for the AI-assisted group
52: Developers in the randomised trial

Farach (2026)

AI doesn't just change what workers do - it changes how many a manager can lead

Every extra worker a manager oversees adds coordination overhead: briefings, check-ins, misunderstandings. AI tools that reduce that friction let managers run bigger teams, flattening the whole hierarchy in the process.

3.3: Workers per manager before any AI
20×: Team size when AI capability reaches mid-range

Chen et al. (2026)

AI coding agents write tests compulsively - but the tests barely matter

Across six frontier models on SWE-bench Verified, agent-written tests are widespread but provide almost no leverage on task outcomes. GPT-5.2 writes virtually zero tests yet resolves issues at nearly the same rate as test-heavy models. Suppressing tests slashes costs by up to 49% while losing only 1.8 to 2.6% of successes.

83%: Outcomes unchanged when tests are added or removed
6: Frontier LLMs studied on SWE-bench Verified

Benkovich & Valkov (2026)

Give AI agents a team structure and they outperform solo agents - even with weaker models

Agyn organises four specialised AI agents into a software engineering team with a manager, researcher, engineer, and code reviewer - each with its own sandbox, tools, and model. On SWE-bench 500, this production-first system resolves 72.2% of issues, beating single-agent baselines on comparable models by 7.2 percentage points. No benchmark tuning required.

72.2%: SWE-bench 500 resolution rate
4: Specialised agent roles on the team

Liu (2026)

AI coding assistants fix more code smells than they create, but introduce nearly twice the security issues they resolve

Across 304,362 AI-authored commits from 6,275 GitHub repositories, AI tools are a net positive for surface-level code quality but a net negative for bugs and security vulnerabilities, with 24.2% of all introduced issues persisting indefinitely.

484,606: distinct quality issues identified across 304,362 AI-authored commits from 6,275 GitHub repositories
89.1%: of all AI-introduced issues are code smells, the dominant but least dangerous form of technical debt

Li (2025)

AI coding agents have now shipped 456,000 pull requests, but their code gets rejected far more often than human work

The first large-scale dataset of autonomous coding agent activity on GitHub reveals that speed and scale are real, but acceptance rates, review dynamics, and code complexity tell a more sobering story about the gap between benchmarks and production.

456K: pull requests authored by five autonomous coding agents across 61,000 repositories, involving 47,000 developers
35 to 65%: agent PR acceptance rates, compared to 77% for human developers, a gap of 12 to 42 percentage points

Humberd (2026)

Agency theory was built for human managers, but AI is becoming the agent nobody knows how to supervise

A new framework maps five stages of AI evolution against traditional agency mechanisms, arguing that firms need to scaffold monitoring and incentive systems now, well before AI gains full decision-making autonomy.

Routine AI: mimics rote human decisions and stays fully under human control, like a reorder system that restocks at a fixed threshold
Machine AI: adapts human-set algorithms by pulling in structured external data such as sales forecasts to improve decision inputs

Steve Yegge (2026)

Yegge's Gas Town treats AI agents like a factory floor - and the developer becomes the operator

Gas Town is a multi-agent orchestrator that lets a single developer manage 20 to 30 parallel AI coding agents through a Mad Max-themed hierarchy of roles, turning software engineering into something closer to running a refinery than writing code.

Who built it: Steve Yegge - veteran engineer, ex-Google/Amazon, known for long-form tech essays
Why it exists: Running 20 to 30 Claude Code agents by hand is chaos - stuff gets lost, agents get stuck, merges collide

Treude & Storey (2025)

AI is rewriting software engineering research - and the old methods may no longer hold

A vision paper from Singapore Management University and the University of Victoria argues that LLMs don't just change how software is built - they undermine the constructs, methods, and validity standards that empirical SE researchers have relied on for decades.

4: Research pillars disrupted: phenomena, methods, data, and validity
8: SE artifact types now documented as AI co-generated