Research · Blogs & Independent Thinkers
Research sweep · deep · 2025 – 2026
Agentic Engineering And Enterprise Architecture Discipline
Agentic engineering after Andrej Karpathy's “vibe coding” meme, April 2025 to April 2026: how AI coding agents are changing enterprise software engineering across security, testability, reliability, maintainability, availability, resilience, observability, operability, cost, recovery, and engineering governance.
Synthesised 2026-04-30
Narrative
The strongest through-line in independent coverage is that practice has moved beyond the “vibe coding” meme toward a more serious discipline centered on harnesses, context, tests, and operational controls. Simon Willison and Addy Osmani are the clearest on the terminology shift: vibe coding means accepting generated code without reading it, while professional use requires specs, diff review, test suites, and explicit accountability. Thoughtworks’ Birgitta Böckeler adds the operational layer, arguing that context engineering and harness engineering are now the practical center of gravity for agentic work. Cloudflare’s April 2026 posts show the same pattern in enterprise infrastructure terms: durable execution, sandboxed code, identity-aware auth, and long-running sessions are becoming the substrate for production agents.
The second major theme is that agentic engineering increases the importance of classic engineering disciplines rather than replacing them. Security-focused work from Thoughtworks and OpenAI highlights new attack surfaces, including prompt and context poisoning, and the need for monitoring and least privilege. Analyses published on LessWrong, including a METR research update, push back on inflated benchmark claims, showing that agents often score better under algorithmic evaluation than they do on real code quality, maintainability, and usability. Across the sample, the credible claim is not that agents remove software engineering constraints, but that they make those constraints more visible and less optional.
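As a rough illustration of how these controls combine in practice, the sketch below gates an agent's proposed change on three checks named in the narrative: least privilege (a command allowlist), a passing test suite, and human diff review. It is a minimal sketch under stated assumptions, not a method taken from any of the cited sources: `ProposedChange`, `ALLOWED_COMMANDS`, `disallowed_commands`, and `gate` are hypothetical names, and the allowlist contents are placeholders.

```python
import shlex
from dataclasses import dataclass

# Hypothetical allowlist: the only executables the agent may invoke.
# The entries are placeholders chosen for illustration.
ALLOWED_COMMANDS = frozenset({"pytest", "ruff", "git"})


@dataclass
class ProposedChange:
    """Hypothetical record of what an agent wants to land."""
    commands: list[str]    # shell commands the agent asked to run
    tests_passed: bool     # result of the project's test suite
    human_reviewed: bool   # whether a person has reviewed the diff


def disallowed_commands(change, allowed=ALLOWED_COMMANDS):
    """Return the commands whose executable is not on the allowlist.

    Only the first token is checked. A real policy would also need to
    inspect arguments, pipes, and redirections.
    """
    bad = []
    for cmd in change.commands:
        parts = shlex.split(cmd)
        if not parts or parts[0] not in allowed:
            bad.append(cmd)
    return bad


def gate(change):
    """Return (accepted, reasons); every check must pass to accept."""
    reasons = []
    bad = disallowed_commands(change)
    if bad:
        reasons.append(f"commands outside allowlist: {bad}")
    if not change.tests_passed:
        reasons.append("test suite did not pass")
    if not change.human_reviewed:
        reasons.append("diff has not been reviewed by a human")
    return (not reasons, reasons)


if __name__ == "__main__":
    risky = ProposedChange(
        commands=["pytest -q", "curl http://example.invalid/install.sh"],
        tests_passed=True,
        human_reviewed=False,
    )
    accepted, reasons = gate(risky)
    print(accepted)
    for reason in reasons:
        print("-", reason)
```

The gate collects every failing check rather than stopping at the first one, so a rejected change carries a complete list of reasons that can be logged or handed back to the agent.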
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | Not all AI-assisted programming is vibe coding (but vibe coding rocks) | Simon Willison's Weblog | 2025-03 | Defines vibe coding narrowly and argues for separating reckless prompt-only coding from disciplined AI-assisted engineering. |
| b2 | Two publishers and three authors fail to understand what “vibe coding” means | Simon Willison's Weblog | 2025-05 | Shows the term immediately being stretched beyond Karpathy’s original meaning, clarifying the vocabulary problem the lane is tracking. |
| b3 | Vibe engineering | Simon Willison's Weblog | 2025-10 | Proposes a name for the disciplined end of the spectrum opposite vibe coding, where experienced engineers use agents while staying accountable for the code they ship. |
| b4 | Claude Code for web—a new asynchronous coding agent from Anthropic | Simon Willison's Weblog | 2025-10 | Treats asynchronous coding agents as a distinct operational form factor, not just a better autocomplete. |
| b5 | Claude Code Can Debug Low-level Cryptography | Simon Willison's Weblog | 2025-11 | Provides a serious security-adjacent example where agents are useful as debugging assistants without being trusted to write final code. |
| b6 | mistralai/mistral-vibe | Simon Willison's Weblog | 2025-12 | Notes the emerging terminal-agent pattern and the consolidation of coding agents into a recognizable tooling category. |
| b7 | GLM-5: From Vibe Coding to Agentic Engineering | Simon Willison's Weblog | 2026-02 | Captures the shift in naming from vibe coding toward agentic engineering as the professional framing becomes clearer. |
| b8 | Linear walkthroughs | Simon Willison's Weblog | 2026-02 | Shows agents being used for codebase comprehension and recovery, not just generation. |
| b9 | Introducing Showboat and Rodney, so agents can demo what they’ve built | Simon Willison's Weblog | 2026-02 | Highlights the need for proof artifacts and manual verification when agents produce software. |
| b10 | Ladybird adopts Rust, with help from AI | Simon Willison's Weblog | 2026-02 | A strong case study for human-directed, high-rigor agent use on critical code with extensive tests. |
| b11 | Agentic Engineering | AddyOsmani.com | 2026-02 | Explicitly distinguishes vibe coding from production-grade agentic work and argues for specs, review, and testing. |
| b12 | Stop Using /init for AGENTS.md | AddyOsmani.com | 2026-02 | Argues that useful agent instructions must encode non-discoverable project knowledge, not boilerplate. |
| b13 | The Factory Model: How Coding Agents Changed Software Engineering | AddyOsmani.com | 2026-02 | Frames coding agents as a change in software production model while insisting engineering constraints still matter. |
| b14 | Scaffolding | AddyOsmani.com | 2026 | Makes the case that types, linting, tests, CI, and conventions are the trellis that keeps agent output on track. |
| b15 | Harness engineering for coding agent users | martinfowler.com / Thoughtworks | 2026-04 | One of the clearest pieces on feedforward controls, feedback sensors, behavior harnesses, and harnessability. |
| b16 | Context Engineering for Coding Agents | martinfowler.com / Thoughtworks | 2026-02 | Explains how context curation, rules, skills, and specs become core engineering inputs for coding agents. |
| b17 | Autonomous coding agents: A Codex example | martinfowler.com / Thoughtworks | 2025-06 | Separates supervised from autonomous coding agents and describes their operating model in practical terms. |
| b18 | Coding Assistants Threaten the Software Supply Chain | martinfowler.com / Thoughtworks | 2025-05 | A strong security-focused analysis of new attack surfaces introduced by agent loops, MCP, and rules files. |
| b19 | Building your own CLI Coding Agent with Pydantic-AI | martinfowler.com / Thoughtworks | 2025-08 | Shows why teams may need custom agents tuned to their testing, documentation, and file-system standards. |
| b20 | Exploring Generative AI | martinfowler.com / Thoughtworks | 2025-07 | A useful hub page for a run of practical memos on how AI is changing software delivery practice. |
| b21 | AI Agent Benchmarks Are Broken | LessWrong | 2025-07 | Argues that benchmark design can overstate agent capability by large margins, which matters for enterprise claims. |
| b22 | METR Research Update: Algorithmic vs. Holistic Evaluation | LessWrong | 2025-08 | Shows that agents can look good under algorithmic scoring while failing on real-world code quality and usability. |
| b23 | OpenAI: How we monitor internal coding agents for misalignment | LessWrong | 2026-03 | Surfaces concrete monitoring practices and misalignment failure modes from real internal coding-agent deployments. |
| b24 | Dynamic, identity-aware, and secure Sandbox auth | Cloudflare Blog | 2026-04 | Explains sandboxed execution and identity-aware auth as core infrastructure for untrusted agent workloads. |
| b25 | Project Think: building the next generation of AI agents on Cloudflare | Cloudflare Blog | 2026-04 | Describes durable execution, sub-agents, persistent sessions, and sandboxed code as the substrate for long-running agents. |