Research Explainer · Vandeputte (2025)

Stop letting AI agents run everything; make them automate themselves out of the critical path

A framework from Nokia Bell Labs argues that reliable GenAI systems should blend traditional software engineering with cognitive AI processing, keeping agents as occasional problem-solvers rather than permanent gatekeepers.

Key Contribution

This paper proposes a comprehensive architectural framework for "GenAI-native" systems, built on five design pillars (reliability, excellence, evolvability, self-reliance, and assurance) with 21 concrete design patterns. The central argument: roughly 95% of processing should run through fast, tested, traditional code, with GenAI agents reserved for the remaining edge cases, while continuously working to convert those edge cases into traditional code too.

The current wave of AI agent systems puts large language models at the centre of every decision. Every request flows through a model, every response gets generated from scratch, and every interaction relies on the same expensive, unpredictable cognitive processing. Vandeputte calls this the "generate and pray" strategy, and his paper is a sustained argument against it.

The core issue is threefold. GenAI is unreliable (it hallucinates, produces inconsistent outputs, and behaves unpredictably across prompt variations). It is inefficient (using an LLM to calculate 1+1 is absurd but structurally identical to what many agent systems do for routine tasks). And it is opaque (debugging a chain-of-thought reasoning failure in production is orders of magnitude harder than tracing a function call). The paper draws a pointed analogy: current agentic solutions resemble "artisanal methods reminiscent of the early pre-industrial era." The proposed alternative is industrialisation.

The framework organises everything around five design goals that determine how a GenAI-native system should be built. These are not novel in isolation (reliability and security are table stakes for any production system), but the paper redefines each through the lens of GenAI's particular failure modes. Reliability shifts from binary pass/fail to "utility-based sufficiency criteria," accepting that outputs may be imperfect but must be useful most of the time. Excellence demands that quality and efficiency be explicitly balanced, not assumed. Evolvability acknowledges that GenAI assets will change at runtime, so the system must be designed for controlled mutation. Self-reliance means agents should solve problems independently but within strict guardrails. Assurance covers security, alignment, and trust, recognising that every GenAI component should be treated as potentially compromised.

Reliability

Fault tolerance, resilience, robustness. Replace binary pass/fail with utility-based sufficiency. Outputs should be "sufficiently useful most of the time."

Excellence

Competency, precision, proficiency. Minimize cognitive processing on critical paths. Use proven SE practices like CI/CD, checklists, and Six Sigma reviews.

Evolvability

Adaptability, flexibility, malleability. Systems should learn from bespoke cognitive solutions and convert recurring ones into hardened traditional code.

Self-reliance

Self-sufficiency, self-governance, self-improvement. Agents operate independently but within clear policies, with rollback and fail-safe mechanisms.

Assurance

Alignment, security, trustworthiness. Assume every GenAI asset may be compromised. Implement cognitive firewalls, screening, and sandboxing.

The paper's most distinctive contribution is the GenAI-native cell, a building block inspired by biological cells. Each cell has a nucleus (traditional core logic handling roughly 95% of requests), a cytoplasm (semi-flexible processing for partially known cases, about 4%), and a membrane (fully agentic processing for novel situations, around 1%). A programmable router sits at the cell's entrance, deciding whether each incoming request should take the fast deterministic path or the slow cognitive path.
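The three-layer dispatch can be pictured as a small routing table at the cell boundary. The sketch below is illustrative only: the handler names, request types, and set-based classifier are assumptions for demonstration, not the paper's implementation.

```python
# Illustrative sketch of a GenAI-native cell's programmable router.
# The three handlers mirror the cell's layers: nucleus (deterministic
# code), cytoplasm (semi-flexible processing), membrane (full agents).

def nucleus_handler(payload):
    """Fast path: tested, deterministic logic for known request types."""
    return f"deterministic:{payload}"

def cytoplasm_handler(payload):
    """Semi-flexible path: partially known cases, e.g. templated prompts."""
    return f"templated:{payload}"

def membrane_handler(payload):
    """Slow path: full agentic reasoning for novel situations."""
    return f"agentic:{payload}"

class ProgrammableRouter:
    def __init__(self):
        # The routing table starts small and is reprogrammed as
        # cognitive solutions are hardened into the nucleus.
        self.known = {"get_balance", "list_orders"}
        self.partially_known = {"summarise_ticket"}

    def route(self, request_type, payload):
        if request_type in self.known:
            return nucleus_handler(payload)      # ~95% of traffic
        if request_type in self.partially_known:
            return cytoplasm_handler(payload)    # ~4%
        return membrane_handler(payload)         # ~1%

    def promote(self, request_type):
        """Reprogram the router once a handler has been hardened."""
        self.partially_known.discard(request_type)
        self.known.add(request_type)
```

The `promote` method is where the paper's evolvability goal shows up structurally: the router is not static configuration but a surface that the system's own optimisation loop rewrites over time.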

The crucial mechanism is what Vandeputte calls "thinking fast and slow" (borrowing from Kahneman). Routine requests hit traditional code with near-zero overhead. Unusual requests get routed to multi-agent processing. The system then monitors which cognitive solutions recur, and a cognitive workflow optimizer converts them into tested traditional code. Over time, each cell's core logic grows, and the proportion of requests requiring expensive AI processing shrinks. The agents, in effect, automate themselves out of their own jobs.
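The fast/slow loop with promotion can be sketched in a few lines. The recurrence threshold and the stand-in "compiled" handler below are assumptions for illustration; in the framework, agents would generate and test real replacement code before reconfiguring the router.

```python
from collections import Counter

# Illustrative sketch of "thinking fast and slow": known request types
# hit a fast handler table, unknown ones fall back to a (stubbed)
# cognitive solver, and recurring cognitive solutions get promoted
# into the fast table so the slow path shrinks over time.

PROMOTION_THRESHOLD = 3  # assumed: promote after this many recurrences

class Cell:
    def __init__(self, cognitive_solver):
        self.fast_handlers = {}        # request_type -> handler
        self.recurrences = Counter()   # tracks slow-path usage
        self.cognitive_solver = cognitive_solver

    def handle(self, request_type, payload):
        handler = self.fast_handlers.get(request_type)
        if handler is not None:
            return handler(payload)    # fast, deterministic path
        # Slow path: expensive cognitive processing.
        result = self.cognitive_solver(request_type, payload)
        self.recurrences[request_type] += 1
        if self.recurrences[request_type] >= PROMOTION_THRESHOLD:
            # Stand-in for agent-generated, tested replacement code.
            self.fast_handlers[request_type] = (
                lambda p, rt=request_type: f"compiled:{rt}:{p}"
            )
        return result
```

After the threshold is reached, subsequent requests of that type never touch the cognitive solver, which is the sense in which the agents automate themselves out of the critical path.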

Multiple cells interconnect through an organic substrate, an evolution of traditional service meshes. Cells can dynamically discover new services, switch to alternatives when dependencies fail, and negotiate communication protocols. The substrate enforces governance through dedicated monitoring cells, cognitive firewalls (analogous to deep packet inspection but for semantic content), and agent sandboxes for untrusted computation. The paper proposes that immutable infrastructure should evolve into "reproducible organic infrastructure," where cells can mutate at runtime while retaining the ability to be cloned or rolled back.
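A cognitive firewall's adaptive scrutiny might look like the sketch below. The keyword screen stands in for a real semantic classifier, and the trust scores and thresholds are invented for illustration; the paper specifies the concept, not this mechanism.

```python
# Illustrative sketch of a cognitive firewall: inter-cell messages are
# screened for suspicious semantic content before delivery, with the
# scrutiny level scaled by the sender's trust score.

SUSPICIOUS = ("ignore previous instructions", "exfiltrate",
              "disable guardrails")  # stand-in for semantic analysis

class CognitiveFirewall:
    def __init__(self):
        self.trust = {}  # sender id -> trust score in [0, 1]

    def inspect(self, sender, message):
        """Return True if the message may be delivered."""
        trust = self.trust.get(sender, 0.0)
        if trust >= 0.9:
            return True  # highly trusted sender: light-touch screening
        lowered = message.lower()
        if any(marker in lowered for marker in SUSPICIOUS):
            # Block the message and demote the sender's trust.
            self.trust[sender] = max(0.0, trust - 0.2)
            return False
        return True
```

The point of the trust dimension is the "analogous to deep packet inspection" framing: inspection depth is a per-sender policy knob, not a fixed global filter.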

Reliability · Behavioral

Reflective Processor

Assets self-assess outputs for quality and confidence before returning them, triggering additional verification or fallback strategies when needed.

Reliability · Behavioral

Resilience Fender

Absorbs cascading uncertainty from upstream assets. Can force rework, fall back to conservative methods, or act as a cognitive circuit breaker.

Excellence · Structural

Programmable Router

Routes each request to the optimal handler: direct API, lightweight LLM classifier, or full agentic reasoning. Continuously reprogrammed as core logic expands.

Evolvability · Behavioral

Unified Conversational Interface

Hybrid of rigid APIs and freeform conversation. Interactions start conversational and gradually crystallise into traditional endpoints as patterns emerge.

Evolvability · Creational

Cognitive Workflow Optimizer

Identifies recurring cognitive processing patterns and formalises them into traditional, tested workflows, reconfiguring the router accordingly.

Self-reliance · Behavioral

Infallible Fail-safe

Inescapable emergency shutdown for erratic autonomous behaviour, with automatic rollback to earlier versions or fail-safe minimal-functionality mode.

Assurance · Behavioral

Cognitive Firewall

Deep semantic inspection of inter-cell communications, layered on top of traditional service mesh firewalling. Adaptive scrutiny levels per asset trust.

Architectural · Structural

GenAI-native Cell

Self-contained unit with static core, dynamic cognitive extensions, adaptive router, DevOps agents, and management layer. The system's fundamental building block.
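Two of the behavioural patterns above compose naturally. A minimal sketch, assuming invented confidence thresholds and trip counts (the paper names the patterns but does not prescribe this logic): a Reflective Processor that self-assesses its output and falls back when confidence is low, wrapped by a Resilience Fender that trips like a circuit breaker after repeated low-confidence results.

```python
# Illustrative composition of two patterns: a reflective processor
# that self-assesses its outputs, behind a resilience fender that
# disables the cognitive path after repeated low-confidence results.

CONFIDENCE_THRESHOLD = 0.7  # assumed sufficiency criterion
TRIP_AFTER = 3              # consecutive failures before breaker opens

def reflective_process(generate, assess, fallback, request):
    """Generate, self-assess, and fall back if confidence is too low."""
    output = generate(request)
    if assess(output) >= CONFIDENCE_THRESHOLD:
        return output, True
    return fallback(request), False

class ResilienceFender:
    def __init__(self):
        self.consecutive_failures = 0
        self.open = False  # open breaker = cognitive path bypassed

    def call(self, generate, assess, fallback, request):
        if self.open:
            return fallback(request)  # conservative method only
        output, ok = reflective_process(generate, assess, fallback, request)
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        if self.consecutive_failures >= TRIP_AFTER:
            self.open = True
        return output
```

Once the breaker opens, even a confident generator is bypassed until an operator (or a management agent) resets it, which is the fender's role as a boundary that stops cascading uncertainty from propagating downstream.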

Vandeputte grounds the framework in two historical transitions. The shift from circuit-switched to IP networking introduced unreliable packet delivery, so higher layers developed error correction, reordering, and retransmission to compensate. The shift from monolithic to cloud-native microservices introduced latency and resource-sharing problems, so service meshes, CI/CD, and lifecycle management patterns emerged. In both cases, embracing the underlying technology's imperfections (rather than pretending they did not exist) and building systematic mitigation layers on top proved more successful than demanding perfection from the base layer.

The paper applies the same logic to GenAI. Stop trying to make LLMs perfectly reliable. Accept their probabilistic nature, build verification and fallback at every level, and systematically replace cognitive processing with traditional code wherever the problem becomes well-understood. The human organisation analogy runs throughout: enterprises tolerate imperfection within teams but impose quality gates at team boundaries, hold retrospectives, train employees on recurring problems, and fire people who repeatedly violate policy. GenAI-native systems, the paper argues, should work the same way.

This is a conceptual framework paper, not an empirical study. Vandeputte does not benchmark the 21 patterns against production systems, measure the overhead of cognitive firewalls, or demonstrate that the 95/4/1 processing split he proposes is achievable for specific workloads. The paper is explicit about this limitation, repeatedly noting that "most ideas require further validation through experimentation and real-life application."

The framework's value lies in its organising logic. It names problems that practitioners encounter daily (cascading unreliability in agent chains, runaway costs from unnecessary LLM calls, model swaps that break downstream behaviour) and proposes structured responses drawn from decades of software engineering discipline. The GenAI-native cell concept, where agents exist to improve and extend core logic rather than to be the core logic, is a genuinely useful mental model. It gives architects a vocabulary for the uncomfortable conversation about where AI should and should not sit in a production stack.

The sharpest insight may be the simplest one: the goal of a well-designed AI agent should be to make itself unnecessary for routine work. Systems that achieve this will be faster, cheaper, more reliable, and easier to debug than systems that route every request through a language model. The agents do not disappear. They remain as monitors, optimisers, and handlers of the genuinely novel. They just stop being the bottleneck.

Bottom Line

Vandeputte's framework offers a disciplined alternative to the "agents everywhere" trend. The core principle is counterintuitive but compelling: build systems where AI agents continuously work to reduce their own involvement in critical paths, converting bespoke cognitive solutions into tested traditional code. The agents remain omnipresent for monitoring, edge cases, and evolution, but the critical path belongs to fast, deterministic, proven logic. It is a blueprint, not a benchmark. The patterns need real-world validation. But the organising philosophy (embrace unpredictability, build mitigation layers, industrialise what works) is sound engineering applied to a technology that desperately needs more of it.

Reference

Vandeputte, F. (2025). Foundational design principles and patterns for building robust and adaptive GenAI-native systems. arXiv preprint arXiv:2508.15411v2. https://arxiv.org/abs/2508.15411