AI on Deterministic Rails

How AI and traditional deterministic software formed a symbiotic stack between January 2025 and June 2026. The brief covers five threads: the enterprise "PoC-opalypse" and the shift from token consumption to durable agentic adoption patterns; AI using software-encoded workflows as guardrails (variance and error control) rather than replacing them; the frontier moving from raw model capability to model orchestration and harness design (Claude Code, OpenCode, Pi); right-sizing with smaller and open-weight models (Llama, Qwen, DeepSeek, Mistral) for cheap routine automation and private inference; and the token-pricing economics behind enterprise sticker shock over agentic spend versus delivered value.

Claude Opus 4.8
financial
frontier
academic
vc
blogs
tech

Synthesised 2026-06-07

Narrative

The PoC-opalypse is well documented across independent blogs and Substacks. A May 2026 synthesis of HBR/Hyland survey data found that only 27 percent of enterprises had the connected data infrastructure needed for agentic AI, and that only 17 percent had actually implemented agentic AI, while nearly half were still in pilot. IBM data cited on Haverin's Substack indicate that only 25 percent of AI initiatives deliver the expected ROI and only 16 percent scale enterprise-wide. The Uber case, in which 5,000 engineers consumed an entire annual AI budget in four months via Claude Code and the COO admitted there was no clear line between token spend and delivered consumer features, became the canonical example of what several writers called token maximalism: usage divorced from outcome.

The harness-versus-model debate is the most active intellectual thread in independent technical writing between 2025 and mid-2026. Ken Huang and Patrick McGuinness, each analysing the leaked Claude Code codebase on Substack, documented a three-tier permission system, a 46,000-line query engine, five-tier context compaction, and coordinator-worker multi-agent patterns whose sophistication surprised practitioners. The dominant conclusion across multiple independent writers, including Ben Dickson at TechTalks citing a UC Berkeley paper, is that system scaling has replaced model scaling as the binding constraint: once foundation models are embedded in tools and terminals, behaviour is determined by the system, not the model alone. Simon Willison, whose newsletter and blog are among the most-cited practitioner sources, identified November 2025 as the specific inflection point when Claude Opus 4.5 and GPT-5.2 made coding agents reliable enough for daily production use.
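
The claim that behaviour is determined by the system rather than the model alone can be illustrated with a minimal sketch of a tool loop guarded by a deterministic permission gate. This is not Claude Code's implementation: the tier names (allow, ask, deny), the Tool class, the tool names, and the hard-coded PLAN that stands in for a model's tool choices are all hypothetical, since the source says only that the permission system has three tiers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier names: the source only states that there are three tiers.
ALLOW, ASK, DENY = "allow", "ask", "deny"


@dataclass(frozen=True)
class Tool:
    name: str
    tier: str
    run: Callable[[str], str]


def gate(tool: Tool, approve: Callable[[str], bool]) -> bool:
    """Deterministic permission check; the model has no say in the outcome."""
    if tool.tier == ALLOW:
        return True
    if tool.tier == ASK:
        return approve(tool.name)  # defer to a human or policy callback
    return False  # DENY, or any unrecognised tier, is refused


def run_agent(plan, tools, approve, max_steps=10):
    """Run tools in a loop. `plan` stands in for the model's tool choices."""
    transcript = []
    for step, (name, arg) in enumerate(plan):
        if step >= max_steps:  # hard step budget enforced by the harness
            break
        tool = tools.get(name)
        if tool is None or not gate(tool, approve):
            transcript.append(f"{name}: blocked")
            continue
        transcript.append(f"{name}: {tool.run(arg)}")
    return transcript


# Hypothetical tools, one per tier.
TOOLS = {
    "read_file": Tool("read_file", ALLOW, lambda p: f"contents of {p}"),
    "write_file": Tool("write_file", ASK, lambda p: f"wrote {p}"),
    "delete_repo": Tool("delete_repo", DENY, lambda p: f"deleted {p}"),
}

PLAN = [("read_file", "a.py"), ("write_file", "a.py"), ("delete_repo", ".")]

# With an approver that always refuses, only the ALLOW-tier tool runs.
RESULT = run_agent(PLAN, TOOLS, approve=lambda name: False)
```

The same plan produces different outcomes depending only on the gate and the approval callback, which is the sense in which the harness, not the model, sets the bounds on behaviour.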

On open-weight models and cost routing, independent analysts reached a consistent conclusion by early 2026. Stanford HAI's 2025 AI Index data show the performance gap between open-weight and closed models on Chatbot Arena narrowing from 8 percent to 1.7 percent by February 2025. Practitioner reporting from Particula Tech quantified a 60–70 percent infrastructure cost reduction from routing 80 percent of enterprise requests to DeepSeek V4 or Qwen 3 variants and reserving frontier APIs for complex reasoning, with DeepSeek V4 at approximately $0.14 per million input tokens against GPT-5 at $2.50. The Digital Applied H1 2026 retrospective documented an order-of-magnitude drop in open-weight inference costs versus H2 2025.
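
The routing arithmetic can be sketched directly from the quoted list prices. This is a back-of-envelope model, not the method behind the 60–70 percent figure: it assumes cost is driven by input tokens alone and that routed and unrouted requests are the same size, and the function names are illustrative. It ignores output tokens, hosting, and routing overhead, which may be why the reported practitioner savings are lower than the number this sketch produces.

```python
def blended_price(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Average price per million input tokens under a two-tier routing split."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * frontier_price


def saving_vs_frontier(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Fractional saving relative to sending every request to the frontier model."""
    blended = blended_price(cheap_share, cheap_price, frontier_price)
    return 1.0 - blended / frontier_price


# Figures from the text: 80 percent routed, $0.14 versus $2.50 per million input tokens.
BLENDED = blended_price(0.80, 0.14, 2.50)        # about $0.61 per million
SAVING = saving_vs_frontier(0.80, 0.14, 2.50)    # about 0.755, i.e. roughly 75 percent
```

Under these assumptions the saving is dominated by the 20 percent of traffic still sent to the frontier model, so raising the routed share matters more than shaving the open-weight price further.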

Token pricing and enterprise cost shock dominate the most recent independent coverage. Goldman Sachs data, widely cited across blogs, project a 24-fold increase in token consumption between 2026 and 2030, while Google disclosed processing 3.2 quadrillion tokens per month in May 2026, up sevenfold in one year. Simon Willison noted in an April 2026 podcast appearance that Anthropic's tokeniser changes for Opus 4.7 effectively constituted a 40 percent invisible price increase. The Veso AI and Praetorian blogs provide the most granular quantification of the corrective: deterministic scaffolding layers reduce per-query token use by 60–80 percent in Veso AI's account and by up to 98 percent per multi-tool operation in Praetorian's, depending on whether typed wrappers, schema validation, or output truncation are applied. That makes harness design not just an engineering choice but a direct financial control.
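
A minimal sketch of how typed wrappers, schema validation, and output truncation might cut what reaches the model is given below. It is not the Veso AI or Praetorian implementation: the Ticket type, its fields, and the sample payload are hypothetical, a hand-written check stands in for a schema library such as Zod, and character count is used only as a rough proxy for tokens.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical typed view of a tool result; only these fields reach the model."""
    id: int
    status: str
    title: str


def parse_ticket(raw: dict) -> Ticket:
    """Schema validation: reject payloads with missing or wrongly typed fields."""
    for key, typ in (("id", int), ("status", str), ("title", str)):
        if not isinstance(raw.get(key), typ):
            raise ValueError(f"field {key!r} must be of type {typ.__name__}")
    return Ticket(raw["id"], raw["status"], raw["title"])


def to_model_context(raw_json: str, max_title: int = 40) -> str:
    """Typed wrapper: validate, keep only declared fields, truncate long text."""
    ticket = parse_ticket(json.loads(raw_json))
    return json.dumps(
        {"id": ticket.id, "status": ticket.status, "title": ticket.title[:max_title]}
    )


# Contrived raw tool output with bulky fields the model does not need.
RAW = json.dumps({
    "id": 7,
    "status": "open",
    "title": "Login fails " * 10,
    "history": ["event"] * 200,
    "html_body": "<div>" * 300,
})

COMPACT = to_model_context(RAW)
# Character count as a rough stand-in for token count (an assumption).
REDUCTION = 1.0 - len(COMPACT) / len(RAW)
```

On this contrived payload most of the characters never reach the model; real savings depend on how much of a tool's raw output is irrelevant to the task. The validation step also fails loudly in ordinary code, before any tokens are spent, rather than relying on a prompt instruction to handle malformed data.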

Sources

ID	Title	Outlet	Date	Significance
b1	The problem with agentic AI in 2025	Platforms, AI, and the Economics of BigTech (Substack)	2025-10	Advances the thesis that most enterprise agentic deployments are being governed by RPA-era thinking, producing only incremental efficiency rather than the coordination gains that agentic architecture makes possible.
b2	Microsoft Is The Canary In The AI-Adoption Coal Mine	ProductMind (Substack)	2026-06	Documents Microsoft's internal Claude Code rollout in December 2025 and licence cancellations by May 2026 due to budget overruns, synthesising METR productivity data and Microsoft Research findings on AI delegation quality degradation.
b3	Why Agentic AI Is Stalling Inside Most Enterprises	The AI Economy (Substack)	2026-05	Synthesises Harvard Business Review/Hyland survey data showing only 27 percent of enterprises have connected data needed for agentic AI, grounding the PoC-opalypse in data infrastructure failure rather than model capability.
b4	The Agentic Stack Wars: Part Three - EXTRACTION	Haverin (Substack)	2026-06	Provides detailed cost modelling of multi-model orchestration versus single-model RAG, citing McKinsey and IBM ROI data to argue value capture is structurally uneven and the pricing shift from flat to consumption billing is arriving before the ROI case is settled.
b5	The real, embarrassing state of enterprise AI adoption	Next Word (Substack)	2026-06	Names Uber's budget burnout directly and argues AI adoption outcomes follow a skill distribution, with talent density and incentive alignment separating genuine production use from expensive experimentation.
b6	75% of Enterprise AI Fails. The Fix Isn't a Better Model.	Product Impact Pod (Substack)	2026-03	Argues governance and structured knowledge (ontology, knowledge graphs) are the production differentiators, citing WEF data showing organisations with strong AI governance see 20 percentage points higher positive outcomes.
b7	The Claude Code Leak: 10 Agentic AI Harness Patterns That Change Everything	Agentic AI (Substack - Ken Huang)	2026-04	Analyses the leaked Claude Code codebase to extract ten harness engineering patterns, making the case that harness design, not model intelligence, is the production differentiator in agentic systems.
b8	CLAUDE CODE ORCHESTRATION	Agentic AI (Substack - Ken Huang)	2026-06	Documents Claude Code's May 2026 Dynamic Workflows feature - JavaScript-generated orchestration scripts fanning work across up to 1,000 parallel subagents - as evidence that deterministic control flow is now the primary performance lever, not model capability.
b9	Scaling the harness: The next major bottleneck in agentic AI	TechTalks (Substack - Ben Dickson)	2026-06	Reports a UC Berkeley paper arguing system scaling - harness design - has replaced model scaling as the binding constraint on agentic AI performance, and details Claude Code's five-tier context compaction system as a concrete implementation.
b10	Decode the Buzzword: Why Harness Engineering Matters Now	Next Signal Prediction (Substack)	2026-03	Maps a three-era evolution of AI engineering methodology - prompt engineering, context engineering, harness engineering - and identifies November 2025 as the inflection point where using models well overtook making models better.
b11	Claude Code's Secrets Revealed	AI Changes Everything (Substack - Patrick McGuinness)	2026-04	Documents the leaked Claude Code codebase architecture: 29,000-line tool system with three-tier permissions, 46,000-line query engine for LLM orchestration, and coordinator-worker multi-agent patterns for large-scale codebase operations.
b12	Agentic Harness - OpenClaw, Claude Code, and More	SolomonChrist AI (Substack)	2026-03	Provides comparative analysis of eight agentic harnesses - Claude Code, OpenCode, OpenClaw, Manus, Codex and others - identifying MCP standardisation and the local-versus-cloud deployment split as the two structural forces shaping the harness market.
b13	The Agentic Harness: Why the Orchestration Layer Is the Product	Veso AI Blog	2026-04	Argues that deterministic layers reduce per-query LLM token use by 60–80 percent versus naive full-model approaches, and that data-layer constraints - unlike prompt-based guardrails - cannot fail under adversarial or edge-case inputs.
b14	Agent & Harness & Micro-Orchestrator, Oh My!	Scaling DataOps (Substack)	2026-05	Practitioner account of outgrowing Claude Code and building custom orchestrators, documenting the shift from agentic harnesses for single-agent UX to orchestrators for repeated multi-step workflows at scale.
b15	How an Agent Harness Made My Claude Code Setup 10x More Reliable	AI Maker (Substack)	2026-05	Practitioner case study showing how memory, hooks, and review-agent loops around Claude Code produce repeatable quality that raw model access does not, illustrating the harness-as-governance pattern in a personal workflow context.
b16	Agentic Engineering Patterns - Simon Willison's Newsletter	Simon Willison's Newsletter (Substack)	2026-02	Willison, who coined 'prompt injection' and 'agentic engineering,' synthesises the November 2025 inflection point and documents practical patterns for deterministic control within agentic loops, including the layered orchestration architecture in OpenClaw.
b17	LLM predictions for 2026, shared with Oxide and Friends	Simon Willison's Newsletter (Substack)	2026-01	Willison argues November 2025 was the decisive inflection where coding agents became reliable daily drivers, and frames the Jevons paradox as the central unresolved question about whether lower code production costs expand or destroy engineering demand.
b18	I think 'agent' may finally have a widely enough agreed upon definition to be useful jargon now	Simon Willison's Newsletter (Substack)	2025-09	Willison formalises 'an LLM agent runs tools in a loop to achieve a goal' as a working definition, distinguishing deterministic from non-deterministic agents and framing LLMs as a non-deterministic layer added atop existing deterministic functions.
b19	Enterprise AI Inference: Open Models, Local AI, and the New Economics of Control	AI Realized Now (Substack)	2026-05	Applies Stanford HAI 2025 AI Index data - performance gap between open-weight and closed models narrowed from 8 percent to 1.7 percent by February 2025 - to argue enterprise teams should now treat open-weight routing as a first-class cost-control strategy.
b20	Open-Weight Models H1 2026: DeepSeek, Qwen, Llama Recap	Digital Applied	2026-05	Provides the most detailed public retrospective on H1 2026 open-weight model releases across DeepSeek V4, Qwen 3.x, and Llama 4 families, documenting an order-of-magnitude inference cost drop and the three distinct vendor release strategies that emerged.
b21	DeepSeek V4 and Qwen 3.5: Open-Source AI Is Rewriting the Rules in 2026	Particula Tech Blog	2026-03	Reports practitioner-measured 60–70 percent infrastructure cost reductions from routing 80 percent of requests to open-weight models, with DeepSeek V4 at roughly $0.14 per million input tokens versus GPT-5 at $2.50, making the unit economics concrete.
b22	The Big LLM Architecture Comparison	Ahead of AI (Substack - Sebastian Raschka)	2026-04	Raschka, a researcher and ML educator, provides the most comprehensive architectural comparison of 2025–2026 open-weight models, documenting the rise of MoE architectures and the efficiency gains that underpin open-weight cost advantages.
b23	Tech Philosophy and AI Opportunity	Stratechery (Ben Thompson)	2025-11	Thompson argues Anthropic's task is to build not just state-of-the-art models but all the deterministic computing scaffolding around them, explicitly naming the harness and orchestration layer as the enterprise product, not the model alone.
b24	Microsoft and Software Survival	Stratechery (Ben Thompson)	2026-02	Analyses how per-seat licensing - built around human identity via Active Directory - becomes structurally problematic as agent adoption shrinks human headcount, framing the token-pricing transition as an existential business model shift for enterprise software.
b25	Deterministic AI Orchestration: A Platform Architecture for Autonomous Development	Praetorian Blog	2026-02	Shows that replacing raw MCP connections with on-demand typed wrappers and Zod schema validation reduces token consumption by up to 98 percent per multi-tool operation, providing the most precise quantification of deterministic scaffolding's economic value.