Research · Blogs & Independent Thinkers
Research sweep · deep · 2025 – 2026
AI on Deterministic Rails
- Claude Opus 4.8
Synthesised 2026-06-07
Narrative
The PoC-opalypse is well documented across independent blogs and Substacks. A May 2026 synthesis of HBR/Hyland survey data found that only 27 percent of enterprises have the connected data infrastructure needed for agentic AI, and that only 17 percent have actually implemented it, versus nearly half still in pilot. IBM data cited on Haverin's Substack shows only 25 percent of AI initiatives delivering expected ROI and only 16 percent scaling enterprise-wide. The Uber case, in which 5,000 engineers consumed an entire annual AI budget in four months via Claude Code and the COO admitted there was no clear line between token spend and delivered consumer features, became the canonical example of what several writers called token maximalism: usage divorced from outcome.
The harness-versus-model debate is the most active intellectual thread in independent technical writing between 2025 and mid-2026. Analyses of the leaked Claude Code codebase by Ken Huang and Patrick McGuinness on Substack revealed a three-tier permission system, a 46,000-line query engine, five-tier context compaction, and coordinator-worker multi-agent patterns, a level of sophistication that surprised practitioners. The dominant conclusion across multiple independent writers, including Ben Dickson at TechTalks citing a UC Berkeley paper, is that system scaling has replaced model scaling as the binding constraint: once foundation models are embedded in tools and terminals, behaviour is determined by the system, not the model alone. Simon Willison, whose newsletter and blog are among the most-cited practitioner sources, identified November 2025 as the specific inflection point when Claude Opus 4.5 and GPT-5.2 made coding agents reliable enough for daily production use.
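The claim that behaviour is determined by the system rather than the model alone can be illustrated with a minimal sketch of a tool loop gated by a permission table. Everything named here is hypothetical: the tier names (`ALLOW`, `ASK`, `DENY`), the `POLICY` table, and the scripted stand-in for a model are illustrative assumptions, not details taken from the leaked codebase.

```python
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names; the source only says "three-tier".
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy table mapping tool names to permission tiers.
POLICY = {
    "read_file": Tier.ALLOW,
    "write_file": Tier.ASK,
    "delete_repo": Tier.DENY,
}


def scripted_model(history):
    """Stand-in for an LLM: proposes the next tool call from a fixed script."""
    script = [
        ("read_file", "README.md"),
        ("write_file", "notes.txt"),
        ("delete_repo", "."),
        None,  # None means the model considers the goal reached
    ]
    return script[len(history)]


def run_agent(model, approve, max_steps=10):
    """Run tools in a loop; the harness, not the model, decides what executes."""
    history = []
    for _ in range(max_steps):
        call = model(history)
        if call is None:
            break
        tool, arg = call
        tier = POLICY.get(tool, Tier.DENY)  # unknown tools are denied by default
        if tier is Tier.DENY or (tier is Tier.ASK and not approve(tool, arg)):
            history.append((tool, arg, "blocked"))
        else:
            history.append((tool, arg, "executed"))
    return history


log = run_agent(scripted_model, approve=lambda tool, arg: tool == "write_file")
for entry in log:
    print(entry)
```

The same scripted model produces different outcomes under a different `POLICY` or `approve` callback, which is the sense in which the surrounding system sets the behaviour.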
On open-weight models and cost routing, independent analysts reached a consistent conclusion by early 2026. Stanford HAI's 2025 AI Index data shows the performance gap between open-weight and closed models narrowed from 8 percent to 1.7 percent on Chatbot Arena by February 2025. Practitioner reporting from Particula Tech quantified a 60–70 percent cost reduction from routing 80 percent of enterprise requests to DeepSeek V4 or Qwen 3 variants, reserving frontier APIs for complex reasoning, and put DeepSeek V4 at approximately $0.14 per million input tokens against GPT-5 at $2.50. The Digital Applied H1 2026 retrospective documented an order-of-magnitude drop in open-weight inference costs versus H2 2025.
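The routing arithmetic above can be checked with a short sketch. It uses only the quoted input-token list prices and the 80 percent routing share, and it assumes every request consumes the same number of input tokens. Output tokens, hosting overhead, and the likelihood that complex requests are longer are all ignored, which is one plausible reason the result lands somewhat above the measured 60–70 percent range. The function names are illustrative.

```python
def blended_input_price(open_share, open_price, frontier_price):
    """Blended price per million input tokens for a given routing share."""
    if not 0.0 <= open_share <= 1.0:
        raise ValueError("open_share must be between 0 and 1")
    return open_share * open_price + (1.0 - open_share) * frontier_price


def reduction_vs_frontier(open_share, open_price, frontier_price):
    """Fractional saving relative to sending every request to the frontier model."""
    blended = blended_input_price(open_share, open_price, frontier_price)
    return 1.0 - blended / frontier_price


# Figures quoted in the text: 80 percent routed, $0.14 versus $2.50 per million.
blended = blended_input_price(0.80, 0.14, 2.50)
saving = reduction_vs_frontier(0.80, 0.14, 2.50)

print(f"blended price: ${blended:.3f} per million input tokens")
print(f"reduction vs frontier-only: {saving:.1%}")
```

Under these simplifying assumptions the blended price is $0.612 per million input tokens, a reduction of about 75.5 percent on input-token spend alone.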
Token pricing and enterprise cost shock dominate the most recent independent coverage. Goldman Sachs data, widely cited across blogs, projects a 24-fold increase in token consumption between 2026 and 2030, while Google disclosed processing 3.2 quadrillion tokens per month in May 2026, up sevenfold in one year. Simon Willison noted in an April 2026 podcast appearance that Anthropic's tokeniser changes for Opus 4.7 effectively constituted a 40 percent invisible price increase. The Veso AI and Praetorian blogs provide the most granular quantification of the corrective: deterministic scaffolding layers reduce per-query token use by 60–98 percent, depending on whether typed wrappers, schema validation, or output truncation are applied. Veso AI reports 60–80 percent for deterministic layers versus naive full-model approaches, and Praetorian reports up to 98 percent per multi-tool operation with on-demand typed wrappers and Zod schema validation. That makes harness design not just an engineering choice but a direct financial control.
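The three techniques named above (typed wrappers, schema validation, output truncation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not either blog's implementation: the ticket-search tool, its fields, the `MAX_OUTPUT_CHARS` budget, and the four-characters-per-token estimate are all hypothetical, and a plain dataclass stands in for a schema library such as Zod.

```python
import json
from dataclasses import dataclass

MAX_OUTPUT_CHARS = 400  # hypothetical budget for what reaches the model


@dataclass(frozen=True)
class TicketQuery:
    """Typed arguments for the hypothetical ticket-search tool."""
    project: str
    limit: int

    def __post_init__(self):
        # Schema validation: reject malformed calls before any tool runs.
        if not isinstance(self.project, str) or not self.project.isidentifier():
            raise ValueError(f"invalid project: {self.project!r}")
        if not isinstance(self.limit, int) or not 1 <= self.limit <= 20:
            raise ValueError(f"limit must be an int in 1..20, got {self.limit!r}")


def raw_ticket_search(project, limit):
    """Stand-in for a verbose upstream tool that returns more than the model needs."""
    return [
        {
            "id": i,
            "project": project,
            "title": f"Ticket {i}",
            "status": "open",
            "audit_log": ["created", "triaged", "assigned"] * 10,
            "html_body": "<div>" + "lorem ipsum " * 50 + "</div>",
        }
        for i in range(limit)
    ]


def search_tickets(query: TicketQuery) -> str:
    """Typed wrapper: keep only the needed fields, then truncate deterministically."""
    rows = raw_ticket_search(query.project, query.limit)
    slim = [{"id": r["id"], "title": r["title"], "status": r["status"]} for r in rows]
    text = json.dumps(slim, separators=(",", ":"))
    if len(text) > MAX_OUTPUT_CHARS:
        # The cut may fall mid-record; the marker tells the model output was clipped.
        text = text[:MAX_OUTPUT_CHARS] + "...[truncated]"
    return text


def approx_tokens(text: str) -> int:
    """Crude proxy (about four characters per token); an assumption, not a tokeniser."""
    return max(1, len(text) // 4)


query = TicketQuery(project="billing", limit=5)
raw_text = json.dumps(raw_ticket_search(query.project, query.limit))
wrapped_text = search_tickets(query)
print("raw tool output, approx tokens:", approx_tokens(raw_text))
print("wrapped output, approx tokens:", approx_tokens(wrapped_text))
```

The saving in this toy case depends entirely on how verbose the stand-in tool is, so it should not be read as reproducing the 60–98 percent figures. The point is that the reduction is enforced by code outside the model and is therefore predictable per call.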
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | The problem with agentic AI in 2025 | Platforms, AI, and the Economics of BigTech (Substack) | 2025-10 | Advances the thesis that most enterprise agentic deployments are being governed by RPA-era thinking, producing only incremental efficiency rather than the coordination gains that agentic architecture makes possible. |
| b2 | Microsoft Is The Canary In The AI-Adoption Coal Mine | ProductMind (Substack) | 2026-06 | Documents Microsoft's internal Claude Code rollout in December 2025 and licence cancellations by May 2026 due to budget overruns, synthesising METR productivity data and Microsoft Research findings on AI delegation quality degradation. |
| b3 | Why Agentic AI Is Stalling Inside Most Enterprises | The AI Economy (Substack) | 2026-05 | Synthesises Harvard Business Review/Hyland survey data showing only 27 percent of enterprises have connected data needed for agentic AI, grounding the PoC-opalypse in data infrastructure failure rather than model capability. |
| b4 | The Agentic Stack Wars: Part Three — EXTRACTION | Haverin (Substack) | 2026-06 | Provides detailed cost modelling of multi-model orchestration versus single-model RAG, citing McKinsey and IBM ROI data to argue value capture is structurally uneven and the pricing shift from flat to consumption billing is arriving before the ROI case is settled. |
| b5 | The real, embarrassing state of enterprise AI adoption | Next Word (Substack) | 2026-06 | Names Uber's budget burnout directly and argues AI adoption outcomes follow a skill distribution, with talent density and incentive alignment separating genuine production use from expensive experimentation. |
| b6 | 75% of Enterprise AI Fails. The Fix Isn't a Better Model. | Product Impact Pod (Substack) | 2026-03 | Argues governance and structured knowledge (ontology, knowledge graphs) are the production differentiators, citing WEF data showing organisations with strong AI governance see 20 percentage points higher positive outcomes. |
| b7 | The Claude Code Leak: 10 Agentic AI Harness Patterns That Change Everything | Agentic AI (Substack — Ken Huang) | 2026-04 | Analyses the leaked Claude Code codebase to extract ten harness engineering patterns, making the case that harness design, not model intelligence, is the production differentiator in agentic systems. |
| b8 | CLAUDE CODE ORCHESTRATION | Agentic AI (Substack — Ken Huang) | 2026-06 | Documents Claude Code's May 2026 Dynamic Workflows feature — JavaScript-generated orchestration scripts fanning work across up to 1,000 parallel subagents — as evidence that deterministic control flow is now the primary performance lever, not model capability. |
| b9 | Scaling the harness: The next major bottleneck in agentic AI | TechTalks (Substack — Ben Dickson) | 2026-06 | Reports a UC Berkeley paper arguing system scaling — harness design — has replaced model scaling as the binding constraint on agentic AI performance, and details Claude Code's five-tier context compaction system as a concrete implementation. |
| b10 | Decode the Buzzword: Why Harness Engineering Matters Now | Next Signal Prediction (Substack) | 2026-03 | Maps a three-era evolution of AI engineering methodology — prompt engineering, context engineering, harness engineering — and identifies November 2025 as the inflection point where using models well overtook making models better. |
| b11 | Claude Code's Secrets Revealed | AI Changes Everything (Substack — Patrick McGuinness) | 2026-04 | Documents the leaked Claude Code codebase architecture: 29,000-line tool system with three-tier permissions, 46,000-line query engine for LLM orchestration, and coordinator-worker multi-agent patterns for large-scale codebase operations. |
| b12 | Agentic Harness — OpenClaw, Claude Code, and More | SolomonChrist AI (Substack) | 2026-03 | Provides comparative analysis of eight agentic harnesses — Claude Code, OpenCode, OpenClaw, Manus, Codex and others — identifying MCP standardisation and the local-versus-cloud deployment split as the two structural forces shaping the harness market. |
| b13 | The Agentic Harness: Why the Orchestration Layer Is the Product | Veso AI Blog | 2026-04 | Argues that deterministic layers reduce per-query LLM token use by 60–80 percent versus naive full-model approaches, and that data-layer constraints — unlike prompt-based guardrails — cannot fail under adversarial or edge-case inputs. |
| b14 | Agent & Harness & Micro-Orchestrator, Oh My! | Scaling DataOps (Substack) | 2026-05 | Practitioner account of outgrowing Claude Code and building custom orchestrators, documenting the shift from agentic harnesses for single-agent UX to orchestrators for repeated multi-step workflows at scale. |
| b15 | How an Agent Harness Made My Claude Code Setup 10x More Reliable | AI Maker (Substack) | 2026-05 | Practitioner case study showing how memory, hooks, and review-agent loops around Claude Code produce repeatable quality that raw model access does not, illustrating the harness-as-governance pattern in a personal workflow context. |
| b16 | Agentic Engineering Patterns — Simon Willison's Newsletter | Simon Willison's Newsletter (Substack) | 2026-02 | Willison, who coined 'prompt injection' and 'agentic engineering,' synthesises the November 2025 inflection point and documents practical patterns for deterministic control within agentic loops, including the layered orchestration architecture in OpenClaw. |
| b17 | LLM predictions for 2026, shared with Oxide and Friends | Simon Willison's Newsletter (Substack) | 2026-01 | Willison argues November 2025 was the decisive inflection where coding agents became reliable daily drivers, and frames the Jevons paradox as the central unresolved question about whether lower code production costs expand or destroy engineering demand. |
| b18 | I think 'agent' may finally have a widely enough agreed upon definition to be useful jargon now | Simon Willison's Newsletter (Substack) | 2025-09 | Willison formalises 'an LLM agent runs tools in a loop to achieve a goal' as a working definition, distinguishing deterministic from non-deterministic agents and framing LLMs as a non-deterministic layer added atop existing deterministic functions. |
| b19 | Enterprise AI Inference: Open Models, Local AI, and the New Economics of Control | AI Realized Now (Substack) | 2026-05 | Applies Stanford HAI 2025 AI Index data — performance gap between open-weight and closed models narrowed from 8 percent to 1.7 percent by February 2025 — to argue enterprise teams should now treat open-weight routing as a first-class cost-control strategy. |
| b20 | Open-Weight Models H1 2026: DeepSeek, Qwen, Llama Recap | Digital Applied | 2026-05 | Provides the most detailed public retrospective on H1 2026 open-weight model releases across DeepSeek V4, Qwen 3.x, and Llama 4 families, documenting an order-of-magnitude inference cost drop and the three distinct vendor release strategies that emerged. |
| b21 | DeepSeek V4 and Qwen 3.5: Open-Source AI Is Rewriting the Rules in 2026 | Particula Tech Blog | 2026-03 | Reports practitioner-measured 60–70 percent infrastructure cost reductions from routing 80 percent of requests to open-weight models, with DeepSeek V4 at roughly $0.14 per million input tokens versus GPT-5 at $2.50, making the unit economics concrete. |
| b22 | The Big LLM Architecture Comparison | Ahead of AI (Substack — Sebastian Raschka) | 2026-04 | Raschka, a researcher and ML educator, provides the most comprehensive architectural comparison of 2025–2026 open-weight models, documenting the rise of MoE architectures and the efficiency gains that underpin open-weight cost advantages. |
| b23 | Tech Philosophy and AI Opportunity | Stratechery (Ben Thompson) | 2025-11 | Thompson argues Anthropic's task is to build not just state-of-the-art models but all the deterministic computing scaffolding around them, explicitly naming the harness and orchestration layer as the enterprise product, not the model alone. |
| b24 | Microsoft and Software Survival | Stratechery (Ben Thompson) | 2026-02 | Analyses how per-seat licensing — built around human identity via Active Directory — becomes structurally problematic as agent adoption shrinks human headcount, framing the token-pricing transition as an existential business model shift for enterprise software. |
| b25 | Deterministic AI Orchestration: A Platform Architecture for Autonomous Development | Praetorian Blog | 2026-02 | Shows that replacing raw MCP connections with on-demand typed wrappers and Zod schema validation reduces token consumption by up to 98 percent per multi-tool operation, providing the most precise quantification of deterministic scaffolding's economic value. |