
Research sweep · deep · 2025 – 2026

Agentic RAG — Evolution, Challenges, and Decision Criteria

Agentic RAG between November 2025 and May 2026: how retrieval-augmented generation is shifting toward agent-driven architectures, the operational problems (token burn, context management, latency, reliability), information-organisation patterns such as context catalogues and semantic categorisation, parallels with traditional data warehousing (dimensions, measures, star schemas), the evolving RAG tooling landscape, and decision criteria for switching to pure agentic workflows.

  • academic
  • frontier
  • tech
  • blogs
  • vc

Synthesised 2026-05-10

Agentic RAG, November 2025 to May 2026: The Pipeline Becomes a Planner

Overview

Retrieval-augmented generation has stopped being a pipeline and started being a control problem. Between November 2025 and May 2026, the dominant framing across academic, industry, and investor coverage shifted from "how do we chunk and embed better" to "how does an agent decide when, what, and how to retrieve". The Singh et al. survey (arXiv 2501.09136), revised to v4 in April 2026, established the taxonomy that most practitioners now use, classifying agentic RAG by agent cardinality, control structure, and autonomy level. Sources: arXiv (cs.AI) (2025); arXiv (Aditi Singh et al.) (2026)

The shift matters because static RAG is failing at scale. CSO Online reported that 72 to 80% of enterprise RAG implementations underperformed or failed within their first year, with 51% of all enterprise AI failures in 2025 traceable to RAG. InfoQ's April 2026 field report from three financial-services deployments documented a roughly 30% silent failure rate under static RAG across approximately 1,500 multi-hop queries. Sources: CSO Online (2026); InfoQ (2026)

The defining architectural event of the period is the Model Context Protocol's transfer from Anthropic to the Linux Foundation's Agentic AI Foundation in December 2025, alongside near-simultaneous adoption by OpenAI and Google. MCP reported 97 million monthly SDK downloads and 10,000 active servers by early 2026, making retrieval a tool category rather than a bespoke pipeline. Sources: Anthropic News (2025); InfoQ (2025); Model Context Protocol Blog (2025)

The investor and analyst class has now formalised the transition. Gartner's April 2026 Hype Cycle for Agentic AI introduced context graphs and FinOps for agentic AI as named profiles, while a16z's December 2025 Big Ideas report named data entropy, the decay of freshness and structure inside the unstructured 80% of corporate knowledge, as the binding constraint that breaks both RAG and agents. Sources: Gartner (2026); Andreessen Horowitz (a16z) (2025)

Key Findings

Agentic loops cost three to ten times more than static RAG, but not always. MarsDevs' production guide reports 8 to 12 second latencies for three to four iteration loops against the 1 to 2 second baseline for standard RAG, with a 3 to 10x token cost multiplier. The hidden assumption that this is unavoidable is wrong: Du et al.'s A-RAG (arXiv 2602.03442) showed that exposing hierarchical retrieval interfaces (keyword, semantic, chunk-read) lets agents match or undercut static baselines on token consumption while improving accuracy. Sources: MarsDevs (2026); arXiv (cs.CL) (2026)
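The interface shift A-RAG describes can be sketched in a few lines: rather than one opaque top-k retriever, the agent composes narrow tools and pays tokens only for content it actually reads. The class and method names below are illustrative assumptions, not A-RAG's actual API, and the semantic-search tier is omitted for brevity.

```python
class HierarchicalRetriever:
    """Sketch of a hierarchical retrieval interface exposed as agent tools."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs  # doc_id -> full text

    def keyword_search(self, term: str) -> list[str]:
        """Cheap first pass: return matching doc ids only, no content."""
        return [i for i, t in self.docs.items() if term.lower() in t.lower()]

    def read_chunk(self, doc_id: str, start: int = 0, size: int = 200) -> str:
        """Targeted read: the agent pays tokens only for this window."""
        return self.docs[doc_id][start:start + size]

r = HierarchicalRetriever({"d1": "MCP makes retrieval a tool category.",
                           "d2": "Static RAG chunks and embeds upfront."})
print(r.keyword_search("retrieval"))  # ['d1']
```

The point of the design is that the expensive step (reading content into context) is deferred until after a cheap index-level pass, which is how the token bill can undercut a static pipeline that stuffs top-k chunks into every prompt.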

Most hallucinations are engineering bugs, not model limitations. The InfoQ Q4 2025 financial-services study found that roughly 60% of hallucinations originated from unhandled execution errors rather than LLM reasoning, making error-recovery logic a more effective investment than model improvement. Google Research's ICLR 2025 sufficient-context paper supports this from a different angle: Gemma's error rate jumped from 10% to 66% when context was insufficient, and even Gemini and GPT failed to abstain appropriately when retrieval was poor. Sources: InfoQ (2026); Google Research Blog / ICLR 2025 (2025)

The compounding-step failure problem now drives architectural decisions. METR's analysis quantifies the maths: a 95%-reliable step chains to 36% end-to-end success across twenty sequential steps. METR's March 2025 time-horizon paper showed agentic task completion doubling every seven months, with the January 2026 update confirming the trend across GPT-5.1 Codex Max and Gemini 3 Pro. This is the quantitative basis for the design heuristic of keeping agentic loops short and instrumented. Sources: METR (2025); METR (2026)
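The compounding arithmetic behind that 36% figure is worth making explicit:

```python
def chained_success(step_reliability: float, n_steps: int) -> float:
    """End-to-end success probability when every step must succeed."""
    return step_reliability ** n_steps

# A 95%-reliable step across twenty sequential steps:
print(round(chained_success(0.95, 20), 2))  # 0.36
```

Halving the loop to ten steps lifts end-to-end success to about 60%, which is why short, instrumented loops beat long autonomous ones at current per-step reliability.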

GraphRAG works, but selectively. The ICLR 2026 GraphRAG-Bench paper showed GraphRAG achieving 13.4% lower accuracy than vanilla RAG on Natural Questions and introducing 2.3x higher latency on multi-hop tasks, yet improving multi-hop reasoning by 4.5% on HotpotQA. Min et al. documented a SAP legacy-code migration deployment where GraphRAG delivered 15% improvement over vector baselines, illustrating where graph construction costs are justified. Sources: arXiv / ICLR 2026 (2025); arXiv (cs.IR) (2025)

The accuracy delta on multi-hop questions is categorical, not marginal. A 2025 MDPI clinical study across 250 vignettes recorded 34% accuracy for static RAG versus 89% for agentic RAG, a 55-percentage-point gap. Wu et al.'s HiPRAG paper reduced over-search rates to 2.3% via hierarchical process rewards, suggesting the efficiency story is improving in parallel with the accuracy story. Sources: MarsDevs (2026); arXiv (cs.CL) (2025)

The framework field has consolidated to two layers. AIMultiple's January 2026 benchmark recorded a 53% token-count gap, with Haystack at 1.57k tokens versus LangChain at 2.40k. LangGraph reached 1.0 in October 2025 and now functions as the stateful orchestration layer, while LlamaIndex handles retrieval depth. Microsoft merged AutoGen and Semantic Kernel into a unified framework, and LangChain pivoted away from general agent orchestration toward RAG tooling specifically. Sources: AIMultiple (2026); Medium (Trung Hiếu Trần) (2025); MindStudio (2026)

The data-warehouse analogy is now a working architecture, not a metaphor. Thoughtworks Volume 33 (November 2025) elevated the semantic layer for AI as a named technique, citing Snowflake Semantic Views, Databricks Metric Views, and dbt MetricFlow converging on Open Semantic Interchange v1.0. Towards Data Science documents ServiceNow's acquisition of data.world and Salesforce's $8 billion Informatica purchase as market confirmation that knowledge graphs and metadata management are becoming the semantic backbone for agentic AI. Sources: Thoughtworks Technology Radar (2025); Towards Data Science (2025)

Context engineering has displaced retrieval optimisation as the active research frontier. RAGFlow's December 2025 year-end review reframes RAG as a Context Engine, drawing the explicit ETL/ELT analogy: the ingestion pipeline (Parse-Transform-Index) is becoming the agentic equivalent of structured data warehousing's transformation tier. Thoughtworks Volume 33's central editorial theme was context engineering, marking the shift across the practitioner consensus. Sources: RAGFlow (2025); Thoughtworks Technology Radar (2025)

Pinecone's compilation-stage move signals a structural rethink. VentureBeat reported in May 2026 that Pinecone's internal benchmark showed a 98% token reduction by moving reasoning from inference time to a compilation stage, with Gartner's Arun Chandrasekaran framing this as embedding structural logic in the metadata layer rather than repeating interpretation work each session. This mirrors how data warehousing moved value from storage to transformation and semantics. Sources: VentureBeat (2026)

Frontier-lab productivity claims contradict practitioner field results. METR's July 2025 controlled study found developers using early-2025 AI tools took 19% longer than those without them, even as METR's time-horizon work showed capability genuinely doubling every seven months. The gap between benchmark capability and real-world reliability is the central calibration problem for agentic RAG decision-making. Sources: METR (2025); METR (2025)

Evidence & Data

The cost-and-latency numbers across sources cluster tightly. MarsDevs reports agentic loops at 8 to 12 seconds versus 1 to 2 seconds for static RAG, with token costs 3 to 10 times higher. Redis quantifies semantic caching as recovering up to 73% of cost in high-repetition workloads. SitePoint's February 2026 calculator gives concrete switching thresholds: below 200K tokens with fewer than 500 daily queries, long context with caching wins; above 500K tokens at 5,000-plus queries per day, RAG wins on cost. Sources: MarsDevs (2026); Redis (2026); SitePoint (2026)
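SitePoint's thresholds reduce to a small routing rule. The function below is a hypothetical sketch of that heuristic, not a published calculator; the band between the two published thresholds is deliberately left undecided.

```python
def retrieval_strategy(corpus_tokens: int, daily_queries: int) -> str:
    """Illustrative routing rule built from SitePoint's February 2026
    switching thresholds: small corpus and low volume favour long context
    with caching; large corpus at high volume favours RAG on cost."""
    if corpus_tokens < 200_000 and daily_queries < 500:
        return "long-context + caching"
    if corpus_tokens > 500_000 and daily_queries >= 5_000:
        return "RAG"
    return "measure both"  # no clear winner from the published thresholds

print(retrieval_strategy(150_000, 300))      # long-context + caching
print(retrieval_strategy(2_000_000, 8_000))  # RAG
```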

Adoption figures are now concrete. McKinsey's November 2025 State of AI survey of 1,993 participants across 105 countries found 23% of enterprises scaling agentic AI in at least one function, with high performers three times more likely than peers to scale agents per function. Gartner's April 2026 Hype Cycle for Agentic AI reported only 17% of organisations had deployed agents while over 60% expected to within two years. CB Insights tracked private AI agent companies from roughly 300 in March 2025 to over 400 by November 2025, with one in five new unicorns building agents. Sources: McKinsey Global Institute (2025); Gartner (2026); CB Insights (2025)

Cost economics at the firm level are now visible. ICONIQ Capital's January 2026 survey of approximately 300 software executives found model inference rising from 20% to 23% of total product cost as companies scale, with gross margins projected at 52% for 2026. ICONIQ's June 2025 Builder's Playbook found RAG and fine-tuning each used by 66 to 69% of AI product builders. Sources: ICONIQ Capital (2026); ICONIQ Capital (2025)

Production-grade evaluation thresholds are converging. MarsDevs cites faithfulness ≥0.9, answer relevancy ≥0.85, and context precision ≥0.8 as 2026 production targets. Ragas reports 400k+ monthly downloads and anchors the metrics layer of the emerging observability stack alongside LangSmith, Arize Phoenix, and Langfuse. Sources: MarsDevs (2026); arXiv / EACL 2024 (2023)
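Wired into CI, those targets become a simple gate. The metric names below follow Ragas conventions, but the score dictionary and the gate itself are illustrative assumptions, not part of any of the cited tools.

```python
# 2026 production targets cited above, keyed by Ragas-style metric names.
THRESHOLDS = {
    "faithfulness": 0.90,
    "answer_relevancy": 0.85,
    "context_precision": 0.80,
}

def gate(metrics: dict[str, float]) -> list[str]:
    """Return the metrics that fall below their production threshold;
    an empty list means the eval run passes the gate."""
    return [name for name, floor in THRESHOLDS.items()
            if metrics.get(name, 0.0) < floor]

failures = gate({"faithfulness": 0.93,
                 "answer_relevancy": 0.81,
                 "context_precision": 0.84})
print(failures)  # ['answer_relevancy']
```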

The capability trajectory is the spine of decision-making. METR's HCAST (189 tasks, 563 human attempts) and RE-Bench (7 ML research environments, 71 expert comparisons) establish the time-horizon methodology. The seven-month doubling time implies that systems struggling with multi-hour retrieval-planning today will handle day-length tasks within roughly a year, if the trend holds. Sources: METR (2025); arXiv / METR (2024); METR (2026)
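The implication of a fixed doubling time is easy to check directly; a minimal extrapolation, assuming the seven-month trend holds:

```python
def horizon_after(months: float, current_horizon_hours: float,
                  doubling_months: float = 7.0) -> float:
    """Extrapolate a METR-style task-completion horizon forward
    under a fixed doubling time."""
    return current_horizon_hours * 2 ** (months / doubling_months)

# A system handling ~2-hour tasks today, twelve months out:
print(round(horizon_after(12, 2.0), 1))  # 6.6
```

At that rate a two-hour horizon crosses an eight-hour working day in fourteen months (two doublings), which is the arithmetic behind "day-length tasks within roughly a year".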

Signals & Tensions

Token efficiency is not monotonic with agent complexity. The dominant practitioner narrative (MarsDevs, NStarX, CSO Online) treats agentic loops as inherently more expensive than static RAG. Du et al.'s A-RAG empirically falsifies this for hierarchical retrieval interfaces, and Pinecone's 98% token reduction via compilation-stage knowledge layers suggests the cost ceiling is architectural, not fundamental. The weak signal: a redesigned retrieval interface beats a redesigned agent loop. Sources: arXiv (cs.CL) (2026); VentureBeat (2026)

The framework debate is partly resolved, partly performative. MindStudio argues the traditional framework era is ending as MCP standardises tool integration and coding agents generate custom pipelines on demand. PremAI and Morph still treat the LangChain-versus-LlamaIndex question as live. The honest reading: heavyweight chain abstractions are eroding, but LangGraph as stateful executor and LlamaIndex as data layer have durable niches. Sources: MindStudio (2026); PremAI Blog (2026); Morph (2026)

GraphRAG is overhyped at the surface and underrated where it counts. The ICLR 2026 benchmark showed GraphRAG losing on Natural Questions by 13.4% and adding 2.3x latency, contradicting the marketing line that graph retrieval is universally superior. The real signal is enterprise-specific: 15% improvement on stable corpora with entity-relationship structure, where extraction cost amortises across queries. Sources: arXiv / ICLR 2026 (2025); arXiv (cs.IR) (2025)

Evaluation tooling is the most underreported gap. NStarX notes 70% of RAG systems still lack systematic evaluation frameworks, while METR's August 2025 update warned that algorithmic scoring on benchmarks like SWE-Bench Verified likely overstates real-world performance by a substantial margin. The disconnect between trajectory evaluation (per-node observability) and endpoint metrics is the practical reason agentic deployments fail silently in production. Sources: NStarX Inc. (2025); METR (2025)

The data-warehouse analogy is structurally genuine but breaks at the join layer. Context catalogues parallel dimensional modelling at the metadata-management layer (lineage, freshness, access control). They do not parallel SQL's deterministic joins, because embedding-based retrieval is probabilistic. Open Semantic Interchange v1.0 represents the first standardisation move that might genuinely close this gap, but it is too early to claim convergence. Sources: Thoughtworks Technology Radar (2025); Towards Data Science (2025)

MCP security is the next hardening cycle. Researchers documented prompt injection, tool-permission chaining, and lookalike tools in April 2025, structurally analogous to SQL injection and confused-deputy vulnerabilities. AI Realized Now's May 2026 piece on the eight agentic security vulnerabilities not getting attention suggests the multi-year hardening cycle is starting late. Sources: PromptHub (2025); AI Realized Now (Substack) (2026)

Open Questions

How do switching costs from static to agentic RAG decompose? Practitioner discourse acknowledges the migration burden across application code, indexing scripts, evaluation pipelines, and SLA renegotiation, but no peer-reviewed paper quantifies it. The Mastercard fintech case (arXiv 2510.25518) offers qualitative evidence only. Sources: arXiv (cs.AI) (2025)

Where does long-context win permanently over RAG? SitePoint's thresholds (200K tokens, 500 daily queries) are heuristic. The deeper question of whether 1M-token windows with caching structurally displace retrieval for specific domains, particularly legal and medical with stable corpora, has no settled answer. Sources: SitePoint (2026); Agam Jain (personal blog) (2025)

Can compilation-stage knowledge layers replace inference-time retrieval at scale? Pinecone's 98% token reduction is a single-vendor benchmark. Whether the architectural pattern generalises beyond Pinecone's Nexus and Hindsight products, and at what corpus update frequency it breaks, is the most consequential infrastructure question of 2026. Sources: VentureBeat (2026)

What evaluation framework handles multi-agent trajectories? RAGalyst (arXiv 2025-11) and Arize Phoenix extend RAGAS-style scoring to multi-turn traces, but coverage for complex multi-agent choreographies is incomplete. The gap between output-only metrics and trajectory metrics is well-acknowledged but unresolved. Sources: arXiv (cs.CL) (2025)

Does the METR seven-month doubling time hold past 2026? Time Horizon 1.1 confirmed the trend across GPT-5.1 Codex Max and Gemini 3 Pro, but the holistic-versus-algorithmic gap means task-completion horizons may saturate earlier in real-world deployment than benchmark scaling suggests. Sources: METR (2026); METR (2025)

How will MCP security mature without breaking interoperability? The protocol's USB-C metaphor is structurally correct, but tool-permission chaining and lookalike-tool attacks have no standardised mitigation. Whether the Linux Foundation's Agentic AI Foundation produces a security profile fast enough to prevent a major incident is open. Sources: Model Context Protocol Blog (2025); AI Realized Now (Substack) (2026)

At what point does the context catalogue need a formal schema standard? Open Semantic Interchange v1.0 covers metric definitions but not retrievable-knowledge metadata more broadly. The dimensional-modelling analogue would be a Kimball-equivalent reference architecture for agentic retrieval, and none exists. Sources: Thoughtworks Technology Radar (2025)

The decision rule emerging from the sweep is narrower than the marketing suggests: keep static RAG for single-hop, latency-sensitive, predictable-corpus queries; move to agentic when query complexity demands multi-hop reasoning, tool diversity, or error recovery. The migration is not a paradigm shift but an architectural decomposition, and the firms doing it well are the ones who already know what their facts and dimensions are.
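That decision rule can be stated as a routing sketch; the query-trait flags below are assumptions standing in for whatever upstream classifier produces them.

```python
from dataclasses import dataclass

@dataclass
class QueryTraits:
    """Illustrative query properties a router might receive upstream."""
    multi_hop: bool = False
    needs_tools: bool = False
    needs_error_recovery: bool = False
    latency_sensitive: bool = True

def choose_pipeline(q: QueryTraits) -> str:
    """Route to agentic only when complexity demands it; default to
    static RAG for single-hop, predictable-corpus, latency-sensitive work."""
    if q.multi_hop or q.needs_tools or q.needs_error_recovery:
        return "agentic"
    return "static-rag"

print(choose_pipeline(QueryTraits()))                # static-rag
print(choose_pipeline(QueryTraits(multi_hop=True)))  # agentic
```

The asymmetry is deliberate: any one complexity trigger justifies the agentic loop's cost, while the static path is the default that must be argued out of.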




Sources



Academic & arXiv

ID Title Outlet Date Significance
a1 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG arXiv (cs.AI) 2025-01 Foundational survey introducing a principled taxonomy of agentic RAG architectures by agent cardinality, control structure, autonomy, and knowledge representation; revised through April 2026.
a2 SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions arXiv (cs.IR) 2026-03 First systematisation of knowledge paper to formalise agentic RAG as a finite-horizon partially observable Markov decision process, addressing fragmented architectures and inconsistent evaluation methodologies.
a3 A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces arXiv (cs.CL) 2026-02 Demonstrates that a truly agentic framework exposing hierarchical keyword, semantic, and chunk-read tools outperforms static RAG baselines with comparable or lower token consumption across open-domain QA benchmarks.
a4 HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation arXiv (cs.CL) 2025-10 Introduces a reinforcement-learning training method with fine-grained process rewards that reduces over-search to 2.3% on seven QA benchmarks, directly quantifying the token-burn problem in agentic retrieval loops.
a5 Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges arXiv (cs.AI) 2025-06 Maps agentic RAG reasoning paradigms onto dual-process cognitive theory and explicitly identifies the lost-in-the-middle problem and context management failures at scale as central industrial challenges.
a6 MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning arXiv (cs.CL) 2025-05 Training-free multi-agent framework with specialised Planner, Step Definer, Extractor, and QA agents; sets state-of-the-art on multi-hop QA and shows LLaMA3-8B with MA-RAG surpassing larger standalone models.
a7 RAG vs. GraphRAG: A Systematic Evaluation and Key Insights arXiv (cs.IR) 2025-02 Systematic empirical comparison establishing when graph-structured retrieval offers measurable gains versus vanilla RAG and characterising graph construction cost and latency trade-offs.
a8 When to Use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation arXiv / ICLR 2026 2025-06 Introduces GraphRAG-Bench and shows GraphRAG achieves 13.4% lower accuracy than vanilla RAG on factual queries but improves multi-hop reasoning by 4.5%, at 2.3x higher latency — a key decision-criteria paper.
a9 Towards Practical GraphRAG: Efficient Knowledge Graph Construction and Hybrid Retrieval at Scale arXiv (cs.IR) 2025-07 Proposes a cost-efficient enterprise GraphRAG pipeline fusing vector similarity with graph traversal via Reciprocal Rank Fusion, validating 15% improvement over vector baselines on legacy code migration datasets.
a10 StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering arXiv (cs.CL) 2025-10 Combines query decomposition with BFS-based knowledge graph traversal to assemble explicit evidence chains, achieving state-of-the-art on MuSiQue, 2WikiMultiHopQA, and HotpotQA.
a11 Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications arXiv (cs.AI) 2025-07 Real-world deployment case using a multi-tool LLM agent over a knowledge graph of INRAE publications, showing agentic architectures enable exhaustive dataset queries impossible with static RAG.
a12 Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems arXiv (cs.CL) 2025-10 Application-oriented survey covering retrieval granularity trade-offs, context contamination, and hallucination mitigation strategies across static and agentic RAG pipelines.
a13 FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG arXiv (cs.CL) 2026-01 Mechanistic analysis linking citation hallucination to internal transformer pathway dynamics, providing interpretable diagnostics for long-form agentic RAG outputs.
a14 TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG arXiv (cs.CL) 2025-12 Decomposes final token probability across transformer residual-stream components to detect hallucination in RAG, extending beyond the binary FFN-versus-context conflict model.
a15 M-RAG: Making RAG Faster, Stronger, and More Efficient arXiv (cs.IR) 2026-03 Proposes a chunk-free retrieval strategy addressing how fixed chunking disrupts contextual integrity and limits reasoning over causal and hierarchical document relationships.
a16 Ragas: Automated Evaluation of Retrieval Augmented Generation arXiv / EACL 2024 2023-09 Foundational evaluation framework providing reference-free metrics for faithfulness, answer relevance, and context relevance; remains the dominant RAG evaluation standard against which agentic pipeline tools are benchmarked.
a17 RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG arXiv (cs.CL) 2025-11 Introduces an agentic QA-dataset generation pipeline with filtering and optimised LLM-as-Judge metrics, demonstrating consistent outperformance of RAGAS across domain-specific evaluation tasks.
a18 RAG for Fintech: Agentic Design and Evaluation arXiv (cs.AI) 2025-10 Enterprise deployment study at Mastercard documenting agentic RAG pipeline design for fintech knowledge bases, including modular agents for query reformulation, acronym resolution, and iterative sub-query decomposition.
a19 A Survey of RAG-Reasoning Systems in LLMs arXiv (cs.CL) 2025-07 Taxonomises recent advances in retrieval-reasoning integration including in-context retrieval, chain-of-thought interleaving, and multi-agent orchestration patterns such as HM-RAG and Chain of Agents.
a20 Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding arXiv (cs.CV) 2025-10 Documents how current multimodal RAG benchmarks require 20–200 million visual tokens, far exceeding LLM context limits, motivating agent-driven iterative retrieval for long-document understanding.
a21 HCAST: Human-Calibrated Autonomy Software Tasks METR 2025 METR's 189-task benchmark across ML, cybersecurity, software engineering, and general reasoning with 563 human expert attempts, establishing calibrated time-horizon metrics for evaluating autonomous agents including agentic RAG systems.
a22 RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents Against Human Experts arXiv / METR 2024-11 METR benchmark comparing Claude 3.5 Sonnet and o1-preview against 71 human ML experts on research engineering tasks; finds agents achieve 4x human performance at 2-hour budget but humans outperform 2x at 32 hours.
a23 Measuring AI Ability to Complete Long Tasks (METR Time Horizons) METR 2025-03 METR's empirical analysis showing AI agent task-completion time horizons doubling every seven months, providing the scaling context within which agentic RAG capability growth should be interpreted.
a24 Research Update: Algorithmic vs. Holistic Evaluation METR 2025-08 Shows frontier model success rates on SWE-Bench Verified (~70–75%) likely overestimate real-world performance due to algorithmic scoring gaps, a methodological warning directly applicable to agentic RAG pipeline evaluation.
a25 The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation arXiv (cs.CV) 2026-05 Identifies 'recorruption' — where accurate external context causes a capable model to abandon a previously correct prediction — formalising a failure mode specific to RAG context injection.

Frontier Lab & Model News

ID Title Outlet Date Significance
t1 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG arXiv (v4 updated April 2026) 2025-01 The definitive survey paper on agentic RAG, introducing a principled taxonomy of architectures based on agent cardinality, control structure, autonomy, and knowledge representation, with an April 2026 update.
t2 How we built our multi-agent research system Anthropic Engineering Blog 2025 Anthropic's detailed engineering account of how their multi-agent research system replaced static RAG with multi-step, lead-agent plus subagent architecture, documenting context-overflow mitigations and task-description requirements.
t3 Building effective agents Anthropic Research 2024-12 Anthropic's foundational guidance on when to use agentic systems versus simpler retrieval-augmented LLM calls, with explicit caution about unnecessary complexity.
t4 Introducing the Model Context Protocol Anthropic News 2024-11 Announcement of MCP as the open standard for connecting AI agents to external data sources, directly enabling agent-driven retrieval across heterogeneous tool ecosystems.
t5 Donating the Model Context Protocol and establishing the Agentic AI Foundation Anthropic News 2025-12 Anthropic's transfer of MCP governance to the Linux Foundation's Agentic AI Foundation, cementing MCP as vendor-neutral infrastructure for agentic retrieval pipelines and reporting 97M+ monthly SDK downloads.
t6 MCP joins the Agentic AI Foundation Model Context Protocol Blog 2025-12 Official MCP blog post documenting the protocol's growth to 10,000 active servers and first-class support across ChatGPT, Claude, Gemini, and Microsoft Copilot.
t7 Measuring AI Ability to Complete Long Tasks METR 2025-03 METR's foundational paper establishing the time-horizon metric, showing frontier agentic task completion doubling every seven months — the key quantitative frame for measuring the long-horizon capability underpinning agentic RAG use cases.
t8 Time Horizon 1.1 METR 2026-01 METR's updated methodology with an expanded task suite, confirming the seven-month doubling time and adding evaluations for GPT-5.1 and Gemini 3 Pro relevant to agentic workload planning.
t9 Task-Completion Time Horizons of Frontier AI Models METR 2026-05 Living leaderboard tracking autonomous task horizons across all major frontier models, including Claude Opus 4.5, GPT-5.1, and Gemini 3 Pro, with the latest data point added May 2026 for Claude Mythos Preview.
t10 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity METR 2025-07 Controlled study finding developers using AI tools in early 2025 took 19% longer than without, providing a sceptical counterpoint to benchmark-driven optimism about agentic workflow productivity gains.
t11 Frontier AI Trends Report UK AI Security Institute (AISI) 2025 AISI's analysis of MCP server autonomy levels and growing cyber-task horizons, documenting that finance-focused MCP servers increasingly grant higher autonomy and that cyber task completion times doubled in roughly eight months.
t12 Deeper Insights into Retrieval Augmented Generation: The Role of Sufficient Context Google Research Blog / ICLR 2025 2025 Google Research paper showing that insufficient retrieval context paradoxically increases hallucination, with Gemma's error rate rising from 10% to 66% under poor retrieval — critical evidence for context-quality requirements in agentic RAG.
t13 Building with Gemini Embedding 2: Agentic Multimodal RAG and Beyond Google Developers Blog 2025-04 Google DeepMind's announcement of a unified multimodal embedding model spanning text, images, video, and audio in a single vector space, enabling agentic RAG pipelines that retrieve across modalities.
t14 RAG and Grounding on Vertex AI Google Cloud Blog 2024-06 Google's technical announcement of dynamic retrieval in Vertex AI Agent Builder, introducing cost-balancing logic to decide when to use Google Search versus parametric knowledge — a practical model for selective retrieval in agentic systems.
t15 Deep Research Max: a step change for autonomous research agents Google Blog 2025-04 Google's Deep Research Max, built on Gemini 3.1 Pro, demonstrates a production agentic RAG pattern: iterative search, MCP tool integration, and multimodal grounding across custom proprietary data and the open web.
t16 Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers arXiv 2025-05 Comprehensive academic survey covering agent-based universal RAG, corrective RAG, and graph-based retrieval, including quantitative findings such as Dual-Pathway KG-RAG reducing hallucinations by 18% in biomedical QA.
t17 Agentic Artificial Intelligence: Architectures, Taxonomies, and Evaluation of Large Language Model Agents arXiv 2026-01 Technical survey situating RAG as the persistent-memory layer within broader agent architectures, covering Anthropic computer-use tooling and OpenAI Operator, with references to OSWorld and SWE-bench evaluation infrastructure.
t18 Introducing GPT-5.5 OpenAI 2025-04 OpenAI's announcement claiming GPT-5.5 completes agentic Codex tasks with significantly fewer tokens than prior models, directly relevant to the token-efficiency dimension of agentic RAG cost modelling.
t19 OpenAI and Anthropic Donate AGENTS.md and Model Context Protocol to New Agentic AI Foundation InfoQ 2025-12 Authoritative industry coverage of the AAIF formation, documenting Google's parallel A2A protocol donation and the convergence of competing labs on open agent-interoperability standards.
t20 Anthropic launches enterprise 'Agent Skills' and opens the standard VentureBeat 2025-12 Reports Anthropic's Agent Skills open standard, demonstrating that OpenAI adopted structurally identical architecture in ChatGPT and Codex CLI, illustrating rapid cross-lab convergence on reusable workflow knowledge for agentic retrieval.
t21 OpenAI's Agents SDK and Anthropic's Model Context Protocol (MCP) PromptHub 2025-03 Technical comparison of OpenAI's Agents SDK (with built-in file search against vector stores) and Anthropic's MCP, covering the complementary roles of agentic orchestration and retrieval-connectivity layers.
t22 A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ACL Anthology / IJCNLP 2025 Findings 2025 Peer-reviewed survey aligning RAG paradigms with System 1 / System 2 cognitive frameworks and cataloguing reasoning workflows including ReAct, SELF-RAG, and multi-hop decomposition in industry settings.
t23 Retrieval-Augmented Generation in Late 2025: a practical insight Medium 2025-10 Practitioner synthesis arguing that long-context models and search APIs can replace static RAG for many queries, framing the 'start with search, reach for RAG only when data volume demands it' contrarian decision criterion.
t24 AI Agent Landscape 2025–2026: A Technical Deep Dive Medium 2026-01 Technical overview documenting Anthropic's multi-agent researcher using a Memory tool to persist plans beyond the 200K token limit, and reporting that tool selection via semantic similarity improves accuracy 3x versus presenting all tools simultaneously.
t25 New AI Model Releases News (April 2026 Startup Edition) mean.ceo blog 2026-04 Documents the April 2026 model release wave, noting that every major 2026 release emphasises agentic capabilities and that MCP crossed 97 million installs in March 2026, marking its transition to foundational agentic infrastructure.

Tech Industry & Practitioner

ID Title Outlet Date Significance
p1 Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery InfoQ 2026-04 Field report from three financial-services deployments (Q4 2025, n=~1,500 multi-hop queries) showing 30% silent-failure rate and finding that ~60% of hallucinations originated from unhandled execution errors rather than model reasoning, directly grounding operational failure-mode claims.
p2 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136 v4) arXiv (Aditi Singh et al.) 2026-04 The most cited practitioner-adjacent survey of agentic RAG architectures; introduces a principled taxonomy by agent cardinality, control structure, and autonomy, with comparative trade-off analysis across healthcare, finance, and enterprise document processing use cases.
p3 Thoughtworks Technology Radar Volume 33 — Themes: Rise of Agents Elevated by MCP, Context Engineering, AI Antipatterns Thoughtworks Technology Radar 2025-11 Authoritative practitioner signal that RAG dominated Volume 32 conversation while Volume 33 shifted to agents and MCP, confirming the industry inflection from static retrieval to agentic workflows as observed by Thoughtworks CTO Rachel Laycock.
p4 Thoughtworks Technology Radar Highlights The Rapid Evolution of AI Assistance in 2025 (Vol. 33 press release) Thoughtworks / PR Newswire 2025-11 Official release statement confirming Volume 33's shift from RAG and prompt engineering (Vol. 32) to context engineering, MCP, and agentic systems, citing the growth of agentic workflows and enterprise AI antipatterns as the dominant themes.
p5 Thoughtworks Technology Radar — Techniques: Semantic Layer for AI (LLM text-to-SQL, dbt MetricFlow, Snowflake Semantic Views) Thoughtworks Technology Radar 2025-11 Practitioner evidence that semantic layers — the closest analogue to dimensional modelling in agentic systems — are now a first-class concern, with Thoughtworks warning that naive LLM text-to-SQL produces incorrect results when business rules live outside the schema.
p6 Thoughtworks Technology Radar Volume 32 — Supervised Agents, RAG Techniques, Data Product Thinking Thoughtworks 2025-04 Volume 32 spotlighted corrective RAG, Fusion-RAG, Self-RAG, and FastGraphRAG as Trial-level techniques, and introduced 'data product thinking' as the data management analogue of product management — directly relevant to context catalogue design.
p7 Thoughtworks Technology Radar — Platforms: Graphiti, Databricks Agent Bricks, Rhesis testing Thoughtworks Technology Radar 2026-04 Practitioner-assessed platform blips including Graphiti (temporal knowledge graph for LLM memory) and Databricks Agent Bricks; explicitly flags that flat vector stores in RAG pipelines fail to track how facts change over time.
p8 Themes from Technology Radar Vol. 33 — Podcast: Infrastructure Automation, Rise of Agents, MCP, AI Antipatterns Thoughtworks 2025-11 Neal Ford and Ken Mugrage explain the editorial reasoning behind Volume 33's shift from RAG to agents and MCP, including the concept of context engineering — 'how do you tell the agents what they're supposed to do and give them roles.'
p9 RAG in 2026: The UK/EU Enterprise Guide to Grounded GenAI Data Nucleus 2026-01 Practitioner guide situating EU AI Act and GDPR obligations alongside agentic RAG architecture guidance, covering framework selection (LangGraph, LlamaIndex, AutoGen, CrewAI), access control patterns, and ReAct/Tree-of-Thoughts retrieval reasoning.
p10 Agentic RAG: The 2026 Production Guide MarsDevs 2026-05 Production-focused guide with quantified latency benchmarks (standard RAG 1–2 s; agentic loop 8–12 s; 3–10x token cost multiplier) and three-layer evaluation architecture using Ragas, Arize Phoenix, and Langfuse — the most numerically grounded cost/latency source in the sweep.
p11 Next-Generation Agentic RAG with LangGraph (2026 Edition) Medium (Vinod Rane) 2026-03 Detailed implementation guide for stateful agentic RAG using LangGraph directed cyclic graphs, with per-node RAGAS observability instrumentation (critic_score, retrieval_round, iteration_count, token_budget_used) and production metric targets.
p12 RAG Framework Benchmark: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy AIMultiple 2026-01 Standardised 100-query benchmark across five frameworks with identical models (GPT-4.1-mini) and retriever (Qdrant), isolating framework overhead and token efficiency: DSPy 3.53 ms overhead; Haystack 1.57k tokens vs LangChain 2.40k tokens — a 53% token difference that compounds at scale.
p13 LangChain vs LlamaIndex (2026): Complete Production RAG Comparison PremAI Blog 2026-03 Documents the architectural split between LangChain/LangGraph (workflow-first, stateful graphs) and LlamaIndex (retrieval-first, data-centric agents), noting LangGraph reached 1.0 stability in October 2025 and effectively superseded original chain-based LangChain for production agentic work.
p14 Why LLM Frameworks Like LangChain and LlamaIndex Are Being Replaced by Agent SDKs MindStudio 2026-03 Analyses the structural disruption of heavyweight RAG frameworks by native tool-calling, expanded context windows, MCP standardisation, and agent SDKs; includes LlamaIndex co-founder Jerry Liu's public acknowledgement that the framework era is ending.
p15 LLM Frameworks Compared (2026): LangChain, LlamaIndex, DSPy and More Morph 2026-03 Documents framework consolidation into four categories (orchestration, agents, optimisation, code-specific), reports LangChain at 100K+ GitHub stars and 34.5 million monthly LangGraph downloads, and warns that stacking three or more frameworks signals overengineering.
p16 The 5 Best RAG Evaluation Tools You Should Know in 2026 Maxim AI 2026-02 Comparative review of the five dominant evaluation platforms (Maxim AI, LangSmith, Arize Phoenix, Ragas, DeepEval), noting RAGAS exceeds 400,000 monthly downloads and 20 million evaluations, and that LangSmith's tight LangChain coupling creates friction in mixed-framework environments.
p17 Top RAG Evaluation Tools in 2026 Goodeye Labs 2026-03 Independent ranking of seven evaluation platforms including Weights & Biases Weave and Braintrust, with evidence that a 2025 study found Microsoft Copilot gave medically incorrect advice 26% of the time — illustrating the real-world stakes of inadequate RAG evaluation.
p18 7 RAG Evaluation Tools You Must Know Iguazio 2025-12 Practitioner-oriented tool guide covering Ragas, LangSmith, Arize Phoenix, TruLens, and Promptfoo for continuous RAG evaluation in CI/CD pipelines, relevant to the maturing DevOps practices around agentic AI quality assurance.
p19 Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI Towards Data Science 2025-10 Documents the M&A consolidation around knowledge graph and semantic layer infrastructure (ServiceNow's acquisition of data.world, Salesforce's $8bn Informatica purchase), with Gartner's May 2025 recommendation that data engineering teams adopt ontologies and knowledge graphs to support AI.
p20 From RAG to Context: A 2025 Year-End Review of RAG RAGFlow 2025-12 Engineering team's year-end synthesis introducing 'Context Engineering' as the successor discipline to RAG optimisation, describing the shift from tuning single retrieval algorithms to systematic design of the end-to-end retrieval–context assembly–model reasoning pipeline.
p21 Why 2025's Agentic AI Boom Is a CISO's Worst Nightmare CSO Online 2026-02 Reports that 72–80% of enterprise RAG implementations significantly underperform or fail within their first year, and that 51% of all enterprise AI failures in 2025 were RAG-related; also identifies the '20,000-document cliff' latency and accuracy degradation pattern.
p22 LLM Token Optimization: Cut Costs and Latency in 2026 Redis 2026-02 Vendor-authored technical guide quantifying that semantic caching achieves up to 73% cost reduction in high-repetition agentic workloads, with benchmarks contrasting cache-hit millisecond response against seconds-scale fresh LLM inference.
p23 Agentic RAG: When Static Retrieval Is No Longer Enough Medium 2026-03 Cites MDPI Electronics 2025 study across 12 RAG variants and 250 clinical vignettes showing Self-RAG at 5.8% hallucination rate and a 55-percentage-point multi-hop accuracy gap between static RAG (34%) and agentic RAG (89%), providing the strongest quantitative case for the capability differential.
p24 The Next Frontier of RAG: How Enterprise Knowledge Systems Will Evolve (2026–2030) NStarX Inc. 2025-12 Engineering team's forward-looking analysis reframing RAG as a 'knowledge runtime' analogous to Kubernetes for application workloads, with governance, retrieval quality gates, and audit trails as mandatory infrastructure — the most explicit articulation of RAG-as-operational-infrastructure.
p25 10 RAG Architectures in 2026: Enterprise Use Cases and Strategy Techment 2026-03 Practitioner decision framework for CTO/CDO selection across ten RAG architectures, explicitly stating that Agentic RAG is 'only necessary for complex, multi-step workflows' and that most enterprise search performs well with Hybrid RAG — the clearest published decision threshold for switching.
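The p22 entry above credits semantic caching with up to 73% cost reduction in high-repetition agentic workloads. The mechanism can be sketched in a few lines: store each (query embedding, answer) pair, and on lookup return a cached answer when a new query is close enough in embedding space, skipping a fresh LLM call. This is a toy under stated assumptions — the bag-of-words `embed` is a stand-in for a real embedding model, and the 0.75 threshold is an arbitrary illustrative value, not a recommendation from the source.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words counts (a real cache would
    # embed with a sentence-embedding model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Reuse an earlier answer when a new query is semantically close,
    avoiding the cost and latency of a fresh model call."""

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        q = embed(query)
        scored = [(cosine(q, emb), ans) for emb, ans in self.entries]
        if scored:
            score, answer = max(scored)  # best-scoring cached entry
            if score >= self.threshold:
                return answer  # cache hit: no LLM call needed
        return None  # cache miss: caller falls through to the model

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("what is our refund policy", "Refunds within 30 days.")
print(cache.get("what is our refund policy?"))  # near-duplicate → hit
print(cache.get("how do I reset my password"))  # unrelated → None
```

In production the threshold is the whole game: too low and stale answers leak across distinct intents; too high and the hit rate (and the cost saving) collapses.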

Blogs & Independent Thinkers

ID Title Outlet Date Significance
b1 Agentic RAG with LangGraph & Telegram (with Video explanation) Jam With AI (Substack) 2025-11 Practitioner walkthrough contrasting static RAG pipelines against LangGraph-driven agentic RAG with query validation, document grading, and iterative query rewriting in production.
b2 All you need to know about RAG (in 2026) AI with Aish (Substack) 2026-03 Comprehensive 2026 state-of-the-art survey covering semantic chunking, hybrid search, cross-encoder reranking, and the operational economics of RAG versus long-context windows.
b3 Comparative Analysis of RAG Architectures: Pipeline, Agentic, and Knowledge Graph (2026 Landscape) Micheal Lanham (Substack) 2026-02 Cites 2026 State of AI Agents data showing 57% of organisations have deployed multi-stage agents, and frames quality as the primary production blocker with observability as table stakes.
b4 How to Build an AI Agent Company in 2026: Lessons from Glean's $7.2B Playbook Market Curve (Substack) 2026-01 Analyses Glean's December 2025 'Enterprise Context' platform combining memory, connectors, indexes, and personal/enterprise graphs as a case study in productising agentic RAG at scale.
b5 The Eight Agentic AI Security Vulnerabilities Nobody's Talking About AI Realized Now (Substack) 2026-05 Documents the CVE-2025-32711 EchoLeak exploit in Microsoft 365 Copilot's RAG pipeline and notes that prompt injection appeared in 73% of assessed production AI deployments in 2025.
b6 From RAG to Context — A 2025 Year-End Review of RAG RAGFlow Blog 2025-12 Authoritative year-end review arguing RAG is evolving from a retrieval pattern into a 'Context Engine', draws an explicit analogy between RAG ingestion pipelines (PTI) and ETL/ELT tooling in the structured data warehouse world.
b7 Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI Towards Data Science (Medium) 2025-10 Argues that knowledge graphs and metadata management are becoming the semantic backbone for agentic AI, cites Gartner's May 2025 recommendation to adopt ontologies, and traces market consolidation including Salesforce's $8 billion Informatica acquisition.
b8 RAG vs Memory: Addressing Token Crisis in Agentic Tasks Agam Jain (personal blog) 2025-10 First-principles analysis of context-window economics showing agentic coding sessions routinely hit 50–200K tokens and research queries can exceed 2M tokens, with a decision framework for RAG versus hierarchical memory.
b9 Lessons from Implementing RAG in 2025 TrueState Blog 2025-11 Practitioner post-mortem from November 2025 documenting that direct context injection fails on three axes (performance, context-window limits, and per-message cost of $1.50/million tokens) and advising RAG strategy selection based on document count and latency constraints.
b10 Long Context vs RAG: When 1M Token Windows Replace RAG SitePoint 2026-02 Quantitative decision framework with cost calculators showing long context wins below 200K tokens with fewer than 500 daily queries, while RAG wins above 500K tokens at 5,000-plus queries per day.
b11 Agentic RAG in 2026: The UK/EU Enterprise Guide to Grounded GenAI Data Nucleus 2026-01 Practitioner guide contextualising agentic RAG within EU AI Act obligations and GPAI rules, covering ReAct and Tree-of-Thoughts patterns with compliance-oriented deployment guidance.
b12 Agentic RAG: The 2026 Production Guide MarsDevs 2026-05 Sets concrete production targets (faithfulness ≥0.9, answer relevancy ≥0.85, context precision ≥0.8), identifies the dominant 2026 stack as LangGraph for orchestration plus LlamaIndex for retrieval, and notes agentic RAG 3–4 iteration loops take 8–12 seconds.
b13 The Next Frontier of RAG: How Enterprise Knowledge Systems Will Evolve (2026–2030) NStarX Inc. 2025-12 Reports that production agentic RAG deployments show a 25–40% reduction in irrelevant retrievals but introduce new failure modes including retrieval loops and over-retrieval, and notes 70% of RAG systems still lack systematic evaluation frameworks.
b14 Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery InfoQ 2026-04 Reports that in Q4 2025 financial services testing (n=1,500 multi-hop queries), approximately 60% of hallucinations originated from unhandled execution errors rather than LLM reasoning flaws, making error-recovery mechanisms the highest-ROI reliability investment.
b15 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136) arXiv (Singh, Ehtesham et al.) 2025-01 The field's primary taxonomy paper, classifying agentic RAG architectures by agent cardinality, control structure, autonomy, and knowledge representation, and identifying memory management, evaluation, and governance as open research gaps.
b16 RAG in 2026: Bridging Knowledge and Generative AI Squirro Blog 2026-04 Enterprise practitioner view arguing that agentic workflows require a context graph of past decisions to avoid reliability degradation, with case studies including a European bank saving EUR 20 million over three years.
b17 The AI Agent Framework Landscape in 2025: What Changed and What Matters Medium (Trung Hiếu Trần) 2025-11 Documents Microsoft's October 2025 merger of AutoGen and Semantic Kernel into a unified agent framework, LangChain's pivot away from agent orchestration toward RAG tooling, and the consolidation of the broader framework landscape.
b18 LangGraph vs LlamaIndex Workflows for Building Agents — The Final No-BS Guide (2025) Medium (Pedro Azevedo) 2026-01 Engineering-oriented teardown concluding that LlamaIndex leads on mature RAG modules while LangGraph leads on stateful multi-agent orchestration, with LangGraph noted for frequent breaking changes across versions.
b19 Why LLM Frameworks Like LangChain and LlamaIndex Are Being Replaced by Agent SDKs MindStudio Blog 2026-03 Argues the traditional framework era is ending as native tool calling, expanded context windows, and MCP standardisation erode the case for heavyweight RAG abstractions, citing LlamaIndex co-founder Jerry Liu's public acknowledgement of this disruption.
b20 RAG Frameworks: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy AIMultiple 2026-01 Benchmarks five frameworks across 100 queries with standardised components (GPT-4.1-mini, BGE-small, Qdrant, Tavily), isolating orchestration overhead and token efficiency to provide the most rigorous public framework comparison to date.
b21 Agentic AI Frameworks 2026 Flobotics 2025-12 Detailed framework landscape survey positioning DSPy for LLM output optimisation in complex reasoning pipelines, PydanticAI for type-safe production systems, and LlamaIndex as the default for data-centric agentic applications.
b22 Production RAG in 2026: LangChain vs LlamaIndex Rahul Kolekar (personal blog) 2026-01 Practitioner post documenting the prevalent production pattern of LlamaIndex handling ingestion and retrieval while LangChain/LangGraph handles orchestration, with advice to prefer 2-step RAG before adding agentic complexity.
b23 Designing Agentic Loops Simon Willison's Weblog 2025-09 Simon Willison's foundational post on agentic loop design, warning that agents are inherently dangerous due to prompt injection risks and that YOLO-mode tool execution maximises productivity at the cost of safety controls.
b24 Writing about Agentic Engineering Patterns Simon Willison's Weblog 2026-02 Announces Willison's ongoing Agentic Engineering Patterns guide, distinguishing professional agent-assisted development from vibe coding and framing the central challenge as the near-zero cost of generating initial code versus the unchanged cost of knowing what to build.
b25 RAG in 2025: The Enterprise Guide to Retrieval Augmented Generation, Graph RAG and Agentic AI Data Nucleus 2026-01 Refutes the 'RAG is dead' narrative by demonstrating how enterprises blend agents for orchestration with RAG for grounding, and catalogues Graph RAG, HyDE, ColBERT reranking, and agentic patterns as complementary rather than competing approaches.
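The b10 entry above gives concrete crossover thresholds: long context wins below 200K corpus tokens with fewer than 500 daily queries, while RAG wins above 500K tokens at 5,000-plus queries per day. Those two rules leave a middle band where the sources disagree; a sketch of the decision logic, with the ambiguous band made explicit (the `benchmark-both` branch is my framing, not the source's):

```python
def retrieval_strategy(corpus_tokens: int, daily_queries: int) -> str:
    """Toy decision rule using the b10 (SitePoint) thresholds.

    Below both thresholds, stuffing the corpus into the context window
    is simpler and cheap enough; above either, per-query retrieval
    amortises better. The middle band is genuinely contested.
    """
    if corpus_tokens < 200_000 and daily_queries < 500:
        return "long-context"
    if corpus_tokens > 500_000 or daily_queries >= 5_000:
        return "rag"
    return "benchmark-both"  # contested zone: measure before choosing

print(retrieval_strategy(50_000, 100))        # small corpus, light load → long-context
print(retrieval_strategy(2_000_000, 10_000))  # large corpus, heavy load → rag
print(retrieval_strategy(300_000, 1_000))     # contested zone → benchmark-both
```

Note the asymmetry: the long-context branch requires both conditions, while a single large number on either axis is enough to justify retrieval, since token cost scales with corpus size times query volume.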

VC & Analyst Reports

ID Title Outlet Date Significance
v1 Leaders, Gainers and Unexpected Winners in the Enterprise AI Arms Race Andreessen Horowitz (a16z) 2026-02 Quantitative enterprise survey showing reasoning models are enabling new agentic workflows, with 54% of respondents citing accelerated LLM adoption and real production data on model provider share shifts.
v2 Big Ideas 2026: Part 1 — Unstructured Data, Agent-Speed Infrastructure, and Data Entropy Andreessen Horowitz (a16z) 2025-12 Frames 'data entropy' as the binding constraint on RAG and agentic AI, arguing that 80% of corporate knowledge lives in unstructured formats and that downstream AI workloads break without continuous data governance.
v3 State of AI: An Empirical 100 Trillion Token Study with OpenRouter Andreessen Horowitz (a16z) 2025-12 Provides production-traffic evidence that agentic inference is the fastest-growing behaviour on OpenRouter, with models planning, retrieving from tools and iterating rather than responding in single prompts.
v4 Big Ideas 2026: The Agentic Interface Andreessen Horowitz (a16z) 2025-12 Defines the strategic thesis that interfaces are shifting from chat to action and design must become agent-readable, directly framing how retrieval and knowledge must be reorganised for agent consumption.
v5 Getting Retrieval-Augmented Generation Right: Part One Forrester Research 2025 Analyst report analysing key RAG challenges and introducing shared terminology to enable cross-team collaboration, forming the baseline vocabulary for enterprise RAG deployment decisions.
v6 Getting Retrieval-Augmented Generation Right: Part Two Forrester Research 2025 Covers best practices across RAG indexing, retrieval, generation, and agentic support, with pioneer case studies resolving real engineering challenges in production.
v7 How To Get Retrieval-Augmented Generation Right (Blog) Forrester Research 2025-06 Public-facing Forrester summary noting that 50% of organisations are piloting agentic AI and 24% have it in production, with RAG identified as critical infrastructure for the transition.
v8 Reference Architecture Brief: Retrieval-Augmented Generation Gartner 2025 Provides a blueprint for scalable generative AI development integrating LLMs with enterprise data, serving as the authoritative reference architecture for enterprise RAG implementations.
v9 Gartner Hype Cycle for Artificial Intelligence 2025 Gartner 2025-07 Places AI agents and AI-ready data as the two fastest-advancing technologies at the Peak of Inflated Expectations, while GenAI slides into the Trough of Disillusionment, signalling the structural pivot to agentic workflows.
v10 2026 Hype Cycle for Agentic AI Gartner 2026-04 Dedicated agentic AI Hype Cycle showing that only 17% of organisations have deployed AI agents despite 60%+ intending to within two years, with governance, FinOps for agentic AI, and context graphs as emerging profiles on the curve.
v11 The State of AI in 2025: Agents, Innovation, and Transformation McKinsey Global Institute 2025-11 Survey of 1,993 participants showing 23% of enterprises are scaling agentic AI in at least one function, with knowledge management identified as a leading adoption domain and high performers 3x more likely to have scaled agents.
v12 AI in the Workplace: A Report for 2025 (Superagency) McKinsey Global Institute 2025-01 Frames the five big AI innovations for business including agentic AI, and illustrates the shift from copilot-style RAG to agents that plan, retrieve, and execute multi-step workflows in live enterprise systems.
v13 The AI Agent Market Map — November 2025 Edition CB Insights 2025-11 Maps 400+ private AI agent companies across 26 categories, noting that 1 in 5 new unicorns are building agents and agentic solutions have become leading acquisition targets for enterprise software incumbents.
v14 The AI Agent Tech Stack CB Insights 2025-10 Maps 135+ startups across 17 infrastructure markets including retrieval, memory, orchestration, and observability, identifying reliability as the central challenge driving investment in evaluation and governance tooling.
v15 The State of AI 2025 — Bessemer Venture Partners Bessemer Venture Partners 2025-08 Predicts that 2025 to 2026 marks the turning point for private, grounded evaluation frameworks, and that enterprise deployment will scale tenfold once trust in AI outputs is established through reproducible, use-case-specific evals.
v16 AI Infrastructure Roadmap: Five Frontiers for 2026 Bessemer Venture Partners 2026-03 Sets out five infrastructure frontiers for 2026 including agentic AI, providing a VC roadmap framing the stack layers that must mature for RAG-to-agent migration to complete at enterprise scale.
v17 State of AI: Bi-Annual Snapshot — The Execution Era of AI ICONIQ Capital 2026-01 Survey of ~300 software executives showing model inference rises from 20% to 23% of total cost as products scale, gross margins projected at 52% in 2026, and 40% of $500M+ revenue companies actively deploying agents.
v18 State of AI 2025: The Builder's Playbook — ICONIQ Capital ICONIQ Capital 2025-06 Shows RAG and fine-tuning as the dominant model training techniques at 66–69% usage each, with nearly 80% of AI-native builders investing in agentic workflows as their primary product type.
v19 Agentic Enablers: Treating AI's Amnesia and Other Disorders MMC Ventures 2025-11 VC research report framing context and memory management as the core technical barrier to reliable agentic AI, distinguishing between context (working memory) and persistent memory, and analysing knowledge graphs as a complement to RAG for multi-hop reasoning.
v20 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136) arXiv (preprint) 2025-01 The most comprehensive academic survey of agentic RAG architectures, introducing a principled taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation, updated through April 2026.
v21 The RAG Era Is Ending for Agentic AI — A New Compilation-Stage Knowledge Layer Is What Comes Next VentureBeat 2026-05 Reports Pinecone's internal benchmark showing a 98% token reduction for a financial analysis task using compilation-stage knowledge versus runtime RAG, with Gartner analyst Arun Chandrasekaran commenting that architectural compilation embeds structural logic into the metadata layer.
v22 Agentic AI Applications in Vector Database Market — Size, Share and Forecast to 2030 Mordor Intelligence 2025-11 Sizes the agentic AI vector database market at $0.46 billion in 2025 growing to $1.45 billion by 2030 at 25.97% CAGR, with autonomous agents projected to grow at 61.5% CAGR, outpacing conversational AI and RAG.
v23 Agentic RAG: The 2026 Production Guide MarsDevs 2026-05 Practitioner production guide documenting that agentic RAG costs 3–10x more in tokens at runtime versus static RAG, and that MCP became the de facto standard retrieval-tool surface after Anthropic donated it to the Linux Foundation in December 2025.
v24 Gartner Hype Cycle Identifies Top AI Innovations in 2025 (Press Release) Gartner 2025-08 Official press release confirming AI agents and AI-ready data as the two fastest-advancing technologies at the Peak of Inflated Expectations, with Gartner analyst Haritha Khandabattu warning that no AI agent can be used in every case.
v25 Why RAG Is Failing Agentic AI Development Corporate 2026-05 Synthesises Pinecone, Qdrant, and LlamaIndex CEO statements to document that 85% of agent compute goes to re-discovery rather than task completion, and that enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter.
