Research sweep · deep · 2025 – 2026
Agentic RAG - Evolution, Challenges, and Decision Criteria
Agentic RAG between November 2025 and May 2026: how retrieval-augmented generation is shifting toward agent-driven architectures, the operational problems (token burn, context management, latency, reliability), information-organisation patterns such as context catalogues and semantic categorisation, parallels with traditional data warehousing (dimensions, measures, star schemas), the evolving RAG tooling landscape, and decision criteria for switching to pure agentic workflows.
- Claude Opus 4.8
- academic
- frontier
- tech
- blogs
- vc
Synthesised 2026-05-10
Agentic RAG, November 2025 to May 2026: When Retrieval Learned to Plan
Overview
Retrieval-augmented generation stopped being a pipeline and became a control problem. The single-pass pattern formalised by Lewis et al. in 2020 (embed query, retrieve top-k, stuff context, generate) handled FAQ deflection and single-hop document Q&A competently, but it carried an assumption that one retrieval step would surface sufficient evidence. That assumption fails for multi-hop reasoning, heterogeneous corpora, and queries whose required evidence only becomes identifiable after a partial answer exists. The defining shift of the past 18 months is that the field now treats retrieval as a sequential decision-making process: an agent decides when to retrieve, how to refine, and when to stop. Sources: arXiv (cs.AI) (2025) (↗); arXiv (cs.IR) (2026) (↗)
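The decide, retrieve, refine, stop cycle described above can be sketched as a bounded loop. This is a minimal illustration, not an implementation from any cited paper: `agentic_retrieve`, `LoopResult`, and the three callables (`retrieve`, `is_sufficient`, `refine`) are hypothetical names, and in a real system the sufficiency check and the refinement step would typically be model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    evidence: List[str] = field(default_factory=list)
    queries: List[str] = field(default_factory=list)
    stopped_because: str = ""


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],             # hypothetical retriever
    is_sufficient: Callable[[str, List[str]], bool],  # hypothetical evidence check
    refine: Callable[[str, List[str]], str],          # hypothetical query rewriter
    max_steps: int = 3,
) -> LoopResult:
    """Retrieve, check sufficiency, refine; stop on success or when the budget runs out."""
    result = LoopResult()
    query = question
    for _ in range(max_steps):
        result.queries.append(query)
        result.evidence.extend(retrieve(query))
        if is_sufficient(question, result.evidence):
            result.stopped_because = "sufficient"
            return result
        # Not enough evidence yet: let the agent reformulate the next query.
        query = refine(question, result.evidence)
    result.stopped_because = "budget_exhausted"
    return result
```

Setting `max_steps=1` reduces the loop to single-pass retrieval, which is one way to see static RAG as the degenerate case of the agentic pattern.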
The framing change is not cosmetic. Mishra et al. formalised agentic RAG as a finite-horizon partially observable Markov decision process, arguing the field's fragmentation stems from the absence of that mathematical grounding. The dominant taxonomy from Singh et al., revised to v4 in April 2026, classifies architectures by agent cardinality, control structure, and autonomy level, and the same survey flags the open problems that recur across every lane: cost-aware planning, long-term memory drift, and the inadequacy of output-only evaluation. Sources: arXiv (cs.IR) (2026) (↗); arXiv (Aditi Singh et al.) (2026) (↗)
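For readers unfamiliar with the formalism, the textbook definition of a finite-horizon POMDP is given below. The mapping of symbols onto retrieval concepts is an illustrative assumption and is not taken from the cited paper's notation.

```latex
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, H),
\qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{H-1} R(s_t, a_t) \right]
\]
% S      : hidden states (assumed here: the true information need and the evidence that exists)
% A      : actions (assumed here: retrieve, refine the query, answer and stop)
% T      : state-transition function T(s' | s, a)
% R      : reward (assumed here: answer quality minus retrieval and token cost)
% \Omega : observations (assumed here: the passages a retrieval call returns)
% O      : observation function O(o | s', a)
% H      : finite horizon, i.e. the step budget of the loop
```

Under this reading, the agent never observes the state directly; it sees only what retrieval returns, which is why stopping and refinement become policy decisions rather than fixed pipeline stages.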
Why now is partly an infrastructure story. Anthropic's Model Context Protocol, released November 2024 and donated to the Linux Foundation's Agentic AI Foundation in December 2025, standardised the tool-calling surface and decoupled retrieval infrastructure from agent orchestration. OpenAI and Google both adopted it within months. That removed the last major justification for bespoke connectors and made retrieval one tool category among many. Sources: Anthropic News (2024) (↗); Anthropic News (2025) (↗); InfoQ (2025) (↗)
The tension that runs through everything below is economic. Agentic loops buy accuracy on hard queries and pay for it in tokens, latency, and new failure modes. The interesting work in this window is not proving agentic RAG can be more accurate; it is establishing precisely when the premium is worth paying.
Timeline
- Model Context Protocol released
- Lost-in-the-middle confirmed as long-context limit
- METR time-horizon metric published (7-month doubling)
- Agentic RAG taxonomy established
- Volume 32 Radar centres RAG techniques
- Sufficient-context shown as primary hallucination driver
- Early-2025 dev productivity study (19% slower)
- Over-search/under-search quantified
- Radar Vol. 33 pivots to agents and MCP
- MCP donated to Linux Foundation
- Context Engineering named as successor to RAG tuning
- Gartner Agentic AI Hype Cycle, context graphs named
- LangGraph + LlamaIndex becomes default production stack
- Compilation-stage knowledge layer claims 98% token reduction
- Semantic-layer standardisation around OSI v1.0
Key Findings
Agentic does not automatically mean more expensive, but in practice it usually is. The headline practitioner number, repeated across MarsDevs and independent guides, is a 3-10x token cost multiplier for agentic loops versus static pipelines, with three-to-four iteration loops taking 8-12 seconds against a 1-2 second baseline. The academic counterpoint matters: Du et al.'s A-RAG showed that exposing hierarchical retrieval interfaces (keyword, semantic, chunk-read) lets agents beat both static and workflow-RAG baselines at equal or lower token consumption. The cost premium is a property of naive loop design, not of agency itself. Sources: MarsDevs (2026) (↗); arXiv (cs.CL) (2026) (↗)
Most enterprise RAG failures are engineering failures, not model failures. InfoQ's April 2026 field report from three financial-services deployments (roughly 1,500 multi-hop queries in Q4 2025) found a 30% silent failure rate under static RAG, with around 60% of hallucinations tracing to unhandled execution errors rather than model reasoning. The implication is direct: error-recovery logic is a better investment than a bigger model. CSO Online's reporting that 72-80% of enterprise RAG implementations underperformed or failed within their first year, and that 51% of all enterprise AI failures in 2025 were RAG-related, frames this as a systemic operational problem rather than isolated bad luck. Sources: InfoQ (2026) (↗); CSO Online (2026) (↗)
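One way to picture the error-recovery logic this finding argues for is a wrapper that turns every tool failure into an explicit, inspectable outcome instead of an empty context passed silently to the model. The sketch below is an assumption-laden illustration: `ToolOutcome` and `call_with_recovery` are hypothetical names, and the retry policy is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToolOutcome:
    ok: bool
    documents: List[str]
    error: Optional[str] = None
    attempts: int = 0


def call_with_recovery(
    tool: Callable[[str], List[str]],  # hypothetical retrieval tool
    query: str,
    max_attempts: int = 2,
) -> ToolOutcome:
    """Retry a tool call and report failures explicitly rather than swallowing them."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            docs = tool(query)
        except Exception as exc:  # record the error so the caller can see it
            last_error = f"{type(exc).__name__}: {exc}"
            continue
        if docs:
            return ToolOutcome(ok=True, documents=docs, attempts=attempt)
        last_error = "empty result"
    # The caller can now abstain or escalate instead of generating from nothing.
    return ToolOutcome(ok=False, documents=[], error=last_error, attempts=max_attempts)
```

The design choice is that an empty result counts as a failure to be handled, since an empty context is exactly the condition under which a model may confabulate.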
Retrieval quality, not model size, drives hallucination. Google Research's ICLR 2025 sufficient-context paper showed Gemma's error rate jumping from 10% to 66% when context was insufficient, and found even Gemini and GPT failed to abstain appropriately under poor retrieval. This reframes the agentic case: the value of an agent is partly its ability to recognise insufficient context and retrieve again rather than confabulate. Sources: Google Research Blog / ICLR 2025 (2025) (↗)
The compounding-failure argument is the strongest case for bounded loops. METR's analysis quantifies the chaining problem: a 95%-reliable step yields only 36% end-to-end success across twenty sequential steps. Combined with the seven-month doubling time in task-completion horizon from the March 2025 time-horizon paper (confirmed in the January 2026 1.1 update), this gives practitioners a calibrated design heuristic: keep agentic loops short, instrument heavily, route simple queries to static pipelines. Sources: METR (2025) (↗); METR (2026) (↗)
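The chaining arithmetic is easy to reproduce. The sketch below assumes steps fail independently with identical reliability, which is a simplification; the function names are illustrative.

```python
def end_to_end_success(step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_reliability ** steps


def max_steps_for_target(step_reliability: float, target: float) -> int:
    """Largest step count whose chained success probability stays at or above target."""
    if not (0.0 < step_reliability < 1.0) or not (0.0 < target <= 1.0):
        raise ValueError("step_reliability must be in (0, 1) and target in (0, 1]")
    steps = 0
    while step_reliability ** (steps + 1) >= target:
        steps += 1
    return steps


print(round(end_to_end_success(0.95, 20), 2))  # 0.36
print(max_steps_for_target(0.95, 0.80))        # 4
```

Under these assumptions, a loop built from 95%-reliable steps has to stay within four steps to keep end-to-end success at or above 80%, which is the quantitative core of the "keep loops short" heuristic.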
GraphRAG is conditionally, not generally, useful. The ICLR 2026 GraphRAG-Bench work shows GraphRAG achieving 13.4% lower accuracy than vanilla RAG on Natural Questions and 2.3x higher latency on multi-hop tasks, while improving HotpotQA multi-hop reasoning by 4.5%. Min et al. document a 15% enterprise improvement on SAP legacy-code migration. The decision criterion that emerges across lanes: graph retrieval earns its 3-5x construction cost only for entity-relationship traversal over stable corpora where provenance and audit trails matter. Sources: arXiv / ICLR 2026 (2025) (↗); arXiv (cs.IR) (2025) (↗)
The framework era is consolidating, not expanding. Multiple practitioner sources converge on a two-framework production pattern: LlamaIndex for retrieval, indexing, and chunking; LangGraph (1.0-stable since October 2025) for stateful agent control flow; and a clean tool-interface boundary between the two. LangChain has pivoted away from general orchestration toward RAG tooling, while Microsoft merged AutoGen and Semantic Kernel. MindStudio argues the heavyweight-framework era is ending as MCP standardises integration and coding agents generate custom pipelines on demand. Sources: MarsDevs (2026) (↗); MindStudio (2026) (↗); Medium (Trung Hiếu Trần) (2025) (↗)
The data-warehousing analogy is real at the governance layer and breaks at the query layer. Across all five lanes, the parallel holds: context catalogues classify retrievable knowledge by type, provenance, authority, and temporal validity, much as dimension tables enrich facts, and both require metadata management, lineage, and freshness policies. It breaks because star schemas join deterministically while embedding retrieval is probabilistic, and because agentic RAG does not know its join paths in advance. The semantic-layer convergence around Snowflake Semantic Views, Databricks Metric Views, and dbt MetricFlow under Open Semantic Interchange v1.0 is the nearest thing to dimensional modelling's standardising role. Sources: Thoughtworks Technology Radar (2025) (↗); Towards Data Science (2025) (↗)
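The governance-layer half of the analogy can be made concrete with a catalogue entry that carries the four attributes named above (type, provenance, authority, temporal validity) and a deterministic filter applied before any probabilistic retrieval runs. Everything in this sketch is hypothetical: the field names, the integer authority scale, and the `eligible_sources` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CatalogueEntry:
    source_id: str
    knowledge_type: str               # e.g. "policy" or "runbook"
    provenance: str                   # owning system or team
    authority: int                    # higher means more authoritative (assumed scale)
    valid_from: date
    valid_to: Optional[date] = None   # None means still current


def eligible_sources(
    catalogue: List[CatalogueEntry],
    knowledge_type: str,
    as_of: date,
    min_authority: int = 0,
) -> List[CatalogueEntry]:
    """Deterministic pre-filter: right type, authoritative enough, valid on the date."""
    hits = [
        entry for entry in catalogue
        if entry.knowledge_type == knowledge_type
        and entry.authority >= min_authority
        and entry.valid_from <= as_of
        and (entry.valid_to is None or as_of <= entry.valid_to)
    ]
    return sorted(hits, key=lambda entry: entry.authority, reverse=True)
```

The filter behaves like a dimension lookup: it is exact and auditable. The embedding search that would run over the surviving sources is where the determinism ends.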
"Context Engineering" is the term the field landed on. Thoughtworks Radar Vol. 33 (November 2025), framed by CTO Rachel Laycock as a "step change", shifted its central theme from RAG to agents, MCP, and context engineering. RAGFlow's year-end review independently describes RAG metamorphosing from a retrieval pattern into a "Context Engine", drawing an explicit parallel between the ingestion pipeline (Parse-Transform-Index) and the ETL/ELT tooling that industrialised data warehousing. Two independent sources naming the same conceptual successor is a strong convergence signal. Sources: Thoughtworks Technology Radar (2025) (↗); RAGFlow (2025) (↗)
The accuracy gap on hard queries is categorical, not marginal. A 2025 MDPI study across 250 clinical vignettes recorded a 55-percentage-point multi-hop accuracy gap: 34% for static RAG versus 89% for agentic RAG. NStarX reports 25-40% reductions in irrelevant retrievals from agentic approaches. This is the evidence that justifies the token premium for high-stakes, multi-hop work, even as it confirms static RAG remains correct for single-hop tasks. Sources: Medium (2026) (↗); NStarX Inc. (2025) (↗)
Evidence & Data
The cost numbers cluster consistently. MarsDevs documents the 3-10x token multiplier and 8-12 second loop latency, with single-pass RAG at 1-2 seconds. Redis quantifies semantic caching recovering up to 73% of cost in high-repetition workloads. The hidden-economics literature describes an "Unreliability Tax" where an orchestrator-worker flow with a reflexion loop runs 10-30 seconds against 800 milliseconds for a single call. Sources: MarsDevs (2026) (↗); Redis (2026) (↗); Development Corporate (2026) (↗)
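Semantic caching, mentioned above, reuses a stored answer when a new query is close enough in embedding space to one already answered. The sketch below is a generic in-memory illustration and does not reflect Redis's implementation or API; `SemanticCache`, the caller-supplied `embed` function, and the 0.9 threshold are all assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.9) -> None:
        self.embed = embed            # hypothetical embedding function
        self.threshold = threshold    # assumed similarity cut-off
        self.entries: List[Tuple[List[float], str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the cached answer of the most similar stored query, if close enough."""
        vec = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))
```

A cache hit skips the whole retrieval and generation loop, so the saving scales with how repetitive the workload is. The threshold trades cost recovery against the risk of serving an answer to a question that only looks similar.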
Framework efficiency is measurable. AIMultiple's January 2026 benchmark found a 53% token-count spread between frameworks: Haystack at 1.57k tokens versus LangChain at 2.40k, a gap that compounds at scale. HiPRAG cut the over-search rate to 2.3% via hierarchical process rewards, putting a number on the central inefficiency of agentic loops. Sources: AIMultiple (2026) (↗); arXiv (cs.CL) (2025) (↗)
On adoption and market structure: McKinsey's November 2025 survey (1,993 participants, 105 countries) found 23% of enterprises scaling agentic AI in at least one function, with high performers three times more likely to be scaling per function. Gartner's April 2026 Agentic AI Hype Cycle reports only 17% of organisations have deployed agents despite 60%+ expecting to within two years, and introduces context graphs and FinOps for agentic AI as named profiles. CB Insights tracked private agent companies from roughly 300 in March 2025 to over 400 by November 2025, with one in five new unicorns building agents. ICONIQ's January 2026 survey of around 300 software executives found model inference rising from 20% to 23% of product cost, with 2026 gross margins projected at 52%. Sources: McKinsey Global Institute (2025) (↗); Gartner (2026) (↗); CB Insights (2025) (↗); ICONIQ Capital (2026) (↗)
Production targets are now quantified. MarsDevs gives faithfulness ≥0.9, answer relevancy ≥0.85, context precision ≥0.8. Ragas reports 400k+ monthly downloads. The most aggressive single claim is VentureBeat's May 2026 report of Pinecone's internal benchmark showing 98% token reduction by moving reasoning from inference time to a compilation stage, which Gartner's Arun Chandrasekaran frames as embedding structural logic into the metadata layer. Sources: MarsDevs (2026) (↗); VentureBeat (2026) (↗)
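The three targets quoted above translate directly into a release gate. This sketch assumes the scores have already been computed by whatever evaluation tool is in use; the metric keys and the `failing_metrics` helper are illustrative names, not part of any library.

```python
from typing import Dict, List

# Floors taken from the MarsDevs targets quoted above.
TARGETS: Dict[str, float] = {
    "faithfulness": 0.90,
    "answer_relevancy": 0.85,
    "context_precision": 0.80,
}


def failing_metrics(scores: Dict[str, float], targets: Dict[str, float] = TARGETS) -> List[str]:
    """Return the metrics below their floor; a missing metric counts as failing."""
    return [name for name, floor in targets.items() if scores.get(name, 0.0) < floor]
```

Treating a missing metric as a failure is a deliberate choice here: a pipeline that never measured context precision should not pass the gate by omission.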
The cost-threshold work is unusually concrete. SitePoint's long-context versus RAG calculator places the crossover at below 200K tokens with under 500 daily queries (long context with caching wins) versus above 500K tokens at 5,000+ queries per day (RAG wins). Sources: SitePoint (2026) (↗)
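The two regimes quoted above can be encoded as a small decision helper. The thresholds come from the SitePoint figures; the function name, the return labels, and the treatment of the region between the two regimes are assumptions, since the source figures do not cover it.

```python
def long_context_or_rag(corpus_tokens: int, daily_queries: int) -> str:
    """Classify a workload against the two quoted crossover regimes."""
    if corpus_tokens < 200_000 and daily_queries < 500:
        return "long_context_with_caching"
    if corpus_tokens > 500_000 and daily_queries >= 5_000:
        return "rag"
    # Between the quoted regimes the figures give no verdict: measure both.
    return "measure"


print(long_context_or_rag(150_000, 200))     # long_context_with_caching
print(long_context_or_rag(2_000_000, 8_000)) # rag
print(long_context_or_rag(300_000, 1_000))   # measure
```

Returning an explicit "measure" label rather than forcing a binary answer keeps the helper honest about how much of the space the quoted thresholds actually cover.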
Signals & Tensions
Is RAG dying or layering? The blogs split. The "Is RAG Dead?" framing and VentureBeat's "RAG era is ending" headline push displacement; the more careful independent reading is convergence into a layered architecture where agents orchestrate and RAG grounds. The weight of evidence favours layering: most production systems run both, routing simple queries to static pipelines and complex tasks to agentic loops. Sources: Towards Data Science (Medium) (2025) (↗); VentureBeat (2026) (↗); Micheal Lanham (Substack) (2026) (↗)
Benchmarks overstate real-world capability. METR's August 2025 update warned that algorithmic scoring on suites like SWE-Bench Verified likely overstates real-world performance substantially, and its July 2025 productivity study found developers using early-2025 tools took 19% longer. This is the most underreported tension: capability-curve optimism and field-deployment reliability diverge sharply, and the divergence applies directly to RAGAS-style evaluation. Sources: METR (2025) (↗); METR (2026) (↗)
Long-context windows were supposed to kill RAG. They did not. The lost-in-the-middle degradation persists at long context, and the cost economics make stuffing a 1M-token window per message prohibitive at scale. Long context complements RAG below the threshold; it does not replace it. Sources: Agam Jain (personal blog) (2025) (↗); arXiv (cs.AI) (2025) (↗)
Knowledge graphs as a semantic backbone remain a VC thesis ahead of the evidence. The a16z data-entropy argument, together with the Salesforce-Informatica ($8bn) and ServiceNow-data.world acquisitions, signals market consolidation around structured context. The academic record is more sceptical: GraphRAG underperforms vanilla RAG on simple questions, costs 3-5x as much to build, and shows entity-recognition accuracy of 60-85% depending on domain. The market is betting ahead of the benchmarks. Sources: Andreessen Horowitz (a16z) (2025) (↗); Towards Data Science (2025) (↗); arXiv / ICLR 2026 (2025) (↗)
Evaluation is the underbuilt layer everyone names. NStarX reports 70% of RAG systems still lack systematic evaluation frameworks. The structural problem is trajectory evaluation: agentic correctness no longer decomposes into retrieval quality plus generation quality, which is why OpenTelemetry-based tools (Arize Phoenix, Langfuse) are gaining over LangSmith for heterogeneous stacks. Sources: NStarX Inc. (2025) (↗); arXiv (cs.CL) (2025) (↗)
Open Questions
How large are switching costs really? The academic literature does not yet quantify the cost of migrating a production RAG system (retrieval logic in app code, chunking in index scripts, metrics in CI) to an agent-first architecture. The Mastercard fintech case offers only qualitative evidence of integration effort. Sources: arXiv (cs.AI) (2025) (↗)
Does the compilation-stage knowledge layer generalise, or is Pinecone's 98% reduction workload-specific? A single vendor benchmark is not yet independent evidence. Sources: VentureBeat (2026) (↗)
Can hallucination attribution become operational? FACTUM traces citation hallucination to transformer pathway alignment failures and TPA extends attribution beyond the FFN-versus-context model, but neither yet plugs into production observability. Sources: arXiv (cs.CL) (2026) (↗); arXiv (cs.CL) (2025) (↗)
Will MCP's security model hold? The prompt-injection, tool-permission-chaining, and lookalike-tool risks are structurally analogous to SQL injection and confused-deputy attacks, implying a multi-year hardening cycle that has barely started. Sources: AI Realized Now (Substack) (2026) (↗)
What thresholds actually trigger a switch to pure agentic workflows? Query complexity, knowledge-graph depth, and tool diversity are named as signals, but no source offers a validated decision function. The SitePoint cost crossover is the closest, and it covers only long-context versus RAG, not static versus agentic. Sources: Micheal Lanham (Substack) (2026) (↗); SitePoint (2026) (↗)
Does the seven-month doubling hold long enough to matter? If it does, day-length retrieval-planning tasks become viable within a year. If benchmark-to-field divergence widens faster, the operational reality lags the curve indefinitely. The honest answer in May 2026 is that the layered architecture wins precisely because nobody yet trusts the pure agentic case to be reliable enough to bet a regulated workload on it.
![[sources-agentic-rag-between-november-2025-and-may-2026-how]]
Sources
Academic & arXiv
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| a1 | Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | arXiv (cs.AI) | 2025-01 | Foundational survey introducing a principled taxonomy of agentic RAG architectures by agent cardinality, control structure, autonomy, and knowledge representation; revised through April 2026. |
| a2 | SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions | arXiv (cs.IR) | 2026-03 | First systematisation of knowledge paper to formalise agentic RAG as a finite-horizon partially observable Markov decision process, addressing fragmented architectures and inconsistent evaluation methodologies. |
| a3 | A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces | arXiv (cs.CL) | 2026-02 | Demonstrates that a truly agentic framework exposing hierarchical keyword, semantic, and chunk-read tools outperforms static RAG baselines with comparable or lower token consumption across open-domain QA benchmarks. |
| a4 | HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation | arXiv (cs.CL) | 2025-10 | Introduces a reinforcement-learning training method with fine-grained process rewards that reduces over-search to 2.3% on seven QA benchmarks, directly quantifying the token-burn problem in agentic retrieval loops. |
| a5 | Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges | arXiv (cs.AI) | 2025-06 | Maps agentic RAG reasoning paradigms onto dual-process cognitive theory and explicitly identifies the lost-in-the-middle problem and context management failures at scale as central industrial challenges. |
| a6 | MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning | arXiv (cs.CL) | 2025-05 | Training-free multi-agent framework with specialised Planner, Step Definer, Extractor, and QA agents; sets state-of-the-art on multi-hop QA and shows LLaMA3-8B with MA-RAG surpassing larger standalone models. |
| a7 | RAG vs. GraphRAG: A Systematic Evaluation and Key Insights | arXiv (cs.IR) | 2025-02 | Systematic empirical comparison establishing when graph-structured retrieval offers measurable gains versus vanilla RAG and characterising graph construction cost and latency trade-offs. |
| a8 | When to Use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation | arXiv / ICLR 2026 | 2025-06 | Introduces GraphRAG-Bench and shows GraphRAG achieves 13.4% lower accuracy than vanilla RAG on factual queries but improves multi-hop reasoning by 4.5%, at 2.3x higher latency - a key decision-criteria paper. |
| a9 | Towards Practical GraphRAG: Efficient Knowledge Graph Construction and Hybrid Retrieval at Scale | arXiv (cs.IR) | 2025-07 | Proposes a cost-efficient enterprise GraphRAG pipeline fusing vector similarity with graph traversal via Reciprocal Rank Fusion, validating 15% improvement over vector baselines on legacy code migration datasets. |
| a10 | StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering | arXiv (cs.CL) | 2025-10 | Combines query decomposition with BFS-based knowledge graph traversal to assemble explicit evidence chains, achieving state-of-the-art on MuSiQue, 2WikiMultiHopQA, and HotpotQA. |
| a11 | Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications | arXiv (cs.AI) | 2025-07 | Real-world deployment case using a multi-tool LLM agent over a knowledge graph of INRAE publications, showing agentic architectures enable exhaustive dataset queries impossible with static RAG. |
| a12 | Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems | arXiv (cs.CL) | 2025-10 | Application-oriented survey covering retrieval granularity trade-offs, context contamination, and hallucination mitigation strategies across static and agentic RAG pipelines. |
| a13 | FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG | arXiv (cs.CL) | 2026-01 | Mechanistic analysis linking citation hallucination to internal transformer pathway dynamics, providing interpretable diagnostics for long-form agentic RAG outputs. |
| a14 | TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG | arXiv (cs.CL) | 2025-12 | Decomposes final token probability across transformer residual-stream components to detect hallucination in RAG, extending beyond the binary FFN-versus-context conflict model. |
| a15 | M-RAG: Making RAG Faster, Stronger, and More Efficient | arXiv (cs.IR) | 2026-03 | Proposes a chunk-free retrieval strategy addressing how fixed chunking disrupts contextual integrity and limits reasoning over causal and hierarchical document relationships. |
| a16 | Ragas: Automated Evaluation of Retrieval Augmented Generation | arXiv / EACL 2024 | 2023-09 | Foundational evaluation framework providing reference-free metrics for faithfulness, answer relevance, and context relevance; remains the dominant RAG evaluation standard against which agentic pipeline tools are benchmarked. |
| a17 | RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG | arXiv (cs.CL) | 2025-11 | Introduces an agentic QA-dataset generation pipeline with filtering and optimised LLM-as-Judge metrics, demonstrating consistent outperformance of RAGAS across domain-specific evaluation tasks. |
| a18 | RAG for Fintech: Agentic Design and Evaluation | arXiv (cs.AI) | 2025-10 | Enterprise deployment study at Mastercard documenting agentic RAG pipeline design for fintech knowledge bases, including modular agents for query reformulation, acronym resolution, and iterative sub-query decomposition. |
| a19 | A Survey of RAG-Reasoning Systems in LLMs | arXiv (cs.CL) | 2025-07 | Taxonomises recent advances in retrieval-reasoning integration including in-context retrieval, chain-of-thought interleaving, and multi-agent orchestration patterns such as HM-RAG and Chain of Agents. |
| a20 | Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding | arXiv (cs.CV) | 2025-10 | Documents how current multimodal RAG benchmarks require 20–200 million visual tokens, far exceeding LLM context limits, motivating agent-driven iterative retrieval for long-document understanding. |
| a21 | HCAST: Human-Calibrated Autonomy Software Tasks | METR | 2025 | METR's 189-task benchmark across ML, cybersecurity, software engineering, and general reasoning with 563 human expert attempts, establishing calibrated time-horizon metrics for evaluating autonomous agents including agentic RAG systems. |
| a22 | RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents Against Human Experts | arXiv / METR | 2024-11 | METR benchmark comparing Claude 3.5 Sonnet and o1-preview against 71 human ML experts on research engineering tasks; finds agents achieve 4x human performance at 2-hour budget but humans outperform 2x at 32 hours. |
| a23 | Measuring AI Ability to Complete Long Tasks (METR Time Horizons) | METR | 2025-03 | METR's empirical analysis showing AI agent task-completion time horizons doubling every seven months, providing the scaling context within which agentic RAG capability growth should be interpreted. |
| a24 | Research Update: Algorithmic vs. Holistic Evaluation | METR | 2025-08 | Shows frontier model success rates on SWE-Bench Verified (~70–75%) likely overestimate real-world performance due to algorithmic scoring gaps, a methodological warning directly applicable to agentic RAG pipeline evaluation. |
| a25 | The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation | arXiv (cs.CV) | 2026-05 | Identifies 'recorruption' - where accurate external context causes a capable model to abandon a previously correct prediction - formalising a failure mode specific to RAG context injection. |
Frontier Lab & Model News
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| t1 | Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | arXiv (v4 updated April 2026) | 2025-01 | The definitive survey paper on agentic RAG, introducing a principled taxonomy of architectures based on agent cardinality, control structure, autonomy, and knowledge representation, with an April 2026 update. |
| t2 | How we built our multi-agent research system | Anthropic Engineering Blog | 2025 | Anthropic's detailed engineering account of how their multi-agent research system replaced static RAG with multi-step, lead-agent plus subagent architecture, documenting context-overflow mitigations and task-description requirements. |
| t3 | Building effective agents | Anthropic Research | 2024-12 | Anthropic's foundational guidance on when to use agentic systems versus simpler retrieval-augmented LLM calls, with explicit caution about unnecessary complexity. |
| t4 | Introducing the Model Context Protocol | Anthropic News | 2024-11 | Announcement of MCP as the open standard for connecting AI agents to external data sources, directly enabling agent-driven retrieval across heterogeneous tool ecosystems. |
| t5 | Donating the Model Context Protocol and establishing the Agentic AI Foundation | Anthropic News | 2025-12 | Anthropic's transfer of MCP governance to the Linux Foundation's Agentic AI Foundation, cementing MCP as vendor-neutral infrastructure for agentic retrieval pipelines and reporting 97M+ monthly SDK downloads. |
| t6 | MCP joins the Agentic AI Foundation | Model Context Protocol Blog | 2025-12 | Official MCP blog post documenting the protocol's growth to 10,000 active servers and first-class support across ChatGPT, Claude, Gemini, and Microsoft Copilot. |
| t7 | Measuring AI Ability to Complete Long Tasks | METR | 2025-03 | METR's foundational paper establishing the time-horizon metric, showing frontier agentic task completion doubling every seven months - the key quantitative frame for measuring the long-horizon capability underpinning agentic RAG use cases. |
| t8 | Time Horizon 1.1 | METR | 2026-01 | METR's updated methodology with an expanded task suite, confirming the seven-month doubling time and adding evaluations for GPT-5.1 and Gemini 3 Pro relevant to agentic workload planning. |
| t9 | Task-Completion Time Horizons of Frontier AI Models | METR | 2026-05 | Living leaderboard tracking autonomous task horizons across all major frontier models, including Claude Opus 4.5, GPT-5.1, and Gemini 3 Pro, with the latest data point added May 2026 for Claude Mythos Preview. |
| t10 | Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity | METR | 2025-07 | Controlled study finding developers using AI tools in early 2025 took 19% longer than without, providing a sceptical counterpoint to benchmark-driven optimism about agentic workflow productivity gains. |
| t11 | Frontier AI Trends Report | UK AI Security Institute (AISI) | 2025 | AISI's analysis of MCP server autonomy levels and growing cyber-task horizons, documenting that finance-focused MCP servers increasingly grant higher autonomy and that cyber task completion times doubled in roughly eight months. |
| t12 | Deeper Insights into Retrieval Augmented Generation: The Role of Sufficient Context | Google Research Blog / ICLR 2025 | 2025 | Google Research paper showing that insufficient retrieval context paradoxically increases hallucination, with Gemma's error rate rising from 10% to 66% under poor retrieval - critical evidence for context-quality requirements in agentic RAG. |
| t13 | Building with Gemini Embedding 2: Agentic Multimodal RAG and Beyond | Google Developers Blog | 2025-04 | Google DeepMind's announcement of a unified multimodal embedding model spanning text, images, video, and audio in a single vector space, enabling agentic RAG pipelines that retrieve across modalities. |
| t14 | RAG and Grounding on Vertex AI | Google Cloud Blog | 2024-06 | Google's technical announcement of dynamic retrieval in Vertex AI Agent Builder, introducing cost-balancing logic to decide when to use Google Search versus parametric knowledge - a practical model for selective retrieval in agentic systems. |
| t15 | Deep Research Max: a step change for autonomous research agents | Google Blog | 2025-04 | Google's Deep Research Max, built on Gemini 3.1 Pro, demonstrates a production agentic RAG pattern: iterative search, MCP tool integration, and multimodal grounding across custom proprietary data and the open web. |
| t16 | Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers | arXiv | 2025-05 | Comprehensive academic survey covering agent-based universal RAG, corrective RAG, and graph-based retrieval, including quantitative findings such as Dual-Pathway KG-RAG reducing hallucinations by 18% in biomedical QA. |
| t17 | Agentic Artificial Intelligence: Architectures, Taxonomies, and Evaluation of Large Language Model Agents | arXiv | 2026-01 | Technical survey situating RAG as the persistent-memory layer within broader agent architectures, covering Anthropic computer-use tooling and OpenAI Operator, with references to OSWorld and SWE-bench evaluation infrastructure. |
| t18 | Introducing GPT-5.5 | OpenAI | 2025-04 | OpenAI's announcement claiming GPT-5.5 completes agentic Codex tasks with significantly fewer tokens than prior models, directly relevant to the token-efficiency dimension of agentic RAG cost modelling. |
| t19 | OpenAI and Anthropic Donate AGENTS.md and Model Context Protocol to New Agentic AI Foundation | InfoQ | 2025-12 | Authoritative industry coverage of the AAIF formation, documenting Google's parallel A2A protocol donation and the convergence of competing labs on open agent-interoperability standards. |
| t20 | Anthropic launches enterprise 'Agent Skills' and opens the standard | VentureBeat | 2025-12 | Reports Anthropic's Agent Skills open standard, demonstrating that OpenAI adopted structurally identical architecture in ChatGPT and Codex CLI, illustrating rapid cross-lab convergence on reusable workflow knowledge for agentic retrieval. |
| t21 | OpenAI's Agents SDK and Anthropic's Model Context Protocol (MCP) | PromptHub | 2025-03 | Technical comparison of OpenAI's Agents SDK (with built-in file search against vector stores) and Anthropic's MCP, covering the complementary roles of agentic orchestration and retrieval-connectivity layers. |
| t22 | A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges | ACL Anthology / IJCNLP 2025 Findings | 2025 | Peer-reviewed survey aligning RAG paradigms with System 1 / System 2 cognitive frameworks and cataloguing reasoning workflows including ReAct, SELF-RAG, and multi-hop decomposition in industry settings. |
| t23 | Retrieval-Augmented Generation in Late 2025: a practical insight | Medium | 2025-10 | Practitioner synthesis arguing that long-context models and search APIs can replace static RAG for many queries, framing the 'start with search, reach for RAG only when data volume demands it' contrarian decision criterion. |
| t24 | AI Agent Landscape 2025–2026: A Technical Deep Dive | Medium | 2026-01 | Technical overview documenting Anthropic's multi-agent researcher using a Memory tool to persist plans beyond the 200K token limit, and reporting that tool selection via semantic similarity improves accuracy 3x versus presenting all tools simultaneously. |
| t25 | New AI Model Releases News (April 2026 Startup Edition) | mean.ceo blog | 2026-04 | Documents the April 2026 model release wave, noting that every major 2026 release emphasises agentic capabilities and that MCP crossed 97 million installs in March 2026, marking its transition to foundational agentic infrastructure. |
Tech Industry & Practitioner
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| p1 | Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery | InfoQ | 2026-04 | Field report from three financial-services deployments (Q4 2025, n≈1,500 multi-hop queries) showing a 30% silent-failure rate and finding that ~60% of hallucinations originated from unhandled execution errors rather than model reasoning, directly grounding operational failure-mode claims. |
| p2 | Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136 v4) | arXiv (Aditi Singh et al.) | 2026-04 | The most cited practitioner-adjacent survey of agentic RAG architectures; introduces a principled taxonomy by agent cardinality, control structure, and autonomy, with comparative trade-off analysis across healthcare, finance, and enterprise document processing use cases. |
| p3 | Thoughtworks Technology Radar Volume 33 - Themes: Rise of Agents Elevated by MCP, Context Engineering, AI Antipatterns | Thoughtworks Technology Radar | 2025-11 | Authoritative practitioner signal that RAG dominated Volume 32 conversation while Volume 33 shifted to agents and MCP, confirming the industry inflection from static retrieval to agentic workflows as observed by Thoughtworks CTO Rachel Laycock. |
| p4 | Thoughtworks Technology Radar Highlights The Rapid Evolution of AI Assistance in 2025 (Vol. 33 press release) | Thoughtworks / PR Newswire | 2025-11 | Official release statement confirming Volume 33's shift from RAG and prompt engineering (Vol. 32) to context engineering, MCP, and agentic systems, citing the growth of agentic workflows and enterprise AI antipatterns as the dominant themes. |
| p5 | Thoughtworks Technology Radar - Techniques: Semantic Layer for AI (LLM text-to-SQL, dbt MetricFlow, Snowflake Semantic Views) | Thoughtworks Technology Radar | 2025-11 | Practitioner evidence that semantic layers - the closest analogue to dimensional modelling in agentic systems - are now a first-class concern, with Thoughtworks warning that naive LLM text-to-SQL produces incorrect results when business rules live outside the schema. |
| p6 | Thoughtworks Technology Radar Volume 32 - Supervised Agents, RAG Techniques, Data Product Thinking | Thoughtworks | 2025-04 | Volume 32 spotlighted corrective RAG, Fusion-RAG, Self-RAG, and FastGraphRAG as Trial-level techniques, and introduced 'data product thinking' as the data management analogue of product management - directly relevant to context catalogue design. |
| p7 | Thoughtworks Technology Radar - Platforms: Graphiti, Databricks Agent Bricks, Rhesis testing | Thoughtworks Technology Radar | 2026-04 | Practitioner-assessed platform blips including Graphiti (temporal knowledge graph for LLM memory) and Databricks Agent Bricks; explicitly flags that flat vector stores in RAG pipelines fail to track how facts change over time. |
| p8 | Themes from Technology Radar Vol. 33 - Podcast: Infrastructure Automation, Rise of Agents, MCP, AI Antipatterns | Thoughtworks | 2025-11 | Neal Ford and Ken Mugrage explain the editorial reasoning behind Volume 33's shift from RAG to agents and MCP, including the concept of context engineering - 'how do you tell the agents what they're supposed to do and give them roles.' |
| p9 | RAG in 2026: The UK/EU Enterprise Guide to Grounded GenAI | Data Nucleus | 2026-01 | Practitioner guide situating EU AI Act and GDPR obligations alongside agentic RAG architecture guidance, covering framework selection (LangGraph, LlamaIndex, AutoGen, CrewAI), access control patterns, and ReAct/Tree-of-Thoughts retrieval reasoning. |
| p10 | Agentic RAG: The 2026 Production Guide | MarsDevs | 2026-05 | Production-focused guide with quantified latency benchmarks (standard RAG 1–2 s; agentic loop 8–12 s; 3–10x token cost multiplier) and three-layer evaluation architecture using Ragas, Arize Phoenix, and Langfuse - the most numerically grounded cost/latency source in the sweep. |
| p11 | Next-Generation Agentic RAG with LangGraph (2026 Edition) | Medium (Vinod Rane) | 2026-03 | Detailed implementation guide for stateful agentic RAG using LangGraph directed cyclic graphs, with per-node RAGAS observability instrumentation (critic_score, retrieval_round, iteration_count, token_budget_used) and production metric targets. |
| p12 | RAG Framework Benchmark: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy | AIMultiple | 2026-01 | Standardised 100-query benchmark across five frameworks with identical models (GPT-4.1-mini) and retriever (Qdrant), isolating framework overhead and token efficiency: DSPy 3.53 ms overhead; Haystack 1.57k tokens vs LangChain 2.40k tokens, meaning LangChain used roughly 53% more tokens, a gap that compounds at scale. |
| p13 | LangChain vs LlamaIndex (2026): Complete Production RAG Comparison | PremAI Blog | 2026-03 | Documents the architectural split between LangChain/LangGraph (workflow-first, stateful graphs) and LlamaIndex (retrieval-first, data-centric agents), noting LangGraph reached 1.0 stability in October 2025 and effectively superseded original chain-based LangChain for production agentic work. |
| p14 | Why LLM Frameworks Like LangChain and LlamaIndex Are Being Replaced by Agent SDKs | MindStudio | 2026-03 | Analyses the structural disruption of heavyweight RAG frameworks by native tool-calling, expanded context windows, MCP standardisation, and agent SDKs; includes LlamaIndex co-founder Jerry Liu's public acknowledgement that the framework era is ending. |
| p15 | LLM Frameworks Compared (2026): LangChain, LlamaIndex, DSPy and More | Morph | 2026-03 | Documents framework consolidation into four categories (orchestration, agents, optimisation, code-specific), reports LangChain at 100K+ GitHub stars and 34.5 million monthly LangGraph downloads, and warns that stacking three or more frameworks signals overengineering. |
| p16 | The 5 Best RAG Evaluation Tools You Should Know in 2026 | Maxim AI | 2026-02 | Comparative review of the five dominant evaluation platforms (Maxim AI, LangSmith, Arize Phoenix, Ragas, DeepEval), noting that Ragas exceeds 400,000 monthly downloads and 20 million evaluations, and that LangSmith's tight LangChain coupling creates friction in mixed-framework environments. |
| p17 | Top RAG Evaluation Tools in 2026 | Goodeye Labs | 2026-03 | Independent ranking of seven evaluation platforms including Weights & Biases Weave and Braintrust, citing a 2025 study that found Microsoft Copilot gave medically incorrect advice 26% of the time - illustrating the real-world stakes of inadequate RAG evaluation. |
| p18 | 7 RAG Evaluation Tools You Must Know | Iguazio | 2025-12 | Practitioner-oriented tool guide covering Ragas, LangSmith, Arize Phoenix, TruLens, and Promptfoo for continuous RAG evaluation in CI/CD pipelines, relevant to the maturing DevOps practices around agentic AI quality assurance. |
| p19 | Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI | Towards Data Science | 2025-10 | Documents the M&A consolidation around knowledge graph and semantic layer infrastructure (ServiceNow's acquisition of data.world, Salesforce's $8bn Informatica purchase), with Gartner's May 2025 recommendation that data engineering teams adopt ontologies and knowledge graphs to support AI. |
| p20 | From RAG to Context: A 2025 Year-End Review of RAG | RAGFlow | 2025-12 | Engineering team's year-end synthesis introducing 'Context Engineering' as the successor discipline to RAG optimisation, describing the shift from tuning single retrieval algorithms to systematic design of the end-to-end retrieval–context assembly–model reasoning pipeline. |
| p21 | Why 2025's Agentic AI Boom Is a CISO's Worst Nightmare | CSO Online | 2026-02 | Reports that 72–80% of enterprise RAG implementations significantly underperform or fail within their first year, and that 51% of all enterprise AI failures in 2025 were RAG-related; also identifies the '20,000-document cliff' latency and accuracy degradation pattern. |
| p22 | LLM Token Optimization: Cut Costs and Latency in 2026 | Redis | 2026-02 | Vendor-authored technical guide quantifying that semantic caching achieves up to 73% cost reduction in high-repetition agentic workloads, with benchmarks contrasting cache-hit millisecond response against seconds-scale fresh LLM inference. |
| p23 | Agentic RAG: When Static Retrieval Is No Longer Enough | Medium | 2026-03 | Cites MDPI Electronics 2025 study across 12 RAG variants and 250 clinical vignettes showing Self-RAG at 5.8% hallucination rate and a 55-percentage-point multi-hop accuracy gap between static RAG (34%) and agentic RAG (89%), providing the strongest quantitative case for the capability differential. |
| p24 | The Next Frontier of RAG: How Enterprise Knowledge Systems Will Evolve (2026–2030) | NStarX Inc. | 2025-12 | Engineering team's forward-looking analysis reframing RAG as a 'knowledge runtime' analogous to Kubernetes for application workloads, with governance, retrieval quality gates, and audit trails as mandatory infrastructure - the most explicit articulation of RAG-as-operational-infrastructure. |
| p25 | 10 RAG Architectures in 2026: Enterprise Use Cases and Strategy | Techment | 2026-03 | Practitioner decision framework for CTO/CDO selection across ten RAG architectures, explicitly stating that Agentic RAG is 'only necessary for complex, multi-step workflows' and that most enterprise search performs well with Hybrid RAG - the clearest published decision threshold for switching. |
Blogs & Independent Thinkers
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | Agentic RAG with LangGraph & Telegram (with Video explanation) | Jam With AI (Substack) | 2025-11 | Practitioner walkthrough contrasting static RAG pipelines against LangGraph-driven agentic RAG with query validation, document grading, and iterative query rewriting in production. |
| b2 | All you need to know about RAG (in 2026) | AI with Aish (Substack) | 2026-03 | Comprehensive 2026 state-of-the-art survey covering semantic chunking, hybrid search, cross-encoder reranking, and the operational economics of RAG versus long-context windows. |
| b3 | Comparative Analysis of RAG Architectures: Pipeline, Agentic, and Knowledge Graph (2026 Landscape) | Micheal Lanham (Substack) | 2026-02 | Cites 2026 State of AI Agents data showing 57% of organisations have deployed multi-stage agents, and frames quality as the primary production blocker with observability as table stakes. |
| b4 | How to Build an AI Agent Company in 2026: Lessons from Glean's $7.2B Playbook | Market Curve (Substack) | 2026-01 | Analyses Glean's December 2025 'Enterprise Context' platform combining memory, connectors, indexes, and personal/enterprise graphs as a case study in productising agentic RAG at scale. |
| b5 | The Eight Agentic AI Security Vulnerabilities Nobody's Talking About | AI Realized Now (Substack) | 2026-05 | Documents the CVE-2025-32711 EchoLeak exploit in Microsoft 365 Copilot's RAG pipeline and notes that prompt injection appeared in 73% of assessed production AI deployments in 2025. |
| b6 | From RAG to Context - A 2025 Year-End Review of RAG | RAGFlow Blog | 2025-12 | Authoritative year-end review arguing that RAG is evolving from a retrieval pattern into a 'Context Engine' and drawing an explicit analogy between RAG ingestion pipelines (PTI) and ETL/ELT tooling in the structured data warehouse world. |
| b7 | Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI | Towards Data Science (Medium) | 2025-10 | Argues that knowledge graphs and metadata management are becoming the semantic backbone for agentic AI, cites Gartner's May 2025 recommendation to adopt ontologies, and traces market consolidation including Salesforce's $8 billion Informatica acquisition. |
| b8 | RAG vs Memory: Addressing Token Crisis in Agentic Tasks | Agam Jain (personal blog) | 2025-10 | First-principles analysis of context-window economics showing agentic coding sessions routinely hit 50–200K tokens and research queries can exceed 2M tokens, with a decision framework for RAG versus hierarchical memory. |
| b9 | Lessons from Implementing RAG in 2025 | TrueState Blog | 2025-11 | Practitioner post-mortem from November 2025 documenting that direct context injection fails on three axes (performance, context-window limits, and per-message cost of $1.50/million tokens) and advising RAG strategy selection based on document count and latency constraints. |
| b10 | Long Context vs RAG: When 1M Token Windows Replace RAG | SitePoint | 2026-02 | Quantitative decision framework with cost calculators showing long context wins below 200K tokens with fewer than 500 daily queries, while RAG wins above 500K tokens at 5,000-plus queries per day. |
| b11 | Agentic RAG in 2026: The UK/EU Enterprise Guide to Grounded GenAI | Data Nucleus | 2026-01 | Practitioner guide contextualising agentic RAG within EU AI Act obligations and GPAI rules, covering ReAct and Tree-of-Thoughts patterns with compliance-oriented deployment guidance. |
| b12 | Agentic RAG: The 2026 Production Guide | MarsDevs | 2026-05 | Sets concrete production targets (faithfulness ≥0.9, answer relevancy ≥0.85, context precision ≥0.8), identifies the dominant 2026 stack as LangGraph for orchestration plus LlamaIndex for retrieval, and notes that agentic RAG loops of 3–4 iterations take 8–12 seconds. |
| b13 | The Next Frontier of RAG: How Enterprise Knowledge Systems Will Evolve (2026–2030) | NStarX Inc. | 2025-12 | Reports that production agentic RAG deployments show a 25–40% reduction in irrelevant retrievals but introduce new failure modes including retrieval loops and over-retrieval, and notes 70% of RAG systems still lack systematic evaluation frameworks. |
| b14 | Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery | InfoQ | 2026-04 | Reports that in Q4 2025 financial services testing (n=1,500 multi-hop queries), approximately 60% of hallucinations originated from unhandled execution errors rather than LLM reasoning flaws, making error-recovery mechanisms the highest-ROI reliability investment. |
| b15 | Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136) | arXiv (Singh, Ehtesham et al.) | 2025-01 | The field's primary taxonomy paper, classifying agentic RAG architectures by agent cardinality, control structure, autonomy, and knowledge representation, and identifying memory management, evaluation, and governance as open research gaps. |
| b16 | RAG in 2026: Bridging Knowledge and Generative AI | Squirro Blog | 2026-04 | Enterprise practitioner view arguing that agentic workflows require a context graph of past decisions to avoid reliability degradation, with case studies including a European bank saving EUR 20 million over three years. |
| b17 | The AI Agent Framework Landscape in 2025: What Changed and What Matters | Medium (Trung Hiếu Trần) | 2025-11 | Documents Microsoft's October 2025 merger of AutoGen and Semantic Kernel into a unified agent framework, LangChain's pivot away from agent orchestration toward RAG tooling, and the consolidation of the broader framework landscape. |
| b18 | LangGraph vs LlamaIndex Workflows for Building Agents - The Final No-BS Guide (2025) | Medium (Pedro Azevedo) | 2026-01 | Engineering-oriented teardown concluding that LlamaIndex leads on mature RAG modules while LangGraph leads on stateful multi-agent orchestration, with LangGraph noted for frequent breaking changes across versions. |
| b19 | Why LLM Frameworks Like LangChain and LlamaIndex Are Being Replaced by Agent SDKs | MindStudio Blog | 2026-03 | Argues the traditional framework era is ending as native tool calling, expanded context windows, and MCP standardisation erode the case for heavyweight RAG abstractions, citing LlamaIndex co-founder Jerry Liu's public acknowledgement of this disruption. |
| b20 | RAG Frameworks: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy | AIMultiple | 2026-01 | Benchmarks five frameworks across 100 queries with standardised components (GPT-4.1-mini, BGE-small, Qdrant, Tavily), isolating orchestration overhead and token efficiency to provide the most rigorous public framework comparison to date. |
| b21 | Agentic AI Frameworks 2026 | Flobotics | 2025-12 | Detailed framework landscape survey positioning DSPy for LLM output optimisation in complex reasoning pipelines, PydanticAI for type-safe production systems, and LlamaIndex as the default for data-centric agentic applications. |
| b22 | Production RAG in 2026: LangChain vs LlamaIndex | Rahul Kolekar (personal blog) | 2026-01 | Practitioner post documenting the prevalent production pattern of LlamaIndex handling ingestion and retrieval while LangChain/LangGraph handles orchestration, with advice to prefer 2-step RAG before adding agentic complexity. |
| b23 | Designing Agentic Loops | Simon Willison's Weblog | 2025-09 | Simon Willison's foundational post on agentic loop design, warning that agents are inherently dangerous due to prompt injection risks and that YOLO-mode tool execution maximises productivity at the cost of safety controls. |
| b24 | Writing about Agentic Engineering Patterns | Simon Willison's Weblog | 2026-02 | Announces Willison's ongoing Agentic Engineering Patterns guide, distinguishing professional agent-assisted development from vibe coding and framing the central challenge as the near-zero cost of generating initial code versus the unchanged cost of knowing what to build. |
| b25 | RAG in 2025: The Enterprise Guide to Retrieval Augmented Generation, Graph RAG and Agentic AI | Data Nucleus | 2026-01 | Refutes the 'RAG is dead' narrative by demonstrating how enterprises blend agents for orchestration with RAG for grounding, and catalogues Graph RAG, HyDE, ColBERT reranking, and agentic patterns as complementary rather than competing approaches. |
VC & Analyst Reports
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| v1 | Leaders, Gainers and Unexpected Winners in the Enterprise AI Arms Race | Andreessen Horowitz (a16z) | 2026-02 | Quantitative enterprise survey showing reasoning models are enabling new agentic workflows, with 54% of respondents citing accelerated LLM adoption and real production data on model provider share shifts. |
| v2 | Big Ideas 2026: Part 1 - Unstructured Data, Agent-Speed Infrastructure, and Data Entropy | Andreessen Horowitz (a16z) | 2025-12 | Frames 'data entropy' as the binding constraint on RAG and agentic AI, arguing that 80% of corporate knowledge lives in unstructured formats and that downstream AI workloads break without continuous data governance. |
| v3 | State of AI: An Empirical 100 Trillion Token Study with OpenRouter | Andreessen Horowitz (a16z) | 2025-12 | Provides production-traffic evidence that agentic inference is the fastest-growing behaviour on OpenRouter, with models planning, retrieving from tools and iterating rather than responding in single prompts. |
| v4 | Big Ideas 2026: The Agentic Interface | Andreessen Horowitz (a16z) | 2025-12 | Defines the strategic thesis that interfaces are shifting from chat to action and design must become agent-readable, directly framing how retrieval and knowledge must be reorganised for agent consumption. |
| v5 | Getting Retrieval-Augmented Generation Right: Part One | Forrester Research | 2025 | Analyst report analysing key RAG challenges and introducing shared terminology to enable cross-team collaboration, forming the baseline vocabulary for enterprise RAG deployment decisions. |
| v6 | Getting Retrieval-Augmented Generation Right: Part Two | Forrester Research | 2025 | Covers best practices across RAG indexing, retrieval, generation, and agentic support, with pioneer case studies resolving real engineering challenges in production. |
| v7 | How To Get Retrieval-Augmented Generation Right (Blog) | Forrester Research | 2025-06 | Public-facing Forrester summary noting that 50% of organisations are piloting agentic AI and 24% have it in production, with RAG identified as critical infrastructure for the transition. |
| v8 | Reference Architecture Brief: Retrieval-Augmented Generation | Gartner | 2025 | Provides a blueprint for scalable generative AI development integrating LLMs with enterprise data, serving as the authoritative reference architecture for enterprise RAG implementations. |
| v9 | Gartner Hype Cycle for Artificial Intelligence 2025 | Gartner | 2025-07 | Places AI agents and AI-ready data as the two fastest-advancing technologies at the Peak of Inflated Expectations, while GenAI slides into the Trough of Disillusionment, signalling the structural pivot to agentic workflows. |
| v10 | 2026 Hype Cycle for Agentic AI | Gartner | 2026-04 | Dedicated agentic AI Hype Cycle showing that only 17% of organisations have deployed AI agents although more than 60% intend to within two years, with governance, FinOps for agentic AI, and context graphs as emerging profiles on the curve. |
| v11 | The State of AI in 2025: Agents, Innovation, and Transformation | McKinsey Global Institute | 2025-11 | Survey of 1,993 participants showing 23% of enterprises are scaling agentic AI in at least one function, with knowledge management identified as a leading adoption domain and high performers 3x more likely to have scaled agents. |
| v12 | AI in the Workplace: A Report for 2025 (Superagency) | McKinsey Global Institute | 2025-01 | Frames the five big AI innovations for business including agentic AI, and illustrates the shift from copilot-style RAG to agents that plan, retrieve, and execute multi-step workflows in live enterprise systems. |
| v13 | The AI Agent Market Map - November 2025 Edition | CB Insights | 2025-11 | Maps 400+ private AI agent companies across 26 categories, noting that 1 in 5 new unicorns are building agents and agentic solutions have become leading acquisition targets for enterprise software incumbents. |
| v14 | The AI Agent Tech Stack | CB Insights | 2025-10 | Maps 135+ startups across 17 infrastructure markets including retrieval, memory, orchestration, and observability, identifying reliability as the central challenge driving investment in evaluation and governance tooling. |
| v15 | The State of AI 2025 - Bessemer Venture Partners | Bessemer Venture Partners | 2025-08 | Predicts that 2025 to 2026 marks the turning point for private, grounded evaluation frameworks, and that enterprise deployment will scale tenfold once trust in AI outputs is established through reproducible, use-case-specific evals. |
| v16 | AI Infrastructure Roadmap: Five Frontiers for 2026 | Bessemer Venture Partners | 2026-03 | Sets out five infrastructure frontiers for 2026 including agentic AI, providing a VC roadmap framing the stack layers that must mature for RAG-to-agent migration to complete at enterprise scale. |
| v17 | State of AI: Bi-Annual Snapshot - The Execution Era of AI | ICONIQ Capital | 2026-01 | Survey of ~300 software executives showing model inference rises from 20% to 23% of total cost as products scale, gross margins projected at 52% in 2026, and 40% of $500M+ revenue companies actively deploying agents. |
| v18 | State of AI 2025: The Builder's Playbook - ICONIQ Capital | ICONIQ Capital | 2025-06 | Shows RAG and fine-tuning as the dominant model training techniques at 66–69% usage each, with nearly 80% of AI-native builders investing in agentic workflows as their primary product type. |
| v19 | Agentic Enablers: Treating AI's Amnesia and Other Disorders | MMC Ventures | 2025-11 | VC research report framing context and memory management as the core technical barrier to reliable agentic AI, distinguishing between context (working memory) and persistent memory, and analysing knowledge graphs as a complement to RAG for multi-hop reasoning. |
| v20 | Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (arXiv 2501.09136) | arXiv (preprint) | 2025-01 | The most comprehensive academic survey of agentic RAG architectures, introducing a principled taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation, updated through April 2026. |
| v21 | The RAG Era Is Ending for Agentic AI - A New Compilation-Stage Knowledge Layer Is What Comes Next | VentureBeat | 2026-05 | Reports Pinecone's internal benchmark showing a 98% token reduction for a financial analysis task using compilation-stage knowledge versus runtime RAG, with Gartner analyst Arun Chandrasekaran commenting that architectural compilation embeds structural logic into the metadata layer. |
| v22 | Agentic AI Applications in Vector Database Market - Size, Share and Forecast to 2030 | Mordor Intelligence | 2025-11 | Sizes the agentic AI vector database market at $0.46 billion in 2025 growing to $1.45 billion by 2030 at 25.97% CAGR, with autonomous agents projected to grow at 61.5% CAGR, outpacing conversational AI and RAG. |
| v23 | Agentic RAG: The 2026 Production Guide | MarsDevs | 2026-05 | Practitioner production guide documenting that agentic RAG costs 3–10x more in tokens at runtime versus static RAG, and that MCP became the de facto standard retrieval-tool surface after Anthropic donated it to the Linux Foundation in December 2025. |
| v24 | Gartner Hype Cycle Identifies Top AI Innovations in 2025 (Press Release) | Gartner | 2025-08 | Official press release confirming AI agents and AI-ready data as the two fastest-advancing technologies at the Peak of Inflated Expectations, with Gartner analyst Haritha Khandabattu warning that no AI agent can be used in every case. |
| v25 | Why RAG Is Failing Agentic AI | Development Corporate | 2026-05 | Synthesises Pinecone, Qdrant, and LlamaIndex CEO statements to document that 85% of agent compute goes to re-discovery rather than task completion, and that enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter. |