Research · Tech Industry & Practitioner
Research sweep · deep · 2025 – 2026
Code Intelligence & Code-Graph Indexing for AI Agents
Tools and emerging approaches for code intelligence and code-graph indexing for AI coding agents from June 2025 through early June 2026, spanning local/embedded indexers (CodeGraph/Caveman-style repo maps, tree-sitter, SQLite and embedded graph stores), enterprise-scale code understanding (SCIP, code knowledge graphs, embeddings+retrieval), LSP-to-MCP bridges such as Serena, and the semantic-vs-syntactic-vs-embedding trade-off.
- GPT-5.5
- tech
- frontier
- academic
- financial
- blogs
Synthesised 2026-06-03
Narrative
What is gaining traction: The clearest trend from 2025 to early 2026 is a shift from raw grep/read workflows toward structured context layers: MCP as the transport, SCIP and language-server semantics for precise navigation, and tree-sitter with embedded stores for local or repo-scoped maps. Thoughtworks’ 2025-2026 Radar volumes and Sourcegraph’s 2026 material both point to context engineering as the center of gravity. (thoughtworks.com)

Local vs. enterprise: Local and embedded tools are converging on SQLite, tree-sitter, and lightweight graph stores for small-to-mid repos, while enterprise platforms emphasize SCIP, code graphs, and federated retrieval across repos and history. Sourcegraph explicitly positions SCIP as more precise than embeddings or text search, while Kuzu-style systems show how graph+vector hybrids are being used for agent memory and retrieval. (sourcegraph.com)

Semantic vs. syntactic vs. embedding trade-off: Semantic tooling wins when exact symbol navigation, references, or refactors matter. Syntactic (tree-sitter) tooling wins for fast local parsing and repo maps. Embeddings win for approximate recall and tool discovery, but are less precise. The strongest 2025-2026 sources frame the winning pattern as hybrid rather than single-modality. (claude.com)

MCP bridges: MCP-to-code-intelligence bridges are becoming the dominant integration pattern. GitHub MCP Server, Sourcegraph MCP, Serena, agent-lsp, and registry efforts all aim to expose code context as callable tools rather than as static files. Maturity varies widely: Sourcegraph and GitHub are production-facing; Serena and agent-lsp are promising but still evolving; several newer tools are best treated as emerging or experimental. (infoq.com)

Evidence quality: The most grounded evidence comes from Sourcegraph’s internal cost modeling, Thoughtworks’ practitioner Radar assessments, and systems papers that include implementation details and benchmarks. By contrast, many local code-graph tools are documented only through product pages, GitHub READMEs, or community projects, with anecdotal claims rather than large-scale measurements. (sourcegraph.com)

Outlook: The field is heading toward hybrid context stacks: build-aware or LSP-aware semantic retrieval, tree-sitter-based local indexing, graph/vector retrieval for cross-repo memory, and MCP as the agent-facing protocol layer. The practical selection criterion is likely to be repo size and precision needs: small local repos can favor embedded indexers, while monorepos and enterprise estates will continue to need SCIP- or knowledge-graph-backed infrastructure. (thoughtworks.com)
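The hybrid pattern described in the narrative can be pictured as a small routing table that sends each agent request to the modality best suited to it, with the others kept as fallbacks. The sketch below is illustrative only: the task names, the `Backend` enum, and the `plan` function are hypothetical and do not come from any of the tools surveyed here.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical labels for the three retrieval modalities in the narrative."""
    SEMANTIC = "semantic (SCIP / language server)"
    SYNTACTIC = "syntactic (tree-sitter repo map)"
    EMBEDDING = "embedding (vector recall)"


# Assumed task names, chosen to mirror the trade-off stated in the narrative:
# exact navigation and refactors -> semantic, fast local structure -> syntactic,
# approximate recall and tool discovery -> embeddings.
ROUTES = {
    "find_references": Backend.SEMANTIC,
    "go_to_definition": Backend.SEMANTIC,
    "rename_symbol": Backend.SEMANTIC,
    "repo_map": Backend.SYNTACTIC,
    "outline_file": Backend.SYNTACTIC,
    "fuzzy_concept_search": Backend.EMBEDDING,
    "tool_discovery": Backend.EMBEDDING,
}


def plan(task: str) -> list[Backend]:
    """Return backends in the order to try them.

    Unknown tasks default to embedding recall first, on the assumption that
    an approximate answer is a better starting point than none.
    """
    primary = ROUTES.get(task, Backend.EMBEDDING)
    fallbacks = [b for b in Backend if b is not primary]
    return [primary, *fallbacks]


if __name__ == "__main__":
    for task in ("rename_symbol", "repo_map", "explain_auth_flow"):
        print(task, "->", [b.name for b in plan(task)])
```

A real stack would presumably weigh repo size and index freshness as well; this sketch captures only the precision-versus-recall split.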
Sources
| ID | Outlet | Date | Significance |
|---|---|---|---|
| p1 | Martin Fowler / Thoughtworks | 2025-06-04 | Autonomous background coding agents; framing the agent workflow landscape and the need for code-context tools. |
| p2 | GitHub Newsroom | 2025-05-19 | Asynchronous coding agent embedded in GitHub Copilot and VS Code; agentic DevOps loop. |
| p3 | InfoQ | 2025-04-29 | GitHub MCP Server public preview; standardized access to GitHub APIs for agents. |
| p4 | Thoughtworks Technology Radar Vol. 32 | 2025-04 | MCP as an emerging protocol for supplying agent context and tools. |
| p5 | Thoughtworks Technology Radar Vol. 33 | 2025-11-05 | MCP, agentic systems, AI coding workflows, and AI antipatterns. |
| p6 | Thoughtworks Technology Radar Vol. 34 | 2026-04 | Code intelligence as agentic tooling; toxic flow analysis for AI; MCP by default. |
| p7 | Thoughtworks blog | 2025-12-11 | MCP ecosystem growth, context engineering, and Context7 for version-specific docs/code examples. |
| p8 | Thoughtworks podcast | 2025-10-30 | AI coding workflows and emerging antipatterns; MCP elevated agent use. |
| p9 | Sourcegraph blog | 2025-02-05 | Combining search and chat; LLMs plus a precise knowledge graph of code. |
| p10 | Sourcegraph docs | 2025-2026 | Precise code navigation via SCIP; language-agnostic indexing protocol. |
| p11 | Sourcegraph resource page | 2026 | Comparing agents, editors, MCPs, and Sourcegraph; SCIP vs embeddings vs text search. |
| p12 | Sourcegraph blog | 2026-04-21 | What it takes to run code intelligence in-house; build-vs-buy and operational requirements. |
| p13 | Sourcegraph MCP | 2026 | MCP access to precise cross-repository code intelligence, search, navigation, history, Deep Search. |
| p14 | Sourcegraph Deep Search | 2026 | Agentic natural-language search across codebases and Git history. |
| p15 | Serena / GitHub repo | 2025-07-22 and later | Semantic code retrieval and editing via MCP; LSP-backed symbol-level tooling. |
| p16 | Anthropic Claude plugin page | 2026 | Serena as an MCP server for semantic code retrieval and language-server-aware editing. |
| p17 | MCP Registry / GitHub | 2026 | Discovery and packaging of Serena as an MCP-registry-listed tool. |
| p18 | agent-lsp | 2026 | Bridge between language servers and MCP for AI agents. |
| p19 | Kiro docs | 2026 | Built-in tree-sitter code intelligence and repo map in an IDE-oriented tool. |
| p20 | Coograph docs | 2026 | Local SQLite-backed code graph with tree-sitter parsers; embedded storage model. |
| p21 | KotaDB | 2026 | Local-first code intelligence using SQLite FTS5; embedded/local storage approach. |
| p22 | DEX | 2026 | Map-first code intelligence; repo-derived impact analysis and tree-sitter usage. |
| p23 | Kuzu blog | 2025-06-25 | Graph RAG with vector search; agentic graph retrieval; schema-aware retrieval. |
| p24 | Kuzu blog | 2025-07-08 | Knowledge graphs as agent context; multi-index graph/vector/full-text design. |
| p25 | Kuzu GitHub repo | 2025-10-10 (archived) | Embedded graph database with built-in vector and full-text search. |
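
Several of the local tools listed above (p20 to p22) are described as keeping a code graph in an embedded SQLite store and using it for impact analysis. The following is a minimal sketch of that general idea, not the schema or API of any listed tool. To stay dependency-free it uses Python's standard `ast` module as a stand-in for tree-sitter, indexes only direct calls to plain function names, and answers "what is affected if this function changes?" with a recursive SQL query. The table names, `build_index`, and `impacted_by` are all hypothetical.

```python
import ast
import sqlite3

# Tiny sample module to index; in practice this would be read from the repo.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    return parse("input.txt")
'''


def build_index(source: str) -> sqlite3.Connection:
    """Store function definitions and direct call edges in in-memory SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE symbols (name TEXT PRIMARY KEY, line INTEGER)")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            db.execute("INSERT INTO symbols VALUES (?, ?)", (node.name, node.lineno))
            # Record only calls to bare names (e.g. load(...)); attribute
            # calls such as obj.method(...) are skipped in this sketch.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    db.execute(
                        "INSERT INTO calls VALUES (?, ?)", (node.name, inner.func.id)
                    )
    return db


def impacted_by(db: sqlite3.Connection, name: str) -> list[str]:
    """Return every function that directly or transitively calls `name`."""
    rows = db.execute(
        """
        WITH RECURSIVE impacted(fn) AS (
            SELECT caller FROM calls WHERE callee = ?
            UNION
            SELECT c.caller FROM calls c JOIN impacted i ON c.callee = i.fn
        )
        SELECT fn FROM impacted ORDER BY fn
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    index = build_index(SOURCE)
    print(impacted_by(index, "load"))
```

Because the edges come from syntax alone, the result is only as precise as name matching allows. This is the gap that the SCIP- and LSP-backed tools in the table aim to close with resolved symbols.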