Research · Tech Industry & Practitioner

Research sweep · deep · 2025 – 2026

Code Intelligence & Code-Graph Indexing for AI Agents

Tools and emerging approaches for code intelligence and code-graph indexing for AI coding agents from June 2025 through early June 2026, spanning local/embedded indexers (CodeGraph/Caveman-style repo maps, tree-sitter, SQLite and embedded graph stores), enterprise-scale code understanding (SCIP, code knowledge graphs, embeddings+retrieval), LSP-to-MCP bridges such as Serena, and the semantic-vs-syntactic-vs-embedding trade-off.

  • GPT-5.5
  • tech
  • frontier
  • academic
  • financial
  • blogs

Synthesised 2026-06-03

Narrative

What is gaining traction

The clearest 2025-early-2026 trend is a shift from raw grep/read workflows toward structured context layers: MCP as the transport, SCIP and language-server semantics for precise navigation, and tree-sitter/embedded stores for local or repo-scoped maps. Thoughtworks’ 2025-2026 Radar volumes and Sourcegraph’s 2026 material both point to context engineering as the center of gravity. (thoughtworks.com)

Local vs. enterprise

Local/embedded tools are converging on SQLite, tree-sitter, and lightweight graph stores for small-to-mid repos, while enterprise platforms emphasize SCIP, code graphs, and federated retrieval across repos and history. Sourcegraph explicitly positions SCIP as more precise than embeddings or text search, while Kuzu-style systems show how graph+vector hybrids are being used for agent memory and retrieval. (sourcegraph.com)

Semantic vs. syntactic vs. embedding trade-off

Semantic tooling wins when exact symbol navigation, references, or refactors matter; syntactic/tree-sitter wins for fast local parsing and repo maps; embeddings win for approximate recall and tool discovery, but are less precise. The strongest 2025-2026 sources frame the winning pattern as hybrid rather than single-modality. (claude.com)

MCP bridges

MCP-to-code-intelligence bridges are becoming the dominant integration pattern: GitHub MCP Server, Sourcegraph MCP, Serena, agent-lsp, and registry efforts all aim to expose code context as callable tools rather than as static files. The maturity varies widely: Sourcegraph and GitHub are production-facing; Serena and agent-lsp are promising but still evolving; several newer tools are best treated as emerging or experimental. (infoq.com)

Evidence quality

The most grounded evidence comes from Sourcegraph’s internal cost modeling, Thoughtworks’ practitioner Radar assessments, and systems papers that include implementation details and benchmarks. By contrast, many local code-graph tools are still product pages, GitHub READMEs, or community projects with anecdotal claims rather than large-scale measurements. (sourcegraph.com)

Outlook

The field is heading toward hybrid context stacks: build-aware or LSP-aware semantic retrieval, tree-sitter-based local indexing, graph/vector retrieval for cross-repo memory, and MCP as the agent-facing protocol layer. The practical selection criterion is likely to be repo size and precision needs: small local repos can favor embedded indexers, while monorepos and enterprise estates will continue to need SCIP- or knowledge-graph-backed infrastructure. (thoughtworks.com)
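The local, embedded pattern and the hybrid lookup described in the narrative can be illustrated with a small sketch. It is not the implementation of any tool named here. To stay self-contained it makes several assumptions: Python's standard `ast` module stands in for a tree-sitter parser, `difflib` string similarity stands in for embedding-based approximate recall, and the `symbols`/`refs` table layout, the `SAMPLE` source, and all function names are hypothetical.

```python
import ast
import difflib
import sqlite3

# Hypothetical sample file to index; any Python source would do.
SAMPLE = '''
def parse_config(path):
    return open(path).read()

def load_settings(path):
    return parse_config(path)

class ConfigCache:
    def refresh(self, path):
        return load_settings(path)
'''


def build_index(source, filename="sample.py"):
    """Parse one file and store definitions and call sites in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE symbols (name TEXT, kind TEXT, file TEXT, line INTEGER)")
    conn.execute("CREATE TABLE refs (name TEXT, file TEXT, line INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Methods are recorded as plain functions in this sketch.
            row = (node.name, "function", filename, node.lineno)
            conn.execute("INSERT INTO symbols VALUES (?, ?, ?, ?)", row)
        elif isinstance(node, ast.ClassDef):
            row = (node.name, "class", filename, node.lineno)
            conn.execute("INSERT INTO symbols VALUES (?, ?, ?, ?)", row)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by bare name; attribute calls are skipped.
            row = (node.func.id, filename, node.lineno)
            conn.execute("INSERT INTO refs VALUES (?, ?, ?)", row)
    conn.commit()
    return conn


def find_definitions(conn, name):
    """Exact lookup: where is this symbol defined?"""
    cur = conn.execute(
        "SELECT file, line FROM symbols WHERE name = ? ORDER BY file, line", (name,)
    )
    return cur.fetchall()


def find_references(conn, name):
    """Exact lookup: where is this symbol called?"""
    cur = conn.execute(
        "SELECT file, line FROM refs WHERE name = ? ORDER BY file, line", (name,)
    )
    return cur.fetchall()


def lookup(conn, query, cutoff=0.6):
    """Hybrid lookup: exact match first, approximate match as a fallback."""
    defs = find_definitions(conn, query)
    if defs:
        return {"match": "exact", "name": query, "definitions": defs}
    names = [row[0] for row in conn.execute("SELECT DISTINCT name FROM symbols")]
    close = difflib.get_close_matches(query, names, n=1, cutoff=cutoff)
    if not close:
        return {"match": "none", "name": None, "definitions": []}
    return {
        "match": "approximate",
        "name": close[0],
        "definitions": find_definitions(conn, close[0]),
    }


if __name__ == "__main__":
    index = build_index(SAMPLE)
    print(find_definitions(index, "load_settings"))
    print(find_references(index, "parse_config"))
    print(lookup(index, "load_setings"))  # misspelled on purpose
```

The exact path returns precise locations or nothing, while the fallback trades precision for recall, which mirrors the trade-off the narrative draws between symbol-level tooling and embeddings. A syntactic index like this one matches by name only; distinguishing two symbols that share a name across modules is the kind of question the narrative assigns to semantic, SCIP- or LSP-backed tooling.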


Sources

ID | Title / Significance | Outlet | Date
p1 | Autonomous background coding agents; framing the agent workflow landscape and the need for code-context tools. | Martin Fowler / Thoughtworks | 2025-06-04
p2 | Asynchronous coding agent embedded in GitHub Copilot and VS Code; agentic DevOps loop. | GitHub Newsroom | 2025-05-19
p3 | GitHub MCP Server public preview; standardized access to GitHub APIs for agents. | InfoQ | 2025-04-29
p4 | MCP as an emerging protocol for supplying agent context and tools. | Thoughtworks Technology Radar Vol. 32 | 2025-04
p5 | MCP, agentic systems, AI coding workflows, and AI antipatterns. | Thoughtworks Technology Radar Vol. 33 | 2025-11-05
p6 | Code intelligence as agentic tooling; toxic flow analysis for AI; MCP by default. | Thoughtworks Technology Radar Vol. 34 | 2026-04
p7 | MCP ecosystem growth, context engineering, and Context7 for version-specific docs/code examples. | Thoughtworks blog | 2025-12-11
p8 | AI coding workflows and emerging antipatterns; MCP elevated agent use. | Thoughtworks podcast | 2025-10-30
p9 | Combining search and chat; LLMs plus a precise knowledge graph of code. | Sourcegraph blog | 2025-02-05
p10 | Precise code navigation via SCIP; language-agnostic indexing protocol. | Sourcegraph docs | 2025-2026
p11 | Comparing agents, editors, MCPs, and Sourcegraph; SCIP vs embeddings vs text search. | Sourcegraph resource page | 2026
p12 | What it takes to run code intelligence in-house; build-vs-buy and operational requirements. | Sourcegraph blog | 2026-04-21
p13 | MCP access to precise cross-repository code intelligence, search, navigation, history, Deep Search. | Sourcegraph MCP | 2026
p14 | Agentic natural-language search across codebases and Git history. | Sourcegraph Deep Search | 2026
p15 | Semantic code retrieval and editing via MCP; LSP-backed symbol-level tooling. | Serena / GitHub repo | 2025-07-22 and later
p16 | Serena as an MCP server for semantic code retrieval and language-server-aware editing. | Anthropic Claude plugin page | 2026
p17 | Discovery and packaging of Serena as an MCP-registry-listed tool. | MCP Registry / GitHub | 2026
p18 | Bridge between language servers and MCP for AI agents. | agent-lsp | 2026
p19 | Built-in tree-sitter code intelligence and repo map in an IDE-oriented tool. | Kiro docs | 2026
p20 | Local SQLite-backed code graph with tree-sitter parsers; embedded storage model. | Coograph docs | 2026
p21 | Local-first code intelligence using SQLite FTS5; embedded/local storage approach. | KotaDB | 2026
p22 | Map-first code intelligence; repo-derived impact analysis and tree-sitter usage. | DEX | 2026
p23 | Graph RAG with vector search; agentic graph retrieval; schema-aware retrieval. | Kuzu blog | 2025-06-25
p24 | Knowledge graphs as agent context; multi-index graph/vector/full-text design. | Kuzu blog | 2025-07-08
p25 | Embedded graph database with built-in vector and full-text search. | Kuzu GitHub repo | 2025-10-10 (archived)
