Research · Academic & arXiv
Research sweep · deep · 2025 – 2026
Code Intelligence & Code-Graph Indexing for AI Agents
Tools and emerging approaches for code intelligence and code-graph indexing for AI coding agents from June 2025 through early June 2026, spanning local/embedded indexers (CodeGraph/Caveman-style repo maps, tree-sitter, SQLite and embedded graph stores), enterprise-scale code understanding (SCIP, code knowledge graphs, embeddings+retrieval), LSP-to-MCP bridges such as Serena, and the semantic-vs-syntactic-vs-embedding trade-off.
- GPT-5.5
- tech
- frontier
- academic
- financial
- blogs
Synthesised 2026-06-03
Narrative
The 2025 to early-2026 academic frontier is converging on hybrid repository intelligence: deterministic or static-analysis-backed graph layers for symbol, dependency and navigation tasks; embeddings or sparse retrieval for broad matching; and MCP/LSP bridges for exposing those capabilities to agents. The strongest empirical theme is that purely textual grep-and-read workflows are inefficient at scale, while graph-native or graph-plus-embedding systems can cut tokens and tool calls substantially: Codebase-Memory (a1), for example, reports 10x fewer tokens and 2.1x fewer tool calls than a file-exploration agent on 31 repositories. One caveat matters: many of the newest practitioner tools are still benchmark-light and vendor-driven.
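To make the MCP/LSP bridge idea concrete, the following is a minimal sketch of how a bridge might translate an agent's tool call into a language-server request. It is not taken from Serena or any other tool listed here. The tool name `find_definition` and its argument names are hypothetical; the `tools/call` envelope, the `textDocument/definition` method and the `Content-Length` framing follow the public MCP and LSP conventions as I understand them.

```python
import json


def mcp_call_to_lsp_request(mcp_request: dict, request_id: int) -> dict:
    """Translate a hypothetical 'find_definition' MCP tool call into an LSP request.

    The tool name and argument names are illustrative assumptions; only the
    LSP method and parameter layout come from the LSP specification.
    """
    if mcp_request.get("method") != "tools/call":
        raise ValueError("not an MCP tool call")
    params = mcp_request["params"]
    if params["name"] != "find_definition":
        raise ValueError(f"unsupported tool: {params['name']}")
    args = params["arguments"]
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": args["uri"]},
            # LSP positions are zero-based line and character offsets.
            "position": {"line": args["line"], "character": args["character"]},
        },
    }


def frame_lsp_message(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the Content-Length header that LSP transports use."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Example agent request (all values are made up for illustration).
example_mcp_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "find_definition",
        "arguments": {"uri": "file:///repo/app/main.py", "line": 41, "character": 8},
    },
}

lsp_request = mcp_call_to_lsp_request(example_mcp_request, request_id=1)
wire_bytes = frame_lsp_message(lsp_request)
```

The point of the sketch is the division of labour: the agent asks a symbol-level question once, and the language server answers from its own semantic model instead of the agent reading files to find the definition.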
Semantic, syntactic and embedding methods are not substitutes for one another. Semantic systems are best for symbol navigation, call chains, type-aware lookup and change impact when the repository is buildable or language-server support is available. Syntactic Tree-sitter and AST graphs are easier to deploy locally and across many languages; they are robust and cheap, but miss deeper type and build semantics. Embedding retrieval is strongest for broad natural-language matching and fuzzy discovery, but weak on precise cross-file dependency reasoning unless paired with reranking or graph constraints.
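The syntactic end of this trade-off can be illustrated with a small sketch. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter and an in-memory SQLite database as the embedded store; the table layout and function names are assumptions made for illustration, not the schema of any tool in the source list.

```python
import ast
import sqlite3

# A tiny sample module to index (illustrative only).
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

def main():
    parse("a.txt")
'''


def build_index(source: str) -> sqlite3.Connection:
    """Index function definitions and name-based call edges into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE symbols (name TEXT PRIMARY KEY, line INTEGER)")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        db.execute("INSERT INTO symbols VALUES (?, ?)", (node.name, node.lineno))
        for inner in ast.walk(node):
            # Only plain-name calls such as load(path) are recorded. Method calls
            # such as text.split() are skipped: resolving them needs type
            # information that a purely syntactic pass does not have.
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                db.execute(
                    "INSERT INTO calls VALUES (?, ?)", (node.name, inner.func.id)
                )
    return db


def callers_of(db: sqlite3.Connection, name: str) -> list[str]:
    """Return the functions that call `name`, matched by name only."""
    rows = db.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        (name,),
    )
    return [row[0] for row in rows]


index = build_index(SOURCE)
print(callers_of(index, "load"))
print(callers_of(index, "parse"))
```

The index is cheap to build and needs no build system, which matches the deployment advantage described above. The skipped method calls show the corresponding gap: edges that depend on receiver types are exactly what a semantic, language-server-backed layer would add.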
The evidence quality is uneven. The best-measured local-indexing evidence in this lane is Codebase-Memory's reported token and tool-call savings (a1), with METR's evaluations (a22 to a24) as the reference point for benchmark discipline. LSP-to-MCP tooling is moving quickly, but much of the evidence is still documentation, GitHub repositories or blog posts rather than peer-reviewed evaluation. For large repositories, the most resilient pattern appears to combine symbols, dependencies, commit history and embeddings rather than flattening code into text chunks.
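One way to picture the combined pattern is a ranking function that blends a text-similarity score, dependency-graph proximity and commit co-change counts. The sketch below is an illustration under stated assumptions, not the method of any cited paper: bag-of-words cosine similarity stands in for a real embedding model, and the weights, symbol names and sample data are all hypothetical.

```python
import math
from collections import Counter, deque


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def graph_distance(edges: list[tuple[str, str]], start: str) -> dict[str, int]:
    """Breadth-first hop counts from `start` over an undirected dependency graph."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hybrid_rank(query, seed, docs, edges, cochange,
                w_text=0.5, w_graph=0.3, w_hist=0.2):
    """Rank symbols by a weighted mix of text, graph and history signals."""
    dist = graph_distance(edges, seed)
    max_co = max(cochange.values(), default=0) or 1
    scored = []
    for name, text in docs.items():
        text_score = cosine(query, text)
        graph_score = 1.0 / (1 + dist[name]) if name in dist else 0.0
        hist_score = cochange.get(name, 0) / max_co
        total = w_text * text_score + w_graph * graph_score + w_hist * hist_score
        scored.append((total, name))
    # Highest score first; ties broken alphabetically for determinism.
    return [name for _, name in sorted(scored, key=lambda item: (-item[0], item[1]))]


# Hypothetical repository data.
docs = {
    "auth.login": "validate user password and create session token",
    "auth.session": "create session token and store expiry",
    "billing.invoice": "create invoice for user and send email",
    "utils.email": "send email via smtp",
}
edges = [("auth.login", "auth.session"), ("billing.invoice", "utils.email")]
cochange = {"auth.session": 4, "billing.invoice": 1}

ranking = hybrid_rank("session token expiry", "auth.login", docs, edges, cochange)
print(ranking)
```

In this toy data the text signal alone would already favour `auth.session`, but the graph and history terms are what push unrelated symbols with overlapping vocabulary toward the bottom, which is the behaviour that flat text chunking cannot reproduce.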
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| a1 | Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP | arXiv | 2026 | Persistent Tree-sitter knowledge graph exposed via MCP; parses 66 languages and reports 10x fewer tokens and 2.1x fewer tool calls than a file-exploration agent on 31 repos. |
| a2 | Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants | arXiv | 2026 | Deterministic, evidence-backed architectural map of buildable components, aggregators, runners, tests, external packages, and package managers with explicit dependency and coverage edges. |
| a3 | On the Challenges and Opportunities of Learned Sparse Retrieval for Code | arXiv | 2026 | Introduces SPLADE-Code and argues that learned sparse retrieval can be competitive for code; reports sub-millisecond retrieval on 1M passages with little effectiveness loss. |
| a4 | SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction | arXiv | 2025 | Combines dual static-dynamic knowledge graphs, neural graph-query generation, SMT-guided beam search, and incremental KG maintenance. |
| a5 | GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion | arXiv | 2025 | Builds a multi-level code graph unifying files, ASTs, call graphs, class hierarchies, and data-flow graphs; hybrid retriever plus graph attention reranker. |
| a6 | RepoScope: Leveraging Call Chain-Aware Multi-View Context for Repository-Level Code Generation | arXiv | 2025 | Static-analysis-only repository structural semantic graph with call-chain prediction and structure-preserving serialization. |
| a7 | RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval | arXiv | 2025 | Repository knowledge graph augmented with node text and embeddings; uses Cypher for entity queries and MCTS-guided graph exploration for natural-language queries. |
| a8 | Knowledge Graph Based Repository-Level Code Generation | arXiv | 2025 | Repository graph representation to improve code search and retrieval for repo-level generation; evaluated on EvoCodeBench. |
| a9 | Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification | arXiv | 2025 | Defines RepoAlign-Bench for change-request-driven repo retrieval and proposes a dual-tower retriever with adversarial reflection. |
| a10 | Repository-level Code Search with Neural Retrieval Methods | arXiv | 2025 | Multi-stage retrieval/reranking for repository-level code search using commit histories plus BM25 and CodeBERT reranking. |
| a11 | RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph | arXiv | 2024 | Plug-in repository-level code graph that boosts SWE-bench and CrossCodeEval performance across multiple methods. |
| a12 | GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model | arXiv | 2024 | Code Context Graph with control-flow, data-dependence and control-dependence edges, plus coarse-to-fine graph retrieval. |
| a13 | How and Why LLMs Use Deprecated APIs in Code | arXiv | 2024 | Empirical study showing LLMs rely on code search services and can be influenced by retrieval behavior when using deprecated APIs. |
| a14 | Improving Text Embeddings with Large Language Models | arXiv | 2024 | LLM-assisted embedding training that improves BEIR/MTEB performance; relevant to embedding-based retrieval quality. |
| a15 | Retrieval Augmented Code Generation and Summarization | arXiv | 2021 | Early retrieval-augmented code generation/summarization framework (REDCODER). |
| a16 | SCIP Code Intelligence Protocol / Sourcegraph SCIP | Sourcegraph documentation / GitHub | 2024 | Language-agnostic code indexing protocol for go-to-definition, references, and implementations. |
| a17 | Serena | Open-source MCP toolkit / GitHub | 2025 | MCP-based coding agent toolkit exposing semantic retrieval and symbol-level editing via LSP integration. |
| a18 | multilspy | GitHub | 2024 | Python LSP client library intended for applications around language servers. |
| a19 | MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers | arXiv | 2025 | Proxy layer for MCP servers that can simplify access patterns and decouple clients from servers. |
| a20 | CodeSift | Practitioner tool/site | 2025 | MCP tools for code intelligence claiming reduced-token workflows for agents. |
| a21 | GitHub MCP Server | GitHub repository | 2025 | Official MCP server supporting repository and workflow intelligence across MCP hosts. |
| a22 | HCAST: Human-Calibrated Autonomy Software Tasks | METR / PDF | 2025 | Autonomy benchmark suite for software, ML engineering, cybersecurity, and research tasks. |
| a23 | METR preliminary evaluations of Claude 3.7, GPT-4.5, o3/o4-mini, and related frontier-model reports | METR evaluation reports | 2025 | Comparative agent evaluations on HCAST, SWAA, and RE-Bench, with time-horizon estimates and observations on reward hacking / cheating behaviors. |
| a24 | METR Time-Horizon and Frontier-Risk updates | METR blog / analysis | 2025 | Time-horizon analyses across software and research tasks; updates on frontier model behavior in task suites. |
| a25 | Context Engineering for AI Agents in Open-Source Software | arXiv | 2025 | Empirical study of AGENTS.md / AI config files across 466 OSS projects; shows no standard structure yet and strong variation in provided context. |