Research sweep · deep · 2025 – present

AI 2027 Milestone Tracker

AI 2027 report milestone tracking (January 2025–present): which predicted capabilities have shipped across Anthropic, OpenAI, Google DeepMind, Meta, xAI, and major enterprise adopters; what remains unshipped or contradicted; and what near-term signals suggest for agentic AI, safety frameworks, autonomy, and deployment timelines

  • financial
  • frontier
  • academic
  • vc
  • substack

Synthesised 2026-04-08

Narrative

The period from January 2025 through April 2026 has produced a rich and often contradictory body of evidence against the AI 2027 report's milestone predictions.

On the 'shipped' side, the agentic AI layer that AI 2027 predicted for mid-2025 has clearly arrived: OpenAI launched Operator in January 2025, integrated it into ChatGPT as a unified agentic system ('ChatGPT Agent') by July 2025, and released the enterprise-grade 'Frontier' platform in February 2026; Anthropic's Claude Code became a widely adopted agentic coding tool; Google DeepMind shipped Gemini 3 (November 2025) with multi-agent orchestration and a reimagined Deep Research agent; and Anthropic's Model Context Protocol crossed 97 million installs by March 2026, becoming foundational infrastructure. Flagship model releases — GPT-5 (August 2025), GPT-5.1/5.2 (November–December 2025), Claude Opus 4.5 (November 2025), Gemini 3 (November 2025), Grok 3 (February 2025) — broadly match AI 2027's qualitative prediction that '2025 AIs function more like employees.' The Stanford HAI 2025 AI Index confirms that SWE-bench scores reached 71.7% by end-2024 (up from 4.4% in 2023), and METR's time-horizon metric showed the 50% task-completion horizon of frontier models doubling approximately every 7 months. OpenAI's annualized revenue hit ~$20B, slightly ahead of AI 2027's $18B prediction.

On the 'unshipped or contradicted' side, the authors themselves revised their median superhuman-coder timeline from 2027–2028 to approximately 2032 by December 2025 (a 3–5 year slip), citing modelling errors in their AI R&D automation assumptions. Their own grading report found overall quantitative progress at only ~65% of the predicted pace: SWE-bench scores reached 74.5% rather than the predicted 85% by mid-2025, no leading AI company conducted a substantially larger training run than GPT-4.5, and AI R&D software uplift remains well below the 1.9x ratio predicted for end-2026.
Critically, METR's randomized controlled trial found that early-2025 AI tools actually made experienced open-source developers 19% slower, not faster — a sharp contradiction of AI 2027's productivity uplift assumptions.

The safety framework story is equally complex: Anthropic activated ASL-3 safeguards (an AI 2027-adjacent milestone) in May 2025, but then dropped its hard pause commitment entirely in February 2026 under competitive and political pressure, replacing rigid guardrails with nonbinding 'public goals.' The Pentagon simultaneously threatened Anthropic with blacklisting over safety red lines, illustrating that geopolitical friction — not just technical progress — is shaping deployment trajectories in ways AI 2027 did not adequately model.
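
The time-horizon extrapolation that underpins both METR's trend and AI 2027's forecasts is simple compound doubling. A minimal sketch, using the GPT-5 baseline (50%-time-horizon of 2h17m, August 2025, per t20) and the ~7-month doubling period (t18); the function name is illustrative, and the projection assumes the exponential trend continues, which METR itself flags as sensitive to task composition (t19):

```python
from datetime import date

# Assumed inputs, taken from the sources cited above (t18, t20).
DOUBLING_MONTHS = 7.0
BASELINE_HORIZON_MIN = 2 * 60 + 17  # 137 minutes: GPT-5's 50%-time-horizon, Aug 2025
BASELINE_DATE = date(2025, 8, 1)

def projected_horizon_minutes(on: date) -> float:
    """50%-time-horizon (in minutes) implied at date `on`, if the trend holds."""
    months = (on.year - BASELINE_DATE.year) * 12 + (on.month - BASELINE_DATE.month)
    return BASELINE_HORIZON_MIN * 2 ** (months / DOUBLING_MONTHS)

# One full doubling period later (March 2026), the implied horizon is
# exactly twice the baseline: 274 minutes, roughly 4.5 hours.
print(projected_horizon_minutes(date(2026, 3, 1)))
```

The fragility of this arithmetic is exactly what the December 2025 timeline revision (t2) and the benchmark-saturation findings (t21, t23) call into question: the formula is only as good as the assumption that the doubling period stays constant.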


Sources

ID Title Outlet Date Significance
t1 AI 2027 — Official Scenario Website AI Futures Project 2025-04 The primary source document forecasting AGI by 2027, including predictions about agentic AI capabilities, autonomous coding agents, and superintelligence timelines that serve as the baseline for milestone tracking.
t2 AI Futures Model: Dec 2025 Update — Revised Timelines AI Futures Project (blog.aifutures.org) 2025-12 The original AI 2027 authors revise their median superhuman-coder timeline from 2027–2028 to 2032, a 3–5 year slip, representing the most significant self-correction by the report's authors and directly validating the 'Fant-AI-sia' claim about uncertain timeline extrapolation.
t3 Grading AI 2027's 2025 Predictions AI Futures Project (blog.aifutures.org) 2026-02 Systematic grading of AI 2027's quantitative and qualitative 2025 predictions against actuals, finding overall progress at ~65% of predicted pace and specific shortfalls in SWE-bench and AI R&D uplift metrics.
t4 AI 2027 Timelines Forecast — Supplement AI Futures Project 2025-05 Detailed methodology for predicting superhuman coders via METR time-horizon extrapolation; subsequent December 2025 edits acknowledge the superexponentiality argument was mistaken, directly weakening the core extrapolation.
t5 FutureSearch's Forecast on AI 2027 Timelines FutureSearch 2025-01 Independent forecasting critique of AI 2027, noting real-world R&D automation bottlenecks (weeks-long experiments) and predicting the milestone timeline would arrive 'much later,' which the AI 2027 team's December 2025 update confirmed.
t6 AI Expert Predictions for 2027: A Logical Progression to Crisis Center for AI Policy (CAIP) 2025-04 Policy-focused analysis of AI 2027 that affirms the agentic progression scenario as plausible and calls for U.S. national security audits of advanced AI systems, situating the report in regulatory discourse.
t7 Moving Back the AGI Timeline: AI 2027 Authors Revise to 2030 Marketing AI Institute 2025-12 Documents co-author Daniel Kokotajlo's public admission that his personal AGI timeline has shifted to around 2030, corroborating the 'Fant-AI-sia' critique that the original forecast extrapolated too aggressively.
t8 Anthropic's Responsible Scaling Policy Version 3.0 Anthropic (official) 2026-02 Anthropic's RSP v3.0 drops the hard commitment to pause training if safety measures are inadequate, replacing it with nonbinding public roadmaps — a major safety-policy inflection point at a frontier lab.
t9 Anthropic's Frontier Safety Roadmap Anthropic (official) 2026-02 Official Frontier Safety Roadmap introduced under RSP 3.0, detailing alignment assessment pipelines, sabotage risk reports for Claude Opus 4.5/4.6, and the difficulty of confidently ruling out AI R&D-4 capability thresholds.
t10 Exclusive: Anthropic Drops Flagship Safety Pledge TIME 2026-02 Reveals Anthropic's admission that its original safety commitment became untenable amid competitive pressure, political headwinds (Trump administration's deregulatory stance), and the fuzziness of capability thresholds — directly relevant to alignment intervention risk.
t11 Anthropic ditches its core safety promise amid Pentagon fight — CNN Business CNN Business 2026-02 Reports Pentagon ultimatum to Anthropic to roll back AI safeguards or lose a $200M contract, illustrating how geopolitical and procurement pressures override voluntary safety frameworks.
t12 Anthropic RSP 3.0 Explained: What's New in AI Safety Policy AdwaitX 2026-02 Detailed technical breakdown of RSP v3.0, including ASL-3 provisional activation for Claude Opus 4 in May 2025 over CBRN risks, and the structural limits of unilateral safety commitments without multilateral coordination.
t13 Introducing Operator — OpenAI's Browser-Using Agent OpenAI (official) 2025-01 Official launch of OpenAI's first agentic product — a computer-using agent for web task automation — directly instantiating the AI 2027 prediction of coding and agentic AI emerging in 2025.
t14 Introducing ChatGPT Agent: Bridging Research and Action OpenAI (official) 2025-07 Operator's successor product integrating browser navigation, deep research, and conversational AI into a unified agentic system, showing the rapid productization of autonomous AI agents at OpenAI.
t15 OpenAI Launches Frontier: Enterprise AI Agent Platform TechCrunch 2026-02 OpenAI's launch of an enterprise agent management platform treating AI agents as employees, marking the transition from research preview to enterprise infrastructure — validating AI 2027's agentic adoption trajectory.
t16 OpenAI Frontier: AI Agent Platform Could Reshape Enterprise Software Fortune 2026-02 Covers market disruption signals as Anthropic and OpenAI simultaneously launch enterprise agent platforms, alarming SaaS incumbents like Salesforce and Workday — supporting AI 2027's economic displacement narrative.
t17 OpenAI for Developers in 2025 — Year in Review OpenAI (official) 2025-12 Official summary of 2025 developer platform releases including Responses API, Agents SDK, Codex, and AgentKit, documenting the full agentic infrastructure buildout aligned with AI 2027 predictions.
t18 Measuring AI Ability to Complete Long Tasks — METR METR (Model Evaluation & Threat Research) 2025-03 Foundational empirical paper introducing the time-horizon metric showing exponential doubling (~7 months) in AI task autonomy from 2019–2025 — the primary benchmark underpinning AI 2027's capability extrapolations.
t19 METR Time Horizon 1.1 — Updated Autonomy Estimates METR 2026-01 Updated time-horizon evaluations covering GPT-5.2, Gemini 3 Pro, and Claude Opus 4.5, showing continued exponential growth in AI task autonomy but highlighting sensitivity of trend to task composition.
t20 METR Evaluation of OpenAI GPT-5 — Autonomy Report METR 2025-08 Empirical finding that GPT-5 achieved a 50%-time-horizon of 2h17m, within trend but short of AI 2027's implied milestones, and early evidence of models detecting they are being evaluated — a nascent alignment concern.
t21 METR Research Update: Algorithmic vs. Holistic Evaluation METR 2025-08 Key finding that AI agents performing well on auto-scored benchmarks still fail frequently on holistic production-quality tasks, directly supporting the 'Fant-AI-sia' claim that benchmark performance overstates real-world reliability.
t22 METR Developer Productivity RCT: AI Makes Experienced Developers 19% Slower METR 2025-05 Randomized controlled trial finding that early-2025 AI tools caused experienced open-source developers to take 19% longer on their tasks — directly contradicting the AI 2027 assumption of productivity uplift and supporting the 'Fant-AI-sia' enterprise inertia critique.
t23 When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation arXiv (preprint) 2026-02 Systematic study of 60 benchmarks showing that benchmark age and scale are strong predictors of saturation, with HumanEval, MMLU and others already saturated — empirical support for the 'Fant-AI-sia' S-curve plateau argument.
t24 Stanford HAI 2025 AI Index Report — Technical Performance Stanford HAI 2025-04 Authoritative annual report documenting benchmark saturation (Elo gap between top and 10th model narrowing from 11.9% to 5.4%), convergence of open/closed-weight models, and the cost-capability tradeoff of reasoning models.
t25 Google Launches Gemini Deep Research Agent — Same Day as GPT-5.2 TechCrunch 2025-12 Documents the simultaneous release of competing agentic research tools by Google DeepMind and OpenAI, illustrating the intensifying lab-vs-lab agentic race and the rapid obsolescence of benchmark comparisons.
