Research · Frontier Lab & Model News


Research sweep · deep · 2023 – 2026

AI Regulation and the Regulated Enterprise — Trajectory to 2030

The trajectory of AI regulation across the EU AI Act, the UK's pro-innovation and contextual approach, and the financial-services regulatory regime (FCA, PRA, Bank of England) from January 2023 to May 2026, including the FCA Mills Review, GPAI obligations, model-risk and accountability rules, and what they demand of technology leadership in regulated firms

  • Gemini 2.5 Pro
  • financial
  • frontier
  • academic
  • vc
  • blogs
  • tech

Synthesised 2026-05-19

Narrative

From early 2023 to mid-2026, frontier AI labs shipped new models at a rapid pace while external evaluation and safety work expanded alongside them. OpenAI continued to iterate on its GPT series, releasing GPT-4 in March 2023 and GPT-4o in May 2024, followed by a series of GPT-5.x models in 2025 and 2026, including GPT-5.4, positioned as a frontier model for professional work, and GPT-5.5 Instant for improved conversational use. Anthropic progressed its Claude family, launching the Claude 3 models (Haiku, Sonnet, Opus) in March 2024 with multimodal input and expanded context windows, and releasing Claude 4 (Opus 4, Sonnet 4) in May 2025 with significant improvements in coding, reasoning, and autonomous task execution.

Google DeepMind focused on its Gemini series, releasing Gemini Nano and Pro in December 2023 and Gemini Ultra 1.0 in February 2024. By late 2025 and early 2026, Gemini 3 had emerged with advanced multimodal understanding and reasoning, and Gemini Deep Think demonstrated the ability to solve professional research problems in mathematics and science. Meta AI contributed to the open-source landscape with its Llama series: Llama 3 arrived in April 2024, and the July 2024 Llama 3 paper described a family, including a 405B-parameter model with a 128K-token context window, that supports multilinguality, coding, reasoning, and tool use. Llama 4 followed in April 2025, featuring a sparse Mixture-of-Experts architecture and extended context windows.

Mistral AI rapidly expanded its offerings: Mistral Large 24.11 and Codestral 25.01, announced in late 2024 and early 2025, focused on reasoning, coding, and long context. Mistral 3 followed in December 2025, comprising a sparse mixture-of-experts model and smaller dense models, with an emphasis on open-source and efficient AI. xAI, founded by Elon Musk in July 2023, launched its Grok series. Grok 3, trained on the Colossus supercomputer, was released in February 2025, and Grok 4.20 Reasoning and Non-Reasoning variants appeared in March 2026; the series is notable for real-time data access and a distinctive tone.

External evaluations from METR (Model Evaluation & Threat Research) provided independent assessments of these frontier models. METR published a report on GPT-4 and Claude in March 2023 and went on to evaluate subsequent releases, including GPT-4o, o1-preview, Claude 3.5 Sonnet, GPT-4.5, Claude 3.7, o3 and o4-mini, GPT-5, and GPT-5.1-Codex-Max, through 2024 and 2025. In January 2026, METR released Time Horizon 1.1, an updated methodology for measuring autonomous AI capabilities, which indicated that the rate of progress has increased since 2023. The Frontier Model Forum, established in July 2023 by Anthropic, Google, Microsoft, and OpenAI, also published research on AI safety frameworks and risk management.


Sources

| ID | Title | Outlet | Date | Significance |
| --- | --- | --- | --- | --- |
| t1 | GPT-4 and Claude | METR | 2023-03 | This early METR report provides a baseline evaluation of the autonomous capabilities of foundational models like GPT-4 and Claude, highlighting their performance in early 2023. |
| t2 | LLaMA: Open and Efficient Foundation Language Models | Meta Research | 2023-02-24 | This paper introduces Meta's LLaMA models, demonstrating that state-of-the-art performance can be achieved with publicly available datasets and releasing models to the research community. |
| t3 | Introducing the Frontier Model Forum | Frontier Model Forum | 2023-07-26 | This announcement marks the formation of the Frontier Model Forum by Anthropic, Google, Microsoft, and OpenAI, signalling a collaborative effort towards AI safety and responsible development. |
| t4 | The Llama 3 Herd of Models | AI at Meta | 2024-07-23 | This paper details the Llama 3 family of models, including a 405B parameter model with a 128K token context window, demonstrating Meta's advancements in multilinguality, coding, reasoning, and tool usage. |
| t5 | GPT-4o | METR | 2024-08-07 | METR's evaluation of GPT-4o provides an independent assessment of OpenAI's model capabilities, contributing to the understanding of its performance and risks. |
| t6 | o1-preview | METR | 2024-09-12 | This METR evaluation of OpenAI's o1-preview model offers insights into the performance of a specific model variant, particularly useful for tracking incremental advancements. |
| t7 | Claude 3.5 Sonnet (original) | METR | 2024-10-30 | METR's evaluation of Claude 3.5 Sonnet provides an external benchmark for Anthropic's model, detailing its capabilities and performance at the time of release. |
| t8 | [Announcing Mistral AI's Mistral Large 24.11 and Codestral 25.01 models on Vertex AI](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHo4kdmEvGLWyadZMiePJVfemSreDTzW1usOaNtoNK39QcahkwZnGeBNfoHHzS1Ng7YlDQQSz-ZaaMKE_VBBFZLqrxhIrxsKePrb9QoRInBdQHydt4t8cq5T5rqAZ5v6gbwd-fZdVQ0htTMJYtOcLs_zjLCjDFvuYLvbD-N_-fe0pxt9ETNmytLTUPZ9Hj8ROF7v-Qyl7X1IUdQzBpQpZtxR351_Bg=) | Google Cloud Blog | 2025-01-14 | |
| t9 | Claude 3.5 Sonnet and o1 | METR | 2025-01-31 | METR's evaluation of Claude 3.5 Sonnet and OpenAI's o1 provides a comparative assessment of these models, offering insights into their relative strengths and weaknesses. |
| t10 | GPT-4.5 | METR | 2025-02-27 | This METR report on GPT-4.5 provides an independent evaluation of OpenAI's model, detailing its performance and contributing to the understanding of its capabilities. |
| t11 | Claude 3.7 | METR | 2025-04-04 | METR's evaluation of Claude 3.7 offers an external assessment of Anthropic's model, providing data on its autonomous capabilities and potential risks. |
| t12 | OpenAI o3 and o4-mini | METR | 2025-04-16 | This METR report evaluates OpenAI's o3 and o4-mini models, providing insights into their performance and suitability for various tasks. |
| t13 | [Anthropic Claude 4: Evolution of a Large Language Model](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF_SSqw3UbRLyoZr4amxmt5quDeJ678dS16cI_H6NJUVUaRznOIzZqDSaBw1ql_WHUQB-T8u__dYmQ_b7sCI7CEshKuataS1TqS6Z83qGFeNXu1qwtqs9DB2R76jthiVt42nr45ybcNSbyNLGnNUwhyELHwX-a_N9gxA7Zr0w==) | IntuitionLabs | 2025-05-22 | |
| t14 | GPT-5 | METR | 2025-08-07 | METR's evaluation of GPT-5 provides a critical, independent assessment of OpenAI's next-generation model, focusing on its autonomous capabilities and potential risks. |
| t15 | GPT-5.1-Codex-Max | METR | 2025-11-19 | This METR report specifically evaluates GPT-5.1-Codex-Max, offering insights into its coding capabilities and potential for catastrophic risks like AI self-improvement. |
| t16 | Gemini 3 for Technical Documentation: Industry Disruption Predictions and Adoption Roadmap | Sparkco | 2025-11-20 | This report highlights Gemini 3's advancements in multimodal alignment and context window, positioning it as a competitor to GPT-5 in image-text integration and technical documentation. |
| t17 | Introducing Mistral 3 | Mistral AI | 2025-12-02 | Mistral AI's announcement of Mistral 3, including a sparse mixture-of-experts model and smaller dense models, signifies their commitment to open-source, high-performing, and efficient AI. |
| t18 | [The state of open source AI models in 2025](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHavMTaq3zXFbb1Lp5wBe5OZbuNphB8orvax8zSUHx6BARnvwV63y9jGm7D2tAwR7chtCv2Zm9fcew-0w1vk5msr9KJOwND5PMtZJHYjrEWNbHaeUKnY9TqRNZyTy2DnXxIUVXrnWjAtOPIQqxo6cNARt-s0_YhLwgu37EQ89-v_FjONqmiZmH8_afhokk=) | Red Hat Developer | 2026-01-07 | |
| t19 | What's in Grok? (Independent Grok 5 Paper) | LifeArchitect.ai | 2026-01-21 | This independent report provides a quantitative analysis of xAI's Grok models, detailing their rapid evolution from a 33B parameter prototype to frontier models with trillions of parameters, despite xAI's secrecy. |
| t20 | Time Horizon 1.1 | METR | 2026-01-29 | METR's release of Time Horizon 1.1 updates their methodology for measuring AI autonomous capabilities, indicating an increased rate of progress in AI capabilities since 2023. |
| t21 | Gemini Deep Think: Redefining the Future of Scientific Research | Google DeepMind | 2026-02-11 | This announcement details Gemini Deep Think's ability to solve professional research problems in mathematics, physics, and computer science, demonstrating advanced reasoning capabilities. |
| t22 | Anthropic's Transparency Hub | Anthropic | 2026-02-20 | Anthropic's Transparency Hub provides detailed information on Claude models, including Opus 4.7, highlighting their multimodal capabilities, knowledge cut-off dates, and development resources. |
| t23 | Evaluating AI Providers' Frontier AI Safety Frameworks | arXiv | 2026-03-26 | This arXiv paper assesses the frontier AI safety frameworks of twelve AI companies, revealing that many aspects are missing or under-specified, limiting their effectiveness as accountability mechanisms. |
| t24 | Grok AI: The Complete Guide to Elon Musk's Chatbot (2026) | LifeArchitect.ai | 2026-03-29 | This guide provides a comprehensive overview of xAI's Grok, detailing its unique characteristics like real-time access to X/Twitter data, irreverent tone, and advanced features such as DeepSearch and Big Brain Mode. |
| t25 | [Latest news](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEsEvTHdoLBvogxq55YXnfzaiEUlZSRE4l4zry3FSXr6mW2j0mjKzXeApQaH-eWUBoFohEi0BXLhK3DMIngbmn1_kBIa6Q_QRvoeDVRjnlx9-XB8oOBgGUsUmttqUHb08cCFNbY) | Mistral AI | 2026-04-29 | |
