Research · Blogs & Independent Thinkers
Research sweep · deep · 2015 – 2026
AI in Weather and Climate Prediction
AI in weather and climate prediction across the machine-learning era from 2015 to June 2026, with historical context from mid-twentieth-century numerical weather prediction (NWP) and Lorenz's chaos theory. The sweep covers the shift from physics-based NWP and statistical post-processing (MOS) to data-driven models (GraphCast, GenCast, Pangu-Weather, FourCastNet, Aurora, NeuralGCM, ECMWF AIFS); how forecasters at ECMWF, NOAA, and the Met Office have operationalised these models and measured their accuracy against the IFS; and the predictability limits imposed by chaos, as illustrated by the Lorenz attractor and the butterfly effect.
- Claude Opus 4.8
Synthesised 2026-06-26
Narrative
The dominant Substack voice for technically fluent independent commentary on AI weather prediction is Karolina Stanisławska's AI Weather Hub, which bridges the gap between ML engineers and practising meteorologists. Her January 2025 piece on GenCast explains the model's conditional diffusion architecture with unusual clarity, showing how the ensemble emerges from random noise seeding rather than from perturbed initial conditions as in classical NWP, and situating the butterfly effect squarely within that design choice. Phil Siarri's Philaverse Substack (July 2025) and the CLAI Ventures Substack (January 2025) offer more investor-oriented but technically specific surveys of the operational landscape, noting that ECMWF, the UK Met Office, NOAA, and several Asian meteorological agencies are at different stages of evaluation or deployment. The Open-Meteo Substack by Patrick Zippenfenig provides practitioner commentary on integrating GraphCast into live APIs, including the mismatch between ERA5 training data and GFS initialisation that produces systematic inconsistencies.
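The contrast between the two ways of generating an ensemble can be shown with a toy sketch. It is not a diffusion model and is not taken from GenCast or from any of the cited posts: a chaotic logistic map stands in for the forecast step, and every function name and parameter value is a hypothetical chosen for illustration. The first function perturbs the initial state and then runs deterministic dynamics, as in classical NWP ensembles. The second starts every member from the identical state and lets a per-member noise seed produce the differences.

```python
import random
import statistics


def toy_dynamics(x):
    """Chaotic logistic map on [0, 1]; a stand-in for one forecast step."""
    return 3.9 * x * (1.0 - x)


def clip01(x):
    """Keep the toy state inside the unit interval."""
    return min(1.0, max(0.0, x))


def ic_perturbed_ensemble(x0, members, eps, steps, seed=0):
    """Classical style: perturb the initial state, then run deterministic dynamics."""
    rng = random.Random(seed)
    out = []
    for _ in range(members):
        x = clip01(x0 + rng.uniform(-eps, eps))
        for _ in range(steps):
            x = toy_dynamics(x)
        out.append(x)
    return out


def noise_seeded_ensemble(x0, members, sigma, steps, seed=0):
    """Generative style (toy analogue): same initial state, one noise seed per member."""
    out = []
    for m in range(members):
        rng = random.Random(seed + m)  # the seed is the only thing that differs
        x = x0
        for _ in range(steps):
            x = clip01(toy_dynamics(x) + rng.gauss(0.0, sigma))
        out.append(x)
    return out


if __name__ == "__main__":
    classical = ic_perturbed_ensemble(0.3, members=20, eps=1e-6, steps=50)
    generative = noise_seeded_ensemble(0.3, members=20, sigma=1e-6, steps=50)
    print("IC-perturbed spread:", statistics.pstdev(classical))
    print("noise-seeded spread:", statistics.pstdev(generative))
```

In both cases the members end up spread out because the toy dynamics amplify small differences. What changes is where those differences enter: at the initial state, or through the random numbers drawn for each member.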
Two structural tensions dominate the independent commentary. First, every major ML weather model currently depends on classical NWP for its input state: data assimilation, which combines observations into a coherent atmospheric analysis, remains a physics-based step upstream of even the most capable data-driven forecast. Several independent writers note this clearly, and it is confirmed by ECMWF's own AIFS blog series, which tracks accuracy against the IFS and openly catalogues blurriness in surface fields and spurious negative precipitation values that were not corrected until the August 2025 AIFS v1.1.0 release. The ECMWF AIFS accuracy-versus-activity blog post from December 2024 provides one of the few honest operational scorecards, showing that forecast activity does not drift with lead time for AIFS, unlike some third-party models, but that stratospheric skill remains limited. Second, commentators diverge on whether ML models have altered the predictability horizon or merely approached it more cheaply. Research reviewed in multiple independent pieces suggests AI models may actually attenuate small-perturbation growth, meaning they understate the butterfly effect rather than overcoming it; the FuXi ensemble study in npj Climate and Atmospheric Science (2025) is the most direct evidence for this.
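The perturbation growth at issue in the second tension can be made concrete with Lorenz's 1963 three-variable system. The sketch below is a minimal illustration in plain Python and is not drawn from any of the cited studies. It integrates two trajectories that differ by a tiny offset and records how far apart they drift. The parameters sigma = 10, rho = 28, beta = 8/3 are the classic Lorenz values; the step size, perturbation size, starting point, and function names are assumptions made for this example.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2.0))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2.0))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(eps=1e-6, dt=0.01, steps=4000):
    """Distance between two runs whose initial x values differ by eps."""
    reference = (1.0, 1.0, 1.0)
    perturbed = (1.0 + eps, 1.0, 1.0)
    history = []
    for _ in range(steps):
        reference = rk4_step(reference, dt)
        perturbed = rk4_step(perturbed, dt)
        history.append(math.dist(reference, perturbed))
    return history


if __name__ == "__main__":
    history = separation_history()
    for step in (0, 500, 1000, 2000, 3999):
        print(f"step {step:5d}  separation {history[step]:.3e}")
```

The separation starts near the size of the initial offset and grows until it is comparable to the size of the attractor itself, after which it stops growing. A model that attenuates this growth, as the FuXi study suggests some ML models do, would show a slower rise in the same diagnostic.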
The physical-consistency problem receives sustained independent attention. Massimo Bonavita's 2024 Geophysical Research Letters paper, widely cited in practitioner blogs, documents that ML forecast energy spectra differ from those of reanalysis and NWP models, producing overly smooth forecasts. The 2025 Journal of Advances in Modeling Earth Systems paper by Sha and colleagues demonstrates that adding global mass and energy conservation schemes to FuXi reduces forecast error and corrects systematic biases such as excess light rain. ECMWF's own September 2025 AIFS update paper formalises this finding for AIFS, introducing output bounding layers. These are not peripheral technical details: they are the crux of whether ML models can be trusted for extreme events, where physical plausibility matters most and where the training distribution is thinnest.
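A toy one-dimensional example illustrates why smooth forecasts can score well on RMSE despite being less realistic. It is a sketch under stated assumptions, not a reproduction of any cited analysis: the field values, the five-point moving average used as the "blur", and all names are hypothetical. The truth has a single sharp peak. One forecast has an equally sharp peak in the wrong place, and the other is a smoothed version of that displaced peak.

```python
import math
import statistics


def rmse(forecast, truth):
    """Root-mean-square error between two equal-length fields."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))


def moving_average(field, half_width=2):
    """Centred moving average; windows are truncated at the edges."""
    n = len(field)
    out = []
    for i in range(n):
        window = field[max(0, i - half_width): min(n, i + half_width + 1)]
        out.append(sum(window) / len(window))
    return out


def peak_field(n, index, amplitude=10.0):
    """A field that is zero everywhere except for one sharp peak."""
    field = [0.0] * n
    field[index] = amplitude
    return field


truth = peak_field(20, 10)       # the event happens at grid point 10
sharp = peak_field(20, 13)       # right intensity, wrong location
blurred = moving_average(sharp)  # same displaced event, smeared over 5 points

if __name__ == "__main__":
    for name, forecast in (("sharp", sharp), ("blurred", blurred)):
        print(
            f"{name:8s} RMSE {rmse(forecast, truth):.3f}  "
            f"std dev {statistics.pstdev(forecast):.3f}"
        )
    print(f"truth    std dev {statistics.pstdev(truth):.3f}")
```

The blurred forecast has the lower RMSE (about 2.45 against 3.16), even though its variability (standard deviation about 0.87) is far below the truth's (about 2.18). The sharp forecast is penalised twice, once for missing the peak and once for placing it elsewhere, which is why RMSE alone tends to favour smoothing.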
The outlook framing in independent commentary is more cautious than the vendor announcements. The articsledge.com analysis (May 2026) notes that AI models remain downstream of classical assimilation and that climate-change-driven distribution shift poses a structural challenge: models trained on 1979 to 2017 ERA5 data are being asked to forecast an atmosphere that is systematically warmer in ways that have no close historical analogue. The GeoAI Unpacked Substack notes that many national weather services run legacy Fortran code, raising a practical question of whether ML models will complement or displace existing infrastructure. The Science Advances paper from April 2026 on physics-based models outperforming AI for record-breaking extremes is the sharpest independent counter-evidence to the headline accuracy claims, and it sits uneasily alongside the vendor benchmarks.
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | [AI Weather Hub \| Karolina Stanisławska \| Substack](https://aiweatherhub.substack.com/) | AI Weather Hub (Substack) | | |
| b2 | GenCast – AI meets ensemble forecasting | AI Weather Hub (Substack) | 2025-01 | Explains GenCast's conditional diffusion ensemble architecture and its relationship to the butterfly effect, cross-referencing the Nature paper with accessible independent analysis. |
| b3 | Why AI Weather? – AI Weather Hub | AI Weather Hub (Substack) | 2024-10 | Sets out the independent case for why ML weather forecasting is a distinct paradigm shift from statistical post-processing, contextualising it against the ChatGPT moment in AI. |
| b4 | AI and the future of weather forecasting – Phil Siarri | Philaverse (Substack) | 2025-07 | Named-author survey of operational deployment status at ECMWF, Met Office, NOAA, and Asian agencies, with model-by-model architecture summaries and citation of primary literature. |
| b5 | A Breakthrough Year for AI in Weather Forecasting: Insights and Opportunities | CLAI Ventures (Substack) | 2025-01 | Investor-practitioner analysis covering NeuralGCM and GenCast with specific benchmark figures, and noting the resolution gap between GenCast at 0.25° and ENS at 0.1°. |
| b6 | Exploring GraphCast – Open-Meteo | Open-Meteo (Substack) | 2024-04 | Practitioner API developer commentary on the ERA5-training versus GFS-initialisation mismatch in production GraphCast, and on NOAA's retraining effort to resolve it. |
| b7 | AI for weather forecasting – GeoAI Unpacked #2 | GeoAI Unpacked (Substack) | 2024-10 | Independent analysis highlighting the blurriness problem in AIFS versus IFS outputs, legacy Fortran infrastructure at national weather services, and the tension between accuracy metrics and visual realism. |
| b8 | So, which weather forecast is the best? – ActuallyWeather | ActuallyWeather (Substack) | 2025-12 | Independent real-world verification of ECMWF, GFS, HRRR, and GraphCast using location-specific skill scores, providing evidence outside vendor-reported benchmarks. |
| b9 | AI Weather Forecasting 2026: Models, Accuracy & Results | ArticlEdge | 2026-05 | Comprehensive independent synthesis citing primary benchmarks and noting the distribution-shift risk as AI models trained on 1979–2017 ERA5 are deployed into a systematically warmer 2026 atmosphere. |
| b10 | AIFS: a new ECMWF forecasting system | ECMWF Newsletter | 2024-01 | Primary institutional source documenting ECMWF's architectural choice of GNNs for AIFS, its ERA5 training regime, and its explicit comparison to GraphCast and Pangu-Weather. |
| b11 | Accuracy versus activity – ECMWF AIFS Blog | ECMWF | 2024-12 | Operational scorecard from ECMWF comparing AIFS, GraphCast, Pangu-Weather, and Aurora on RMSE and forecast activity metrics, including Aurora's instability beyond day 7. |
| b12 | Anemoi: a new framework for weather forecasting based on machine learning | ECMWF | 2024-10 | Documents ECMWF's open-source Anemoi framework enabling national meteorological services to build regional ML models on the same architecture as AIFS. |
| b13 | GraphCast: AI model for faster and more accurate global weather forecasting | Google DeepMind Blog | 2023-11 | Primary vendor announcement for GraphCast, containing the headline claim that it outperforms HRES on 90% of 1,380 verification targets, and the Hurricane Lee nine-day landfall prediction case study. |
| b14 | GenCast predicts weather and the risks of extreme conditions with state-of-the-art accuracy | Google DeepMind Blog | 2024-12 | Primary vendor source for GenCast's diffusion-based ensemble design and the claim that it outperforms ECMWF ENS on 97.2% of evaluated targets, providing the vendor-side data that independent commentary cross-references. |
| b15 | Probabilistic weather forecasting with machine learning (GenCast) | Nature | 2024-12 | Peer-reviewed primary source for GenCast, confirming that prior ML ensemble methods failed on blurring and that GenCast is the first to outperform ENS at 0.25° resolution. |
| b16 | On Some Limitations of Current Machine Learning Weather Prediction Models | Geophysical Research Letters | 2024-06 | Massimo Bonavita's widely cited analysis documenting that ML forecast energy spectra differ from NWP and reanalysis, producing blurriness that standard RMSE metrics do not penalise. |
| b17 | Improving AI Weather Prediction Models Using Global Mass and Energy Conservation Schemes | Journal of Advances in Modeling Earth Systems | 2025-11 | Demonstrates that adding conservation-law constraints to FuXi reduces forecast error and corrects the drizzle bias, directly addressing the physical-consistency critique. |
| b18 | An update to ECMWF's machine-learned weather forecast model AIFS | arXiv / ECMWF | 2025-09 | Documents the August 2025 AIFS v1.1.0 update incorporating output bounding layers to prevent physically implausible outputs such as negative precipitation. |
| b19 | Evaluation of five global AI models for predicting weather in Eastern Asia and Western Pacific | npj Climate and Atmospheric Science | 2024-09 | Independent homogeneous comparison of five ML models under identical ERA5 initial conditions, finding FengWu leads for typhoon track prediction and that a multi-model ensemble rivals the best single model. |
| b20 | A fast physics-based perturbation generator of machine learning weather model for efficient ensemble forecasts of tropical cyclone track | npj Climate and Atmospheric Science | 2025-03 | Finds that FuXi attenuates small-perturbation growth compared to IFS, suggesting AI weather models may understate the butterfly effect rather than overcome it. |
| b21 | An Observations-focused assessment of Global AI Weather Prediction Models During the South Asian Monsoon | arXiv | 2025-09 | Evaluates seven AI models against 458 weather stations and satellite data, finding AIFS leads on most metrics but all models show substantially higher errors against ground observations than against reanalysis. |
| b22 | Validating Deep Learning Weather Forecast Models on Recent High-Impact Extreme Events | Artificial Intelligence for the Earth Systems (AMS) | 2025-01 | Case studies on the 2021 Pacific Northwest heatwave and 2021 winter storm find that ML models match HRES locally but underperform when results are aggregated, and lack variables needed for humid heatwave health-risk assessment. |
| b23 | Physics-based models outperform AI weather forecasts of record-breaking extremes | Science Advances | 2026-04 | The most direct counter-evidence to headline ML accuracy claims, showing physics-based NWP retains an advantage for truly unprecedented extreme events outside the training distribution. |
| b24 | AI-Driven Weather Forecasts to Accelerate Climate Change Attribution of Heatwaves | Earth's Future (AGU) | 2025-08 | Demonstrates a new application of AI weather models for near-real-time attribution of heatwaves to anthropogenic climate change, showing NeuralGCM's hybrid physics advantage for SST-dependent events. |
| b25 | Weather forecasting in a changing climate: the rise of AI and Machine learning? | ScienceDirect (journal article) | 2026-05 | Most recent practitioner review, documenting AIFS superiority for tropical cyclone track prediction and the open Anemoi framework, with a frank assessment of remaining resolution and coupling gaps. |