AI in Weather and Climate Prediction

AI in weather and climate prediction across the 2015 to June 2026 machine-learning era, with historical context from mid-twentieth-century numerical weather prediction and Lorenz's chaos theory: the shift from physics-based NWP and statistical post-processing (MOS) to data-driven models (GraphCast, GenCast, Pangu-Weather, FourCastNet, Aurora, NeuralGCM, ECMWF AIFS), how forecasters at ECMWF, NOAA, and the Met Office have operationalised them, how their accuracy compares with the IFS, and the predictability limits imposed by chaos, the Lorenz attractor, and the butterfly effect.

Claude Opus 4.8

Synthesised 2026-06-26

Narrative

The academic literature on machine learning for weather prediction spans roughly three phases. The foundational period was anchored by Rasp et al.'s WeatherBench (JAMES, 2020), whose ERA5-based benchmark established reanalysis data as the common training substrate and pushed the field to define reproducible metrics. ERA5's 40-year, 0.25-degree hourly archive, described by Hersbach et al. (2020) in the Quarterly Journal of the Royal Meteorological Society, proved decisive: without it, data-hungry deep models would have had no adequate training set. Rasp and Thuerey (JAMES, 2021) then showed that a ResNet pretrained on climate simulations could match simple operational baselines, signalling that the deep-learning approach was credible but not yet competitive.

The second phase, 2022 to 2023, saw rapid architectural diversification and the first decisive outperformance of ECMWF's HRES. Pathak et al.'s FourCastNet (arXiv:2202.11214, 2022) applied Adaptive Fourier Neural Operators to the global atmosphere at 0.25-degree resolution, trading some accuracy for speed. Bi et al.'s Pangu-Weather (Nature, 2023) used 3D Earth-Specific Transformers and a multi-timescale model combination to outperform HRES on several variables. Lam et al.'s GraphCast (Science, 2023) employed a multi-scale graph neural network trained on 227 ERA5 variables per grid point (5 surface variables plus 6 atmospheric variables on 37 pressure levels) and was the first model to beat HRES decisively on standard medium-range metrics, winning on about 90% of 1,380 verification targets. Rasp et al.'s WeatherBench 2 (arXiv:2308.15560, later JAMES 2024) upgraded the evaluation framework to 0.25-degree ground truth and live scoreboards, giving the community an independent yardstick.
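The "standard medium-range metrics" in WeatherBench-style comparisons are typically area-weighted scores such as latitude-weighted RMSE, where each grid row is weighted by the cosine of its latitude so that the densely sampled polar rows of a regular latitude-longitude grid do not dominate. The following is a minimal sketch of that idea on a toy grid, not the WeatherBench implementation; the function names and the 3 x 2 example grid are illustrative assumptions.

```python
import math


def latitude_weights(lats_deg):
    """Weights proportional to cos(latitude), normalised so their mean is 1."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a lat x lon grid with each latitude row weighted by cos(lat).

    forecast, truth: lists of rows (one row per latitude), each a list of
    values over longitude. lats_deg: latitude of each row in degrees.
    """
    weights = latitude_weights(lats_deg)
    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Toy grid (hypothetical values): the same 2-unit error is placed either on
# the equator row or on a near-polar row.
lats = [-80.0, 0.0, 80.0]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
equator_error = [[0.0, 0.0], [2.0, 2.0], [0.0, 0.0]]
polar_error = [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]

rmse_equator = lat_weighted_rmse(equator_error, truth, lats)
rmse_polar = lat_weighted_rmse(polar_error, truth, lats)
print(f"equator error: {rmse_equator:.3f}, polar error: {rmse_polar:.3f}")
```

In this sketch the equatorial error scores worse than the identical polar error, because the equatorial row represents more surface area. Real evaluations apply the same weighting per variable, pressure level, and lead time.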

The frontier phase, from late 2023 to mid-2026, has extended ML forecasting into probabilistic, hybrid, and foundation-model territory. Price et al.'s GenCast (arXiv:2312.15796, Nature 2024) introduced a diffusion-based ensemble model that outperforms ECMWF's ENS on 15-day probabilistic metrics. Kochkov et al.'s NeuralGCM (Nature, 2024) fused a differentiable dynamical core with learned physics, achieving climate-scale stability while matching GraphCast on medium-range skill. Bodnar et al.'s Aurora (arXiv:2405.13063, Nature 2025) trained a 3D Perceiver-Swin foundation model on over one million hours of heterogeneous Earth-system data, outperforming operational forecasts on air quality, ocean waves, tropical cyclone tracks, and high-resolution weather. ECMWF operationalised its own system, AIFS (arXiv:2406.01465, Lang et al. 2024), built on a GNN-transformer architecture; it went live in February 2025, confirming that data-driven forecasting had crossed from research demo to production.
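Probabilistic comparisons of ensemble systems are commonly scored with the continuous ranked probability score (CRPS), which rewards ensembles that are both close to the observation and appropriately spread. The sketch below uses the standard ensemble estimator, CRPS = E|X - y| - 0.5 E|X - X'|, for a single scalar forecast. It is an illustration under stated assumptions, not the scoring code used by any of the papers above; the function name and the sample values are hypothetical.

```python
def ensemble_crps(members, observation):
    """Ensemble CRPS estimate for one scalar quantity (lower is better).

    members: list of ensemble member values.
    observation: the verifying value.
    Uses the plain estimator with a 1/m^2 normalisation on the pairwise term;
    "fair" variants that normalise by m(m-1) also exist.
    """
    m = len(members)
    # Mean absolute error of the members against the observation.
    error_term = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference between all ordered pairs of members.
    spread_term = sum(abs(a - b) for a in members for b in members) / (m * m)
    return error_term - 0.5 * spread_term


# Hypothetical 2 m temperature forecasts (degrees C) verifying against 1.0.
single_member = ensemble_crps([3.0], 1.0)       # reduces to absolute error
bracketing = ensemble_crps([0.0, 2.0], 1.0)     # spread covers the outcome
overconfident = ensemble_crps([5.0, 5.0], 1.0)  # sharp but wrong

print(single_member, bracketing, overconfident)
```

With a single member the score collapses to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE. As with deterministic scores, operational comparisons average it over grid points (with area weighting), variables, and lead times.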

Critical limits are now being mapped in detail. Selz and Craig (Geophysical Research Letters, 2023) showed that current deterministic ML models fail to reproduce the butterfly effect: infinitesimal initial perturbations do not grow at the correct rate, implying that the models sidestep rather than solve Lorenz's predictability problem. Zhang et al. (arXiv:2504.20238, 2025) tested whether ML models can push deterministic predictability beyond Lorenz's two-week estimate, finding modest extensions consistent with improved effective initial conditions. Empirically, Zhang, Fischer et al. (arXiv:2508.15724, 2025) demonstrated that for record-breaking heat, cold, and wind extremes, ECMWF HRES still consistently outperforms GraphCast, Pangu-Weather, and FuXi, with AI models systematically underpredicting the intensity and frequency of tail events. A study of robustness under climate change (arXiv:2409.18529) found that AI models trained on present-day ERA5 produce skillful forecasts in pre-industrial and +2.9 K future climates but show cold biases that drift back towards the training distribution in warmer scenarios, a direct expression of out-of-distribution fragility that the NeuralGCM hybrid architecture partially mitigates by design.
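The butterfly effect that these studies probe can be illustrated with Lorenz's own three-variable system: two trajectories that start almost identically separate roughly exponentially until the error saturates at the size of the attractor. The sketch below integrates the Lorenz-63 equations with the classical parameters (sigma = 10, rho = 28, beta = 8/3) and tracks that separation. It is a toy illustration only, not the experimental setup of Selz and Craig or of any paper cited here; the step size, perturbation size, and function names are assumptions chosen for the example.

```python
import math


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 system at the given (x, y, z) state."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step of length dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(initial, perturbation=1e-8, dt=0.01, steps=2500):
    """Distance between a control run and a run perturbed in x, per step."""
    control = initial
    perturbed = (initial[0] + perturbation, initial[1], initial[2])
    history = []
    for _ in range(steps):
        control = rk4_step(control, dt)
        perturbed = rk4_step(perturbed, dt)
        history.append(math.dist(control, perturbed))
    return history


history = separation_history((1.0, 1.0, 1.0))
for step in (0, 500, 1000, 1500, 2000, 2499):
    print(f"t = {(step + 1) * 0.01:5.2f}  separation = {history[step]:.3e}")
```

A forecast model that respects this behaviour should show small initial differences amplifying at a realistic rate; the finding reported above is that current deterministic ML models damp such growth instead.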

Sources

ID	Title	Outlet	Date	Significance
a1	Learning skillful medium-range global weather forecasting	Science	2023-12	GraphCast (Lam et al., DeepMind) outperforms ECMWF HRES on 90% of 1,380 test metrics using a multi-scale GNN trained on 227 ERA5 variables per grid point, marking the first definitive ML victory over operational NWP.
a2	Accurate medium-range global weather forecasting with 3D neural networks	Nature	2023-07	Pangu-Weather (Bi et al., Huawei) introduces 3D Earth-Specific Transformers and a multi-timescale inference strategy that surpasses HRES on several variables, published in Nature with open weights.
a3	GenCast: Diffusion-based ensemble forecasting for medium-range weather	arXiv / Nature (2024)	2023-12	Price et al. introduce a graph-based diffusion ensemble model that outperforms ECMWF ENS on 15-day probabilistic metrics, establishing generative ML as a credible route to ensemble forecasting.
a4	Neural general circulation models for weather and climate	Nature	2024-07	Kochkov et al. (Google) fuse a differentiable dynamical core with learned physics, producing a hybrid model that matches GraphCast on medium-range skill and reproduces realistic climate statistics over decades.
a5	A Foundation Model for the Earth System	arXiv / Nature (2025)	2024-05	Bodnar et al. (Microsoft) train Aurora on over one million hours of diverse geophysical data; it outperforms operational forecasts for air quality, ocean waves, tropical cyclone tracks, and high-resolution weather at much lower compute.
a6	AIFS -- ECMWF's data-driven forecasting system	arXiv	2024-06	Lang et al. describe ECMWF's own GNN-transformer system, which went operational in February 2025, making ECMWF the first major centre to operationalise a purely data-driven global weather forecast.
a7	WeatherBench 2: A benchmark for the next generation of data-driven global weather models	arXiv / JAMES (2024)	2023-08	Rasp et al. provide the community's standard independent evaluation framework for ML weather models, with ERA5 at 0.25-degree resolution as ground truth and a continuously updated live leaderboard.
a8	WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting	Journal of Advances in Modeling Earth Systems	2020-11	Foundational benchmark paper by Rasp et al. that established ERA5-based evaluation of data-driven global weather models and catalysed rapid progress by providing common baselines.
a9	FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators	arXiv	2022-02	Pathak et al. (NVIDIA/Berkeley Lab) introduce Adaptive Fourier Neural Operators applied to 0.25-degree global weather prediction, the first model to achieve HRES-competitive skill at that resolution.
a10	Can Artificial Intelligence-Based Weather Prediction Models Simulate the Butterfly Effect?	Geophysical Research Letters	2023-10	Selz and Craig demonstrate that current deterministic ML models fail to reproduce rapid upscale error growth from small perturbations, meaning they sidestep rather than respect Lorenz's intrinsic predictability limit.
a11	Atmospheric Predictability Beyond 30 Days with Machine Learning	arXiv	2025-04	Tests whether ML models can push deterministic predictability beyond Lorenz's two-week estimate, revisiting the theoretical limit in the context of modern data-driven approaches.
a12	Numerical models outperform AI weather forecasts of record-breaking extremes	arXiv	2025-08	Zhang, Fischer et al. show empirically that ECMWF HRES consistently outperforms GraphCast, Pangu-Weather, and FuXi on record-breaking heat, cold, and wind events, quantifying the tail-event failure mode of current ML models.
a13	Robustness of AI-based weather forecasts in a changing climate	arXiv	2024-09	Tests AIFS, GraphCast, and Pangu-Weather on pre-industrial, present-day, and +2.9 K future climates, finding skillful short-range forecasts but systematic cold biases in warmer states that expose out-of-distribution fragility.
a14	ClimaX: A foundation model for weather and climate	arXiv / ICML (2023)	2023-01	Nguyen et al. present the first weather-climate foundation model using transformer pretraining on CMIP6 datasets, demonstrating generalisation to tasks unseen during pretraining including climate projections.
a15	Evaluation of five global AI models for predicting weather in Eastern Asia and Western Pacific	npj Climate and Atmospheric Science	2024-09	Independent regional evaluation of Pangu-Weather, FourCastNet v2, GraphCast, FuXi, and FengWu against ERA5 for typhoon track and intensity, providing geographically focused accuracy assessment beyond global averages.
a16	Comparing and Contrasting Deep Learning Weather Prediction Backbones on Navier-Stokes and Atmospheric Dynamics	arXiv	2024-07	Systematic controlled comparison of GNN, transformer, and Fourier-operator architectures on both synthetic Navier-Stokes and real ERA5 data, isolating the effect of architecture from training choices.
a17	An update to ECMWF's machine-learned weather forecast model AIFS	arXiv	2025-09	Describes AIFS 1.1.0, the operational version deployed August 2025, documenting incremental skill improvements, new variables, and the correction of a precipitation forecast issue in the initial release.
a18	Neural general circulation models for modeling precipitation	Science Advances	2026-01	Yuval, Kochkov et al. extend the NeuralGCM framework by training directly on satellite-based precipitation observations, demonstrating improved simulation of extremes and the diurnal cycle over existing GCMs.
a19	Data-driven ensemble forecasting with the AIFS	ECMWF Newsletter	2024-10	Describes ECMWF's two parallel approaches to ML ensemble forecasting (diffusion-based and CRPS-trained AIFS), both using the same encoder-GNN-transformer architecture as the deterministic system.
a20	FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale	arXiv	2025-07	Latest generation of the NVIDIA FourCastNet series applies geometric deep learning to probabilistic global forecasting at scale, extending the Fourier-operator lineage into ensemble territory.
a21	WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models (JAMES published version)	Journal of Advances in Modeling Earth Systems	2024-06	Published final version of the WeatherBench 2 framework, providing metrics, baselines, and methodology used by all major ML weather model evaluations as of 2024-2026.
a22	Butterfly Effects and Finite Predictability in AI-Based Weather Prediction	ESS Open Archive	2025-07	Revisits Selz and Craig's butterfly-effect findings using a taxonomy of three types of butterfly effect, sharpening the understanding of which predictability limits current AI models do and do not respect.
a23	A Practical Probabilistic Benchmark for AI Weather Models	arXiv	2024-01	Brenowitz et al. propose CRPS-based probabilistic evaluation designed specifically for ML weather models, filling a gap in WeatherBench 2, which focused on deterministic metrics.
a24	SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth	arXiv	2025-10	Proposes geographically stratified evaluation of AI weather models to surface regional performance differences masked by global-average RMSE, exposing heterogeneous skill across latitudes and terrain.
a25	Do machine learning climate models work in changing climate dynamics?	arXiv	2025-09	Systematic OOD evaluation of state-of-the-art ML climate models under distribution-shifted scenarios, documenting where models fail to generalise and informing credible near-term directions for climate-scale ML.