

Selected for NeurIPS 2025: NVIDIA Proposes the ERDM Model to Tackle Long-Range Forecasting Challenges, with Mid- to Long-Range Forecasts Consistently Ahead of the EDM Baseline


Medium-range weather forecasting (≤15 days) is a major, long-standing challenge for the scientific community. As a classic chaotic system, the atmosphere is extremely sensitive to initial conditions: small errors are rapidly amplified and cause forecasts to drift away from reality. Ensemble numerical weather prediction, which runs multiple perturbed simulations to estimate uncertainty, has become the mainstream approach, but its computational cost grows exponentially as requirements for accuracy and timeliness rise. This bottleneck is pushing researchers toward new data-driven approaches.

In recent years, advances in generative modeling have offered new solutions to this problem. Among diffusion models, the rolling sequence diffusion model (RSDM) is a representative example: it uses a progressive noise schedule that applies stronger noise to states further into the forecast. This mimics the way uncertainty accumulates over time in the real world and makes predictions more realistic. However, current RSDM is still built on the earlier Denoising Diffusion Probabilistic Model (DDPM) framework, and the limitations of that underlying framework have, to some extent, held back further gains in overall performance.

Notably, NVIDIA's Elucidated Diffusion Model (EDM), presented in a NeurIPS 2022 best paper, unified and improved on the classic DDPM, significantly improving training stability and generation quality. If key EDM design choices, such as its time-dependent loss weighting mechanism, can be effectively integrated into RSDM, both its modeling accuracy and its efficiency are expected to improve substantially.

Building on this, a research team from NVIDIA and the University of California, San Diego took the EDM framework as a foundation and systematically adapted its noise schedule, denoiser network parameterization, preconditioning, loss weighting strategy, and sampling algorithm to the needs of sequence modeling, arriving at the Elucidated Rolling Diffusion Model (ERDM). The work focuses on the joint design of "progressive noise scheduling" and "time-dependent loss weighting", and offers a new, efficient path for probabilistic prediction of chaotic dynamical systems.

The results, published in a paper titled "Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting", have been accepted to NeurIPS 2025, a top academic conference in artificial intelligence.

Paper address:
https://doi.org/10.48550/arXiv.2506.20024

Datasets: Navier-Stokes and ERA5 meteorological data

To support model training and validation, the study uses two benchmark datasets with clear application settings, corresponding to fluid dynamics modeling and medium-range weather forecasting, respectively.

For the fluid dynamics experiments, the researchers used the Navier-Stokes benchmark dataset, defined on a 221×42 grid. In each simulation, circular obstacles are placed at random positions, altering the flow. To keep experimental conditions consistent, the fluid viscosity is fixed at 1×10⁻³ in all simulations. The dataset records the core physical fields of the flow: the x- and y-velocity components and the pressure field. During training and testing, the boundary conditions and obstacle masks are provided as auxiliary inputs so the model can accurately capture boundary and obstacle effects. At test time, the task is to predict the next 64 time steps of the flow from a single initial state.

For the medium-range weather forecasting benchmark, the researchers used the ERA5 reanalysis dataset at 1.5° spatial resolution, corresponding to a 240×121 grid. The dataset contains 69 forecast variables in two groups: upper-air and surface. The upper-air variables are temperature (t), geopotential height (z), specific humidity (q), and the u and v wind components, each at 13 pressure levels (in hPa); the surface variables are 2 m temperature (2t), mean sea level pressure (mslp), and the u and v wind components at 10 m (10u and 10v). Training uses hourly ERA5 data from 1979 to 2020 to cover long-term climate characteristics; evaluation uses 64 initial conditions at 00:00 and 12:00 UTC in 2021 to test medium-range forecast skill across different starting states.
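The variable count and grid size follow from simple arithmetic. The sketch below recomputes both, assuming the common equiangular convention in which longitudes exclude 360° and latitudes include both poles; the helper names are hypothetical and only the variable short names come from the article.

```python
# Minimal sketch: recompute the ERA5 channel count and 1.5° grid size.
# Helper names are hypothetical; variable short names follow the article.
UPPER_AIR = ["t", "z", "q", "u", "v"]   # variables given on every pressure level
NUM_LEVELS = 13                          # pressure levels (hPa)
SURFACE = ["2t", "mslp", "10u", "10v"]   # single-level surface variables


def num_channels(upper, levels, surface):
    """One channel per (upper-air variable, level) pair plus one per surface field."""
    return len(upper) * levels + len(surface)


def grid_shape(resolution_deg):
    """Equiangular global grid (assumption): longitudes exclude 360°,
    latitudes include both poles."""
    n_lon = round(360 / resolution_deg)
    n_lat = round(180 / resolution_deg) + 1
    return n_lon, n_lat


print(num_channels(UPPER_AIR, NUM_LEVELS, SURFACE))  # 69
print(grid_shape(1.5))                               # (240, 121)
```

That is 5 upper-air variables × 13 levels = 65 channels, plus 4 surface fields.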

ERDM: Core innovations and architecture design, a new path for modeling chaotic dynamical systems

ERDM's core contribution is to combine the key idea of the rolling sequence diffusion model (RSDM), that noise grows with forecast lead time, with EDM's proven, principled design. This offers an approach to modeling chaotic dynamical systems, such as fluid flow trajectories and weather forecast sequences, that is both theoretically rigorous and robust in practice.

ERDM first improves the noise schedule. Unlike conventional linear or cosine schedules, it uses a rolling noise schedule tailored to sequence generation: the generation window is divided into consecutive time slots, each assigned a different noise level, with smooth transitions between adjacent slots. During training, as shown in the figure below, the model samples noise levels at random so that it learns to handle a wide range of noise conditions. During generation, noise is gradually reduced from a high initial level until a clean result is produced. ERDM also adjusts the schedule's key curvature parameter to better suit sequence generation, so that more useful information is preserved during denoising.

Comparison of noise scheduling with sequence length W
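To make the rolling noise schedule concrete, here is a minimal sketch under stated assumptions: per-frame noise levels are placed along EDM's ρ-parameterized σ curve (using EDM's image defaults σ_min = 0.002, σ_max = 80, ρ = 7), and frame k of a W-frame window sits at diffusion time (k + t)/W, where t runs from 1 to 0 within one roll step. This is a generic rolling-diffusion construction, not necessarily ERDM's exact schedule; ρ plays the role of the curvature parameter the article mentions.

```python
import math


def karras_sigma(u, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise level for u in [0, 1]: u=0 gives sigma_min, u=1 gives sigma_max.
    rho sets the curvature; larger rho concentrates levels near sigma_min."""
    a = sigma_min ** (1.0 / rho)
    b = sigma_max ** (1.0 / rho)
    return (a + u * (b - a)) ** rho


def rolling_sigmas(t, window, **schedule_kwargs):
    """Noise level of each frame in a window of `window` frames (assumed layout).
    t in [0, 1] is progress within one roll step: t=1 at the start, t=0 at the end.
    Frame k sits at diffusion time (k + t) / window, so later frames are noisier."""
    return [karras_sigma((k + t) / window, **schedule_kwargs) for k in range(window)]


W = 8
start, end = rolling_sigmas(1.0, W), rolling_sigmas(0.0, W)
print(round(end[0], 3))  # 0.002: frame 0 reaches sigma_min and is clean
```

At the end of a roll step frame 0 is clean; after the window shifts left, every remaining frame resumes at exactly the noise level it ended on, which is what keeps transitions between adjacent slots smooth.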


On this basis, ERDM uses the probability flow ordinary differential equation (ODE) to precisely control how noise is added and removed. The ODE describes the full trajectory of the data from a noisy state to a clean result and can be thought of as a "navigation map" for the generation process. As shown in the figure below, at inference time the model solves the ODE iteratively with a numerical solver: the first frame in the window is fully denoised and output as the prediction, while the remaining frames retain some noise. A rolling mechanism then takes over: the partially denoised frames become the preceding states for the next step, a new noisy frame is appended, and the ODE is solved again, enabling continuous generation of long sequences.


ERDM sampling with a rolling window
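The rolling sampler can be sketched as follows. This is an illustrative version, not ERDM's implementation: it integrates EDM's probability flow ODE, dx/dσ = (x − D(x; σ))/σ, with plain Euler steps (EDM itself uses a second-order Heun solver), represents each frame as a flat list of floats, and treats `denoiser(frames, sigmas)` as a hypothetical stand-in for the trained network. The initial window is assumed to already carry the schedule's per-frame noise levels, and conditioning on past states is omitted.

```python
import random


def karras_sigma(u, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise curve: u=0 -> sigma_min, u=1 -> sigma_max."""
    a, b = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    return (a + u * (b - a)) ** rho


def frame_sigmas(t, window):
    """Frame k of the window sits at diffusion time (k + t) / window (assumed layout)."""
    return [karras_sigma((k + t) / window) for k in range(window)]


def roll_step(frames, denoiser, n_substeps=16):
    """Move every frame from t=1 to t=0 by Euler-integrating the probability
    flow ODE dx/dsigma = (x - D(x; sigma)) / sigma, each frame with its own sigma."""
    window = len(frames)
    for i in range(n_substeps):
        s_cur = frame_sigmas(1.0 - i / n_substeps, window)
        s_next = frame_sigmas(1.0 - (i + 1) / n_substeps, window)
        denoised = denoiser(frames, s_cur)  # one network call for the whole window
        frames = [
            [x + (sn - sc) * (x - d) / sc for x, d in zip(frame, den)]
            for frame, den, sc, sn in zip(frames, denoised, s_cur, s_next)
        ]
    return frames


def rolling_sample(init_frames, denoiser, n_out, rng):
    """After each roll step, frame 0 has reached sigma_min and is emitted; the window
    shifts left and a fresh pure-noise frame (sigma_max * N(0, 1)) is appended."""
    frames, outputs = init_frames, []
    dim = len(frames[0])
    sigma_max = karras_sigma(1.0)
    for _ in range(n_out):
        frames = roll_step(frames, denoiser)
        outputs.append(frames[0])
        frames = frames[1:] + [[sigma_max * rng.gauss(0.0, 1.0) for _ in range(dim)]]
    return outputs
```

With an ideal denoiser for a toy dataset in which every value equals 3.0, each Euler step shrinks a frame's deviation from 3.0 by exactly σ_next/σ_cur, so every emitted frame ends up within a few multiples of σ_min of 3.0.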

ERDM's training centers on the denoiser network. The network adopts EDM's preconditioning, adapting its inputs and outputs to the noise level of each time step, and aims to recover the original signal from a noisy sequence. For the training objective, ERDM uses an "uncertainty-aware" weighting scheme: it keeps conventional weighting to stabilize training while giving higher weight to the more informative samples at intermediate noise levels, steering the model toward the intermediate states that matter most for generation. Concretely, training starts from a clean sequence, adds noise of random strength, asks the model to restore the original data, and updates the parameters based on the difference between prediction and ground truth. Experiments show that using time-dependent noise further improves the stability of long-range predictions.
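For reference, the EDM preconditioning and loss weighting that ERDM builds on can be written as below. This sketch uses EDM's published formulas with its image defaults (σ_data = 0.5, log-normal noise with P_mean = −1.2 and P_std = 1.2); ERDM's "uncertainty-aware" weighting and per-frame noise levels are not reproduced, and `raw_net` is a hypothetical stand-in for the U-Net. In a rolling setup the same coefficients would presumably be applied frame by frame with each frame's own σ.

```python
import math
import random

SIGMA_DATA = 0.5  # EDM default data std; an assumption for normalized fields


def precond(sigma, sigma_data=SIGMA_DATA):
    """EDM coefficients for D(x; sigma) = c_skip*x + c_out*F(c_in*x, c_noise)."""
    denom = math.sqrt(sigma ** 2 + sigma_data ** 2)
    c_skip = sigma_data ** 2 / denom ** 2
    c_out = sigma * sigma_data / denom
    c_in = 1.0 / denom
    c_noise = math.log(sigma) / 4.0
    return c_skip, c_out, c_in, c_noise


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """EDM weighting lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def sample_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Log-normal training noise levels, which concentrate on intermediate sigma."""
    return math.exp(rng.gauss(p_mean, p_std))


def denoising_loss(clean, raw_net, sigma, rng):
    """One weighted denoising term for a clean sample (a flat list of floats).
    raw_net(x_in, c_noise) is a hypothetical stand-in for the network F_theta."""
    c_skip, c_out, c_in, c_noise = precond(sigma)
    noisy = [y + sigma * rng.gauss(0.0, 1.0) for y in clean]
    out = raw_net([c_in * x for x in noisy], c_noise)
    denoised = [c_skip * x + c_out * f for x, f in zip(noisy, out)]
    mse = sum((d - y) ** 2 for d, y in zip(denoised, clean)) / len(clean)
    return loss_weight(sigma) * mse
```

A useful property of this choice is λ(σ)·c_out(σ)² = 1: the effective loss on the raw network output has unit weight at every noise level, which is what keeps training stable across σ.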

To better capture the dynamics of time-series data, ERDM further refines the denoiser architecture. Rather than 2D convolutions alone, which ignore temporal correlations, or 3D convolutions, which are computationally expensive, it uses a hybrid of a 2D U-Net and temporal attention. The 2D U-Net backbone extracts spatial features at each time step, the temporal attention layers capture dependencies across time steps, and the noise level is injected through the normalization layers to modulate the network's behavior. This design balances efficiency and performance: it is slightly more complex than a pure 2D architecture but markedly improves the quality of sequence predictions. The study also found that pre-training on time-series data from the start is more effective than adapting the model afterwards, and that this choice works well with the overall framework.

Schematic diagram of a 2D U-Net
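The division of labor between spatial and temporal layers can be illustrated with a small sketch: temporal self-attention runs independently at every pixel over the T frames of the window, so it mixes information only across time, while spatial mixing is left to the 2D U-Net. For brevity, the query, key, and value projections are identity maps (a real layer learns them), and `noise_conditioned_norm` is a hypothetical adaptive normalization that scales and shifts features using EDM's noise embedding ln(σ)/4; neither is taken from the ERDM code.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(features):
    """Single-head self-attention across time at ONE spatial location.
    features: T vectors of length C; identity Q/K/V projections (simplification)."""
    T, C = len(features), len(features[0])
    scale = 1.0 / math.sqrt(C)
    out = []
    for q in features:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in features])
        out.append([sum(weights[t] * features[t][c] for t in range(T)) for c in range(C)])
    return out


def apply_per_pixel(field, layer):
    """field[t][c][p]: T frames, C channels, P flattened pixels. Runs `layer` on the
    T-length sequence at each pixel, so pixels never exchange information here."""
    T, C, P = len(field), len(field[0]), len(field[0][0])
    result = [[[0.0] * P for _ in range(C)] for _ in range(T)]
    for p in range(P):
        mixed = layer([[field[t][c][p] for c in range(C)] for t in range(T)])
        for t in range(T):
            for c in range(C):
                result[t][c][p] = mixed[t][c]
    return result


def noise_conditioned_norm(vec, sigma, gamma_w=0.1, beta_w=0.0, eps=1e-5):
    """Hypothetical adaptive normalization: standardize, then scale and shift
    by a linear function of the noise embedding ln(sigma) / 4."""
    emb = math.log(sigma) / 4.0
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    normed = [(v - mean) / math.sqrt(var + eps) for v in vec]
    return [(1.0 + gamma_w * emb) * n + beta_w * emb for n in normed]
```

Because attention weights are a softmax, each output is a convex combination of the same pixel's values at other time steps; the cost grows with T² per pixel, compared with 3D convolutions, whose kernels mix space and time jointly.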

Experimental evaluation: Performance comparable to state-of-the-art weather forecasting systems, at lower computational cost

To verify ERDM's effectiveness at modeling chaotic dynamical systems, the researchers ran a systematic evaluation targeting two goals: accurate probabilistic prediction and reliable uncertainty quantification. Two core metrics are used. The continuous ranked probability score (CRPS) measures the overall discrepancy between the forecast distribution and the actual observation; lower is better. The spread-skill ratio (SSR) assesses the uncertainty estimate by comparing the ensemble spread with the error of the ensemble mean. An SSR below 1 indicates that uncertainty is underestimated, an SSR above 1 indicates overestimation, and ideally it should be close to 1.
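Both metrics are straightforward to compute for scalar forecasts. The sketch below uses the standard ensemble CRPS estimator E|X − y| − ½E|X − X′| and defines SSR as the square root of the mean (unbiased) ensemble variance divided by the RMSE of the ensemble mean. Some studies additionally apply a finite-ensemble correction factor √((M + 1)/M); the article does not state which convention the paper uses.

```python
import math


def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    m = len(members)
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread


def spread_skill_ratio(ensembles, observations):
    """SSR = sqrt(mean ensemble variance) / RMSE of the ensemble mean.
    Below 1: underdispersed (overconfident); above 1: overdispersed; ideal is about 1."""
    var_sum, sq_err_sum = 0.0, 0.0
    for members, obs in zip(ensembles, observations):
        m = len(members)
        mean = sum(members) / m
        var_sum += sum((x - mean) ** 2 for x in members) / (m - 1)  # needs m >= 2
        sq_err_sum += (mean - obs) ** 2
    n = len(observations)
    return math.sqrt(var_sum / n) / math.sqrt(sq_err_sum / n)


print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
```

For a single-member (deterministic) forecast the CRPS reduces to the absolute error, which is why it is a natural common yardstick for ensemble and deterministic models.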

In the Navier-Stokes fluid modeling experiment, the baselines include DYffusion and a set of EDM-based models. The results show that ERDM has a clear advantage at later lead times, improving CRPS by roughly 50% over the best EDM baseline. The EDM baselines perform slightly better at first, but their errors grow faster over time, and DYffusion fails to outperform the EDM baselines at any point in the forecast. In terms of uncertainty calibration, ERDM is consistently better than the EDM baselines, which are markedly underdispersed; the study found that the EDM baselines struggled to improve calibration without sacrificing CRPS.

Comparison of the forecast performance of ERDM with the baseline model over 64 time steps in the Navier-Stokes test

On the more challenging ERA5 medium-range forecasting task, the baselines include in-house EDM baselines and external models such as the operational IFS ENS, NeuralGCM ENS, and Graph-EFM. In terms of computational cost, ERDM needs only 4 H100 GPUs for 5 days of training, far less than other data-driven methods. The results show that ERDM consistently beats the EDM baseline on CRPS, with improvements of up to 10%, and also outperforms Graph-EFM. Against IFS ENS and NeuralGCM, ERDM is competitive, though it still trails IFS ENS slightly at short lead times for some variables. The analysis attributes this to how the initial conditions are constructed, which could be improved in future work by adopting the IFS ENS initialization strategy. Notably, ERDM and IFS ENS jointly show the best probabilistic calibration, whereas the other data-driven models are generally underdispersed at short lead times. On physical consistency, the power spectra of ERDM's 14-day forecasts closely match those of IFS ENS, demonstrating physical realism superior to most machine learning models, whereas NeuralGCM clearly underestimates energy in the mid- and high-frequency bands.


Performance evaluation of ERDM

A new era of chaos prediction: building a new bridge between certainty and randomness

In chaotic dynamical system modeling and sequence prediction, the areas ERDM focuses on, academia and industry worldwide continue to push forward through interdisciplinary integration and real-world deployment. These efforts carry forward the core idea of "combining physical priors with data-driven modeling" while extending the reach of probabilistic prediction and uncertainty quantification.

On the one hand, academic breakthroughs are concentrated in fluid dynamics and diffusion model architectures. Google DeepMind, working with teams from New York University, Stanford University, and other institutions, combined physics-informed neural networks (PINNs) with a high-precision Gauss-Newton optimizer and, for the first time, systematically discovered new unstable singularities in three fluid equations closely related to the Navier-Stokes equations, offering a new paradigm for exploring the complex landscape of nonlinear partial differential equations.
Paper Title: Discovery of Unstable Singularities
Paper address: https://go.hyper.ai/iGh6t

At the level of diffusion model architecture, the Diffusion Forcing (DF) framework proposed by an MIT CSAIL team combines the strengths of full-sequence diffusion models and autoregressive next-token prediction. By assigning each token an independent noise level and using a causal architecture, it improves the stability and flexibility of long-sequence generation. Its Monte Carlo Guidance (MCG) strategy substantially improves the sampling of high-reward trajectories and has proven effective in areas such as robot planning and video prediction.

Paper Title: Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper address: https://arxiv.org/pdf/2407.01392

On the other hand, industry efforts focus more on deploying these technologies in concrete scenarios and improving their efficiency, and have demonstrated significant value in weather forecasting and multi-domain time-series prediction. Huawei and the Chongqing Meteorological Bureau released version 2.0 of the "Tian Zi 12h" AI weather forecasting model. Built on a nested architecture based on the Pangu large model, it integrates minute-level radar mosaics with high-precision terrain data and, through spatiotemporal weight optimization, raises forecast resolution to 1 km and 1 hour. During a rainstorm in Chongqing, it accurately captured rainband patterns and precipitation intensity. In general time-series prediction, Amazon's DeepAR model uses an LSTM architecture trained jointly across many related series to produce probabilistic forecasts. Its predicted probability distributions effectively quantify uncertainty, it improves accuracy by capturing relationships across time series, and it has been deployed in scenarios such as retail inventory management and energy consumption forecasting.

Looking ahead, as foundation models evolve and cross-disciplinary knowledge is integrated, ERDM and similar approaches are gradually building a bridge between deterministic equations and real-world uncertainty. Beyond traditional scientific computing tasks such as weather forecasting and fluid simulation, they could provide a new generation of probabilistic modeling foundations for complex sequential decision-making problems such as robot planning, energy scheduling, and even biodynamics.

