
3 years ago

Numerical Weather Forecasting using Convolutional-LSTM with Attention and Context Matcher Mechanisms

Selim F. Tekin Arda Fazla Suleyman S. Kozat

Weather Forecasting Using LSTM Networks


Abstract

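Numerical weather prediction with high-resolution physical models requires enormous computational resources on supercomputers, which has limited its widespread use in practical applications. Deep learning methods have been shown to offer innovative solutions to this problem. In this paper, we propose a novel deep learning architecture for forecasting high-resolution spatio-temporal weather data. Our approach extends the conventional encoder-decoder structure by integrating Convolutional Long Short-Term Memory (ConvLSTM) and Convolutional Neural Networks (CNNs). We further incorporate attention and context matcher mechanisms into the model architecture. Our Weather Model achieves significant performance improvements over baseline deep learning models such as ConvLSTM, TrajGRU, and U-Net. For the experimental evaluation, we use two large-scale, real-world benchmark numerical weather datasets: the ERA5 hourly pressure levels dataset and WeatherBench. Our results show that the attention matrices model atmospheric circulations by focusing on different parts of the input sequence, which substantially improves the identification of spatio-temporal correlations. Finally, we compare our model with high-resolution physical models using benchmark metrics and show that our Weather Model is accurate and easy to interpret.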

One-sentence Summary

The authors propose a Weather Model that combines Convolutional Long Short-Term Memory networks, convolutional neural networks, an attention mechanism, and a context matcher to forecast high-resolution spatio-temporal weather data, outperforming ConvLSTM, TrajGRU, and U-Net baselines on the ERA5 hourly pressure-level and WeatherBench datasets and using its attention matrices to model atmospheric circulations, while remaining accurate and interpretable when benchmarked against high-resolution physical models.

Key Contributions

  • This work introduces a novel deep learning architecture for high-resolution spatiotemporal weather forecasting that extends a conventional encoder-decoder framework by integrating Convolutional Long Short-Term Memory networks and Convolutional Neural Networks.
  • The model incorporates attention and context matcher mechanisms to explicitly capture atmospheric circulations, enabling the network to focus on distinct input regions for identifying complex spatial and temporal correlations.
  • Evaluations on the ERA5 hourly pressure level dataset and WeatherBench demonstrate substantial performance improvements over baseline models including ConvLSTM, TrajGRU, and U-Net, and a benchmark comparison with high-resolution physical models shows that the model is accurate and interpretable, although physical models remain stronger at longer lead times.

Introduction

High-resolution numerical weather forecasting traditionally relies on computationally intensive physical models that require supercomputing resources, which severely limits their deployment in time-sensitive applications. Deep learning has emerged as a faster alternative, but conventional architectures often struggle to capture the complex spatial and temporal dependencies in atmospheric data. To address this, the authors propose a novel encoder-decoder framework that integrates Convolutional Long Short-Term Memory networks with Convolutional Neural Networks. By adding attention and context matcher mechanisms, the model learns to focus on the most relevant input regions and thereby better represents atmospheric circulations. Evaluated on the large-scale ERA5 and WeatherBench benchmarks, the architecture achieves higher predictive accuracy than standard deep learning baselines while remaining interpretable and computationally efficient enough for practical forecasting workflows.

Dataset

  • Dataset Composition and Sources: The authors utilize two primary meteorological datasets: ERA5 hourly pressure level data and the WeatherBench benchmark.
  • Subset Details and Spatial Filtering: The ERA5 subset is cropped to the Mediterranean and Black Sea coasts of Turkey, covering 30°N to 45°N latitude and 20°E to 50°E longitude. It spans the years 2000 to 2001 at a 30 km spatial resolution and a 3-hour temporal resolution, on a 61 by 121 grid at the 100 hPa pressure level. The WeatherBench subset uses a 32 by 64 grid with hourly temporal resolution at the 850 hPa level.
  • Feature Selection and Processing: For both datasets, the authors use temperature as the endogenous target variable and treat all remaining meteorological features as exogenous inputs. This configuration supports next-step temperature prediction on ERA5 and direct or iterative forecasting on WeatherBench at lead times of 3 to 5 days.
  • Data Splits and Model Usage: The ERA5 dataset is split into training, validation, and test sets in an 80/10/10 ratio. For WeatherBench, the authors train and validate on data from 2015 through the end of 2016 and reserve 2017 to 2018 for testing. These splits and feature assignments are used directly in the forecasting pipelines.
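The ERA5 grid size and split sizes stated in the dataset description can be checked with a short calculation. This is a minimal check under assumptions the summary does not state: a 0.25° grid spacing (the native ERA5 resolution, roughly 28 km in latitude and so consistent with the quoted 30 km) with both boundary lines included, 3-hourly time stamps from 1 January 2000 through 31 December 2001, and a chronological 80/10/10 split. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

def grid_shape(lat_min, lat_max, lon_min, lon_max, step_deg):
    """Grid points along latitude and longitude, counting both endpoints."""
    n_lat = round((lat_max - lat_min) / step_deg) + 1
    n_lon = round((lon_max - lon_min) / step_deg) + 1
    return n_lat, n_lon

def count_steps(start, end, step_hours):
    """Number of time stamps from start (inclusive) to end (exclusive)."""
    return int((end - start) / timedelta(hours=step_hours))

def chronological_split(n, train_frac=0.8, val_frac=0.1):
    """Split time indices 0..n-1, in order, into train/validation/test ranges."""
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n))

# Assumed: 0.25 degree spacing, 3-hourly stamps from 2000-01-01 up to (not including) 2002-01-01.
shape = grid_shape(30.0, 45.0, 20.0, 50.0, 0.25)
n_steps = count_steps(datetime(2000, 1, 1), datetime(2002, 1, 1), step_hours=3)
train, val, test = chronological_split(n_steps)
print(shape, n_steps, len(train), len(val), len(test))
# (61, 121) 5848 4678 584 586
```

Under these assumptions the crop yields exactly the stated 61 by 121 grid. A chronological split keeps later weather unseen during training, which is the usual choice for forecasting, but the summary does not say whether the authors split by time or at random.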

Method

The authors propose a novel deep learning architecture for numerical weather prediction (NWP), referred to as the Weather Model (WM), which integrates multiple spatio-temporal data sources through an attention mechanism and uses Convolutional Long Short-Term Memory (ConvLSTM) units as its core building blocks. The framework follows an encoder-decoder structure designed to capture both spatial and temporal correlations in weather data and to support accurate multi-step predictions. The model architecture is illustrated in Figure 1.

As shown in Figure 1, the model processes a sequence of spatio-temporal inputs, where each input $\mathcal{X}_t$ is a tensor of weather features at time step $t$. Its main components are the encoder, the decoder, an attention mechanism, and a context matcher. The encoder, composed of stacked ConvLSTM units, processes the input sequence to produce hidden states that encode the relevant spatio-temporal information. The attention mechanism, applied at each time step, weights the input features according to the previous encoder hidden state, allowing the model to focus on the most relevant grid cells and features. It computes an energy matrix for each input feature with convolutional operations and converts these energies into attention weights with a softmax function. The resulting weighted input series is fed into the encoder.
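The attention step can be sketched in plain Python. This is a simplified, hypothetical version rather than the authors' implementation: the hidden state is a single-channel map, one pair of 3×3 kernels (`k_x`, `k_h`) is shared by all features, the energy uses an additive form with `tanh` borrowed from standard additive attention, and the softmax is taken across features independently at each grid cell. Only the overall flow (convolutional energies, softmax weights, weighted inputs for the encoder) comes from the description above.

```python
import math

def conv2d_same(img, kernel):
    """Single-channel 2-D cross-correlation with zero padding; output keeps H x W."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - ph, j + dj - pw
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * img[ii][jj]
            out[i][j] = acc
    return out

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def input_attention(features, h_prev, k_x, k_h):
    """Weight K feature maps (each H x W) using the previous hidden map h_prev.

    Assumed energy for feature k: tanh(conv(h_prev, k_h) + conv(features[k], k_x)).
    The softmax runs across the K features separately at every grid cell.
    """
    h, w = len(h_prev), len(h_prev[0])
    hid = conv2d_same(h_prev, k_h)
    energies = []
    for f in features:
        fx = conv2d_same(f, k_x)
        energies.append([[math.tanh(hid[i][j] + fx[i][j]) for j in range(w)] for i in range(h)])
    alphas = [[[0.0] * w for _ in range(h)] for _ in features]
    weighted = [[[0.0] * w for _ in range(h)] for _ in features]
    for i in range(h):
        for j in range(w):
            a = softmax([e[i][j] for e in energies])
            for k, f in enumerate(features):
                alphas[k][i][j] = a[k]
                weighted[k][i][j] = a[k] * f[i][j]
    return weighted, alphas

# Hypothetical kernels and two toy 2 x 2 feature maps.
kx = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]
kh = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
temperature = [[1.0, 2.0], [3.0, 4.0]]
humidity = [[0.5, 0.5], [0.5, 0.5]]
h_prev = [[0.2, -0.1], [0.0, 0.3]]
weighted, alphas = input_attention([temperature, humidity], h_prev, kx, kh)
```

Because the weights at each cell sum to one across features, the attention maps can be read directly to see which variable is emphasized where, which is the kind of interpretability the authors point to.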

The decoder, also built from stacked ConvLSTM units, generates the output sequence recursively. To capture long-term dependencies and improve gradient flow, the authors introduce a context matcher. For each encoder layer, it sums the hidden states over all time steps to produce a context state for that layer, then reverses the layer order of these summed states before passing them to the decoder. The decoder uses these context states and its previous output to predict the next time step, and convolutional layers applied to the decoder's hidden state produce the predicted weather values. Because each prediction is fed back as the next input, the model can forecast sequences of arbitrary length, which gives flexibility in the prediction horizon.
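The context matcher and the recursive decoding loop can be illustrated with a short sketch. The summation over time and the reversal of layer order follow the description above; `toy_step` is a hypothetical stand-in for the ConvLSTM decoder and its output convolutions, and handing the full list of context maps to every decoding step is an assumption made to keep the example short.

```python
def sum_over_time(states):
    """Element-wise sum of a list of H x W maps, one map per time step."""
    h, w = len(states[0]), len(states[0][0])
    return [[sum(s[i][j] for s in states) for j in range(w)] for i in range(h)]

def context_matcher(encoder_states):
    """encoder_states[l][t] is the hidden map of encoder layer l at step t.

    Returns one time-summed context map per layer, in reversed layer order.
    """
    contexts = [sum_over_time(layer) for layer in encoder_states]
    return contexts[::-1]

def recursive_decode(step, contexts, y0, horizon):
    """Roll a one-step decoder forward for `horizon` steps.

    step(contexts, y_prev, state) -> (y_next, new_state). Each prediction is
    fed back as the next input, so any horizon can be requested.
    """
    outputs, y, state = [], y0, None
    for _ in range(horizon):
        y, state = step(contexts, y, state)
        outputs.append(y)
    return outputs

def toy_step(contexts, y_prev, state):
    """Hypothetical decoder stand-in: nudge each cell by 1% of the first context map."""
    c = contexts[0]
    y_next = [[y_prev[i][j] + 0.01 * c[i][j] for j in range(len(c[0]))] for i in range(len(c))]
    return y_next, state

# Two encoder layers, two time steps, 1 x 2 hidden maps (toy values).
enc = [
    [[[1.0, 2.0]], [[3.0, 4.0]]],    # layer 0 at t = 0, 1
    [[[10.0, 0.0]], [[0.0, 10.0]]],  # layer 1 at t = 0, 1
]
ctx = context_matcher(enc)  # layer 1's summed map comes first after the reversal
preds = recursive_decode(toy_step, ctx, [[0.0, 0.0]], horizon=3)
```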

Experiment

The evaluation uses latitude-weighted metrics to benchmark the proposed Weather Model against established deep learning and physical baselines on a high-resolution dataset and a standard one, using sequential, iterative, and direct forecasting strategies. The results indicate that convolution-based architectures are essential for tracking spatial weather patterns, and that with the iterative approach the Weather Model's error grows logarithmically with lead time rather than exponentially, as observed for the benchmark methods. The model shows strong short-term predictive ability and uses its attention mechanism to prioritize dynamic input features, but physical models retain superior accuracy for extended forecasts because they simulate the atmosphere comprehensively. Overall, the study concludes that the attention-enhanced architecture improves spatio-temporal forecasting, particularly over shorter prediction windows in which spatial continuity is preserved.
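The iterative and direct strategies differ in how a multi-step forecast is assembled. In the sketch below, an iterative forecast applies a one-step model repeatedly and feeds each prediction back in, while a direct forecast calls a model dedicated to the requested lead time. The scalar linear dynamics, the 0.05 per-step bias, and the exact lead-3 direct model are hypothetical toys chosen only to make the behavior visible; they are not the paper's models or data.

```python
def iterative_forecast(one_step_model, x0, lead):
    """Apply a one-step model `lead` times, feeding each output back as input."""
    x = x0
    for _ in range(lead):
        x = one_step_model(x)
    return x

def direct_forecast(models_by_lead, x0, lead):
    """Call a model trained specifically for this lead time on the initial state."""
    return models_by_lead[lead](x0)

# Hypothetical toy system: "true" dynamics and a slightly biased one-step model.
def true_step(x):
    return 0.9 * x + 1.0

def biased_step(x):
    return 0.9 * x + 1.05  # 0.05 bias per step, purely illustrative

x0 = 0.0
iter_errors = [
    abs(iterative_forecast(biased_step, x0, n) - iterative_forecast(true_step, x0, n))
    for n in range(1, 6)
]

# A toy direct model for lead 3: the exact 3-fold composition of true_step.
direct_models = {3: lambda x: 0.729 * x + 2.71}
direct_error = abs(direct_forecast(direct_models, x0, 3) - iterative_forecast(true_step, x0, 3))
```

In the toy, the iterative error increases at every step because the bias is re-applied to an already-biased input. This compounding is what makes the shape of error growth over lead time worth comparing across models. The direct approach avoids the feedback but needs a separately trained model for each lead time.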

The authors compare their Weather Model with various baseline models across multiple prediction methods, datasets, and metrics. The Weather Model outperforms several deep learning baselines in some setups, particularly with iterative prediction, whereas physical models achieve better results in long-term forecasts, which points to a limitation of data-driven approaches at extended lead times. The model's effectiveness depends on whether the data contain spatial movements, and its error grows logarithmically with lead time, in contrast to the exponential error growth of some baselines.
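Whether error growth is closer to logarithmic or exponential can be judged by fitting both shapes to an error-versus-lead-time curve. The sketch below fits e = a + b ln(t) and e = A exp(b t) by least squares (the latter through a linear fit to ln e), then keeps whichever has the smaller squared error in the original error units. Both the procedure and the synthetic curves in the usage lines are illustrative assumptions; the summary does not say how the authors characterized error growth.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def growth_shape(lead_times, errors):
    """Label an error curve 'logarithmic' or 'exponential' by comparing two fits.

    Lead times and errors must be positive. Both fits are scored by squared
    error in the original units so the two residuals are comparable.
    """
    _, _, sse_log = linear_fit([math.log(t) for t in lead_times], errors)
    a_exp, b_exp, _ = linear_fit(lead_times, [math.log(e) for e in errors])
    sse_exp = sum((e - math.exp(a_exp + b_exp * t)) ** 2 for t, e in zip(lead_times, errors))
    return "logarithmic" if sse_log < sse_exp else "exponential"

# Synthetic curves for illustration only (not results from the paper).
leads = list(range(1, 11))
log_curve = [1.0 + 2.0 * math.log(t) for t in leads]
exp_curve = [0.5 * math.exp(0.3 * t) for t in leads]
```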

On the high-resolution dataset, the authors compare the Weather Model with several baselines, including ConvLSTM, TrajGRU, SMA, and LSTM, using latitude-weighted RMSE, MAE, and MAPE. The Weather Model achieves the best score on every metric in this comparison, with statistically significant gains over the other deep learning models, and its advantage is most pronounced in how slowly its error grows over longer forecast periods.
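The latitude-weighted scores can be sketched as follows. The weighting follows the WeatherBench convention, in which each latitude row is weighted by cos(latitude) normalized to a mean of 1, so that the densely packed grid points near the poles do not dominate the average. Applying the same weights to MAE and MAPE, and the example field values, are assumptions for illustration. For many forecasts, WeatherBench-style scores are computed per forecast field and then averaged over initialization times.

```python
import math

def latitude_weights(lats_deg):
    """cos(latitude) weights normalized so that their mean over rows is 1."""
    cos_l = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_l) / len(cos_l)
    return [c / mean_cos for c in cos_l]

def weighted_mean(field, weights):
    """Mean over an n_lat x n_lon field, each row scaled by its latitude weight."""
    n_lat, n_lon = len(field), len(field[0])
    total = sum(weights[i] * field[i][j] for i in range(n_lat) for j in range(n_lon))
    return total / (n_lat * n_lon)

def lat_weighted_scores(forecast, truth, lats_deg):
    """Latitude-weighted RMSE, MAE, and MAPE (in percent) for one forecast field."""
    w = latitude_weights(lats_deg)
    diff = [[f - t for f, t in zip(fr, tr)] for fr, tr in zip(forecast, truth)]
    rmse = math.sqrt(weighted_mean([[d * d for d in row] for row in diff], w))
    mae = weighted_mean([[abs(d) for d in row] for row in diff], w)
    mape = 100.0 * weighted_mean(
        [[abs(d) / abs(t) for d, t in zip(dr, tr)] for dr, tr in zip(diff, truth)], w)
    return rmse, mae, mape

# Toy 3 x 4 field: rows at 30, 37.5 and 45 degrees, every cell off by +2 units.
lats = [30.0, 37.5, 45.0]
truth = [[280.0] * 4 for _ in lats]
forecast = [[282.0] * 4 for _ in lats]
rmse, mae, mape = lat_weighted_scores(forecast, truth, lats)
```

A uniform error of 2 gives an RMSE and MAE of 2 regardless of latitude, because the weights average to 1; the weighting only matters when errors differ between latitude rows.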

Across datasets and prediction methods, the Weather Model consistently surpasses the deep learning approaches in the iterative and high-resolution settings, and its error grows logarithmically rather than degrading exponentially as in competing models. Physical models, however, keep a clear advantage in long-term forecasts, highlighting the current limitations of purely data-driven architectures at extended horizons. Overall, the proposed framework delivers significant improvements in short-range prediction and spatial modeling, while hybrid or physics-informed strategies may still be needed for stable long-range forecasting.

