HyperAI

Colorado State University Releases CSU-MLP Model to Predict Medium-term Severe Weather Using Random Forest Algorithm


Weather forecasts, especially severe weather forecasts, have a significant impact on people's daily work and life. According to the Sigma research report "Natural catastrophes in times of economic accumulation and climate change", global losses caused by severe weather have continued to increase in recent years. In 2019 alone, global economic losses from related disaster events amounted to US$146 billion, and insured losses amounted to US$60 billion. The report also states that as severe weather disasters become increasingly destructive, related losses will increase further. Accurate prediction of severe weather is therefore particularly urgent.

Recently, Aaron J. Hill and Russ S. Schumacher of Colorado State University and Israel Jirak of the Storm Prediction Center (SPC) of the National Oceanic and Atmospheric Administration (NOAA) jointly developed CSU-MLP, a machine learning model based on random forests. The model is able to accurately forecast severe weather in the medium term (4-8 days). The results have been published in the journal Weather and Forecasting.


Paper address:

https://arxiv.org/abs/2208.02383

 CSU-MLP Overview

Severe weather forecasts in the United States are generally issued by the SPC using numerical weather prediction (NWP) models, which can warn of the specific type of severe weather and where it will occur 1-2 days in advance. At lead times of 3-8 days, however, forecasts can only indicate where severe weather will occur, not what kind it will be.

Over the past decade, high-resolution convection-allowing models (CAMs) have emerged and made short-term forecasts (lead times of less than 4 days) more accurate, but forecasts at medium and long ranges have not improved much. Against this background, machine learning is gradually being applied in meteorology.

For CSU-MLP (Colorado State University Machine Learning Probabilities), the meteorological data used for model training came from the Global Ensemble Forecast System version 12 (GEFSv12) reforecast dataset (hereafter GEFS/R), which contains 20 years of detailed historical weather data for the continental United States. The researchers selected nine years of data (2003-2012) as the training set for this medium-term forecast study, and two years (2020-2022) as the test set.

 Random Forest Algorithm 

This study is based on a machine learning algorithm called random forest (RF), an ensemble learning method for classification and regression that combines the outputs of many decision trees. In this study, severe weather features are fed into the forest, each decision tree is traversed from root to leaf, and the trees' results are combined to produce the severe weather prediction.
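The idea can be illustrated with a toy forest written from scratch. This is a minimal sketch, not the authors' implementation: the three features (an instability value, a wind shear value, and a pure-noise column), the synthetic labels, and all function names are assumptions made up for illustration. Each tree is grown on a bootstrap sample and considers a random subset of features at every split; the forest's probability is the average of the trees' leaf frequencies.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, rng, depth=0, max_depth=3, n_try=2):
    """Grow one tree; every split looks at only n_try randomly chosen features."""
    if depth == max_depth or len(set(labels)) == 1:
        return sum(labels) / len(labels)          # leaf: observed event frequency
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                               # no valid split was found
        return sum(labels) / len(labels)
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo],
                       rng, depth + 1, max_depth, n_try),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi],
                       rng, depth + 1, max_depth, n_try))

def tree_predict(node, row):
    """Walk from the root to a leaf and return that leaf's event frequency."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=20, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest

def forest_probability(forest, row):
    """Average the per-tree leaf frequencies into one event probability."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)

# Synthetic data (hypothetical): an event needs high instability AND high shear.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 3000), data_rng.uniform(0, 40), data_rng.random())
        for _ in range(150)]
labels = [1 if instab > 1500 and shear > 20 else 0 for instab, shear, _ in rows]

forest = fit_forest(rows, labels)
p_high = forest_probability(forest, (2800.0, 35.0, 0.5))
p_low = forest_probability(forest, (200.0, 5.0, 0.5))
print(p_high, p_low)
```

Averaging over many trees is what turns individual yes/no style decisions into a smooth probability, which is the kind of output a probabilistic severe weather forecast needs.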

The choice of input features for severe weather is therefore particularly important in a random forest. The researchers extracted 12 feature variables related to severe weather from the training set described above. The specific feature variables are listed in the following table.

12 feature variables for model training and prediction

However, these feature variables do not share a common resolution in the GEFS/R dataset, so the researchers interpolated them to a uniform 0.5-degree grid spacing.
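The article does not say which interpolation method was used. As a minimal sketch, assuming bilinear interpolation on a regular grid, regridding a field to 0.5-degree spacing could look like this (the function name and the 1.0-degree input field are hypothetical):

```python
def bilinear_regrid(field, src_step, dst_step):
    """Bilinearly interpolate a regular 2-D grid (at least 2x2) to a new spacing.

    Both grids are assumed to share the same origin at index (0, 0).
    """
    n_rows, n_cols = len(field), len(field[0])
    out_rows = int(round((n_rows - 1) * src_step / dst_step)) + 1
    out_cols = int(round((n_cols - 1) * src_step / dst_step)) + 1
    out = []
    for i in range(out_rows):
        y = min(i * dst_step / src_step, n_rows - 1)   # fractional source row
        r0 = min(int(y), n_rows - 2)                   # upper neighbour row
        fy = y - r0
        new_row = []
        for j in range(out_cols):
            x = min(j * dst_step / src_step, n_cols - 1)
            c0 = min(int(x), n_cols - 2)               # left neighbour column
            fx = x - c0
            top = field[r0][c0] * (1 - fx) + field[r0][c0 + 1] * fx
            bottom = field[r0 + 1][c0] * (1 - fx) + field[r0 + 1][c0 + 1] * fx
            new_row.append(top * (1 - fy) + bottom * fy)
        out.append(new_row)
    return out

# Hypothetical 1.0-degree field brought to 0.5-degree spacing.
coarse = [[0.0, 2.0],
          [4.0, 6.0]]
fine = bilinear_regrid(coarse, src_step=1.0, dst_step=0.5)
print(fine)
```

The same function also handles the opposite direction (for example a 0.25-degree field sampled onto the 0.5-degree grid), so every variable ends up on one common grid before training.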

 Feature Engineering 

In addition to using random forests for medium-term severe weather forecasting, this study also briefly explored feature engineering. Feature engineering is a data processing technique that collects features surrounding observed events and converts them into a form that machine learning algorithms can use. In this experiment, the researchers mainly proposed two methods for simplifying features: spatial averaging and time-lagging.

Spatial averaging means averaging each feature variable in space around every forecast point. This reduces the interference of noisy data and improves model performance. The specific process is shown in the figure below.

Feature variable combination processing method
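Spatial averaging can be sketched in a few lines. This is only an illustration under assumed details: the box-shaped neighbourhood, its radius, and the variable names are hypothetical and are not taken from the paper.

```python
def spatial_average(field, row, col, radius):
    """Mean of the values in a (2*radius+1)-wide box centred on (row, col).

    The box is clipped at the grid edges.
    """
    n_rows, n_cols = len(field), len(field[0])
    values = [field[r][c]
              for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
              for c in range(max(0, col - radius), min(n_cols, col + radius + 1))]
    return sum(values) / len(values)

def averaged_features(fields, row, col, radius):
    """One averaged predictor per variable instead of one per neighbouring grid point."""
    return {name: spatial_average(grid, row, col, radius)
            for name, grid in fields.items()}

# Hypothetical 3x3 grids; the instability field has a single noisy spike.
fields = {
    "instability": [[1, 1, 1], [1, 10, 1], [1, 1, 1]],
    "shear":       [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
}
features = averaged_features(fields, row=1, col=1, radius=1)
print(features)
```

In this toy case the spike of 10 is smoothed to a mean of 2.0, and the nine neighbouring values of each variable collapse into one predictor, which is how averaging both damps noise and shrinks the feature set.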

Time-lagging means applying observations from an earlier period, with a delay, to forecasting or modeling at the current time.

It is based on the assumption that past observational data can provide useful information about the current state and future trends. In this experiment, the researchers used time-lagging to expand the size of the GEFS/R dataset, and this process adds no extra computational cost to the model.
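One possible reading of time-lagging, sketched below, is to reuse forecasts from older model runs that verify on the same day, so that each target day gets extra examples without running the model again. The dictionary layout, the function name, and the numbers are assumptions for illustration only; the article does not describe the exact procedure.

```python
def time_lagged_members(forecasts, valid_day, base_lead, n_lags):
    """Collect forecasts from successively older runs that verify on the same day.

    forecasts maps (init_day, lead_days) -> forecast value (hypothetical layout).
    A run started `lag` days earlier needs a lead time `lag` days longer.
    """
    members = []
    for lag in range(n_lags + 1):
        lead = base_lead + lag
        init = valid_day - lead
        if (init, lead) in forecasts:
            members.append(forecasts[(init, lead)])
    return members

# Hypothetical archive: three runs verify on day 10, one verifies on day 11.
archive = {
    (6, 4): 0.30,   # started day 6, 4-day lead -> valid day 10
    (5, 5): 0.20,   # started day 5, 5-day lead -> valid day 10
    (4, 6): 0.25,   # started day 4, 6-day lead -> valid day 10
    (6, 5): 0.90,   # valid day 11, so it must be ignored here
}
members = time_lagged_members(archive, valid_day=10, base_lead=4, n_lags=2)
print(members)
```

Because the older runs already exist in the archive, the extra members come for free, consistent with the statement that time-lagging adds no computational cost.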

 Test results

The researchers evaluated CSU-MLP predictions over 1.5 years of real-time GEFSv12 forecasts and compared them with the human-generated forecasts issued by the SPC. In the medium-term range, the random-forest-based system outperformed the SPC in both accuracy and forecast area, as shown in the figure below. However, the forecasting ability of both decreases as lead time increases.

Comparison of CSU-MLP and SPC medium-term forecasts on March 27, 2022

Panel a is the day-4 forecast from CSU-MLP, and panel b is the day-4 forecast from the SPC. The shaded areas represent the predicted probability of severe weather. The circles mark local reports of tornadoes (red), hail (green), and wind (blue). The lower-left and lower-right corners of each image show, respectively, the Brier skill score (BSS), which evaluates forecast accuracy, and the observation coverage, which evaluates how representative the forecast is of local weather.

From this, the researchers concluded that the skill and accuracy of the overall prediction system have improved greatly. This is mainly because the random-forest-based system predicts well both in its continuous probabilities and along low-probability contours (the contours enclosing areas where the estimated probability of severe weather is low).
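For readers unfamiliar with the BSS shown in the figure, the standard definition is BSS = 1 - BS / BS_ref, where BS is the mean squared difference between forecast probabilities and observed outcomes (1 if the event occurred, 0 otherwise) and BS_ref is the same score for a reference forecast. The sketch below uses made-up numbers and a constant reference probability as a stand-in for climatology; it is not the paper's evaluation code.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, reference_probs):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(reference_probs, outcomes)
    return 1.0 - bs / bs_ref

# Made-up example: four grid points, two of which observed severe weather.
outcomes = [1, 0, 0, 1]
model_probs = [0.8, 0.1, 0.2, 0.7]
reference = [0.5, 0.5, 0.5, 0.5]      # assumed constant reference probability

bss = brier_skill_score(model_probs, outcomes, reference)
print(round(bss, 2))
```

Here the model's Brier score is 0.045 against 0.25 for the reference, giving a BSS of about 0.82. A negative BSS would mean the forecast is worse than the reference.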

In addition, the researchers tested how different regions and different factors (thermodynamic and kinematic) affect the forecasts, and explored which feature variables matter most for severe weather forecasting. The results are shown in the figure below.

Importance of different feature variables for severe weather forecasting

Although the specific impact of these factors and regions on the forecast needs further study, the researchers made a preliminary judgment: the model learns from these different feature variables and uses them for severe weather forecasting. This also suggests that the random-forest-based prediction system has been trained effectively and has a degree of credibility and practicality.

The researchers also pointed out that the random-forest-based system still has considerable room for improvement. For example, CSU-MLP could incorporate data from the SPC's human forecasts to further improve the credibility of the machine learning forecasts.

 A new stage of AI-driven meteorology may be coming

Humans have always been committed to understanding and predicting the world, and one of the more successful examples is weather forecasting. In ancient times, people mostly made forecasts based on their life experience, such as "Don't go out if there is morning glow, but travel thousands of miles if there is evening glow." In modern times, scientists have begun to use sensors and weather satellites to collect vast amounts of data and make more accurate forecasts.

It is worth noting that, at the current stage of meteorological development, the addition of AI has greatly enhanced the accuracy of weather forecasts. According to foreign media reports, Swiss meteorological researchers have in recent years successfully predicted the time and location of lightning by introducing AI, and the model currently has a prediction accuracy of 80%.

As early as 2015, IBM spent $2 billion to acquire the digital and data assets of Weather Co., the parent company of the Weather Channel. IBM paid this much because it planned to combine Weather Co.'s weather data and forecast information with its AI service Watson. Giants such as IBM are evidently optimistic about the potential of AI in meteorology and have already begun to position themselves.

Because thousands of objective factors affect weather changes, forecasting the weather accurately remains difficult. But as the integration of AI and meteorology deepens, it is not hard to foresee that a new era of intelligent meteorology defined by AI may arrive sooner.

PS:

The code and dataset for this paper will be released on HyperAI's official website, Hyper.ai. Interested readers can stay tuned.