Google's Flood Prediction Model Is Published in Nature Again, Beating the World's No.1 System and Covering 80+ Countries

The "Canon of Yao" in the Book of Documents records: "The flood is raging, sweeping across the mountains and hills and surging up to the sky. The people below are groaning in distress." In the era of Yao and Shun, floods brought misery to the people, and the two rulers set out to find someone to tame the waters. Gun was appointed first but failed. His son, Yu the Great (Dayu), took over his father's work and brought the floods under control. Hence the legend that "Dayu spent thirteen years taming the floods and passed his own home three times without entering."
In July 2023, rare torrential rain brought by Typhoon Doksuri hit Beijing, and the Daqing River Basin saw a record-breaking peak flow. According to People's Daily Online, the flooding affected more than 1.29 million people in Beijing, collapsed more than 59,000 houses, seriously damaged more than 147,000 houses, and affected more than 225,000 mu of crops.

From ancient times to the present, people have often been vulnerable to natural disasters such as floods. Grey Nearing, a research scientist at Google, states in his paper that an effective flood forecasting system can reduce flood-related deaths by 43% and economic losses by 35%-50%. Building such systems is therefore an important way to cope with flood disasters.
Current flood forecasting systems around the world mostly rely on observation stations installed along rivers. Because of deployment costs, low- and middle-income countries often have few streamflow gauges, which makes it hard for them to prepare response measures before a flood arrives. According to the World Bank, upgrading flood forecasting systems in developing countries to the level of developed countries would save about 23,000 lives each year. Flood forecasting for basins without monitoring stations is urgently needed.
Fortunately, the application of artificial intelligence (AI) to flood forecasting has brought new hope for flood defense in basins without monitoring stations. Grey Nearing and his team at Google Research developed a river forecast model based on machine learning. The model can produce reliable flood predictions five days in advance. Its predictions of floods with a 5-year return period are as good as or better than current predictions of floods with a 1-year return period. The system covers more than 80 countries.
Research highlights:
* The river forecast model outperforms GloFAS, the world's most advanced flood forecasting system
* It provides better support for flood warnings in ungauged basins

Paper address:
https://www.nature.com/articles/s41586-024-07145-1
Dataset download address:
https://hyper.ai/datasets/30647
Dataset: from 5,680 watersheds
The study's full dataset includes model inputs and (runoff) target values from 5,680 watersheds, based on which the researchers trained and tested the model.

This study uses three types of public data as input, mainly from government sources:
* Static watershed data representing geographic and geophysical variables: From the HydroATLAS project, including long-term climate indicators (precipitation, temperature, snow cover), land cover, and anthropogenic attributes.
* Historical meteorological time series data: From NASA IMERG, NOAA CPC Global Unified Gauge-Based Analysis of Daily Precipitation, and ECMWF ERA5-Land reanalysis. Variables include daily total precipitation, temperature, thermal radiation, snowfall, and surface pressure.
* Weather forecast time series data over a seven-day forecast range: From the ECMWF HRES atmospheric model, with the same meteorological variables as above.
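One way to picture how the three input types fit together is sketched below. It is only an illustration: the `BasinInputs` container, its field names, the toy numbers, and the choice to append the static attributes to every daily row are assumptions made here, not details given in the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BasinInputs:
    """Hypothetical container for the three input types of one watershed."""
    static_attributes: List[float]        # e.g. HydroATLAS-style descriptors
    hindcast_weather: List[List[float]]   # one row of variables per past day
    forecast_weather: List[List[float]]   # one row per forecast day (up to 7)


def attach_static(rows: List[List[float]], static: List[float]) -> List[List[float]]:
    """Append the same static attributes to every daily row (an assumption)."""
    return [list(row) + list(static) for row in rows]


# Toy values only: 2 static attributes, 2 weather variables per day.
basin = BasinInputs(
    static_attributes=[0.3, 1.2],
    hindcast_weather=[[5.0, 12.1], [0.0, 13.4], [2.5, 11.8]],
    forecast_weather=[[1.0, 12.0]] * 7,
)
hindcast_rows = attach_static(basin.hindcast_weather, basin.static_attributes)
forecast_rows = attach_static(basin.forecast_weather, basin.static_attributes)
```

Each row then carries both the time-varying weather for that day and the fixed description of the watershed.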
Model architecture: Building a river forecast model based on LSTM

The study builds its river forecast model from two long short-term memory (LSTM) networks arranged as an encoder-decoder model. A hindcast LSTM receives historical weather data and a forecast LSTM receives forecast weather data. The model outputs the parameters of a probability distribution at each forecast time step, which together give a probabilistic prediction of the volumetric flow of a specific river at a specific time.
The researchers trained the model for 50,000 minibatches, with all input data standardized in advance. The encoder and decoder LSTMs each have a hidden size of 256 cell states, and they are connected by a linear cell-state transfer network and a nonlinear hidden-state transfer network.
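The hand-off between the two LSTMs can be sketched in plain Python. This is a toy under stated assumptions, not the published implementation: the weights are random and untrained, the hidden size is 8 rather than 256, the encoder state is passed to the decoder unchanged instead of through transfer networks, and the output head emits a made-up (location, scale) pair as a stand-in for the distribution parameters. The names `TinyLSTM` and `run_encoder_decoder` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TinyLSTM:
    """Single LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for gate in "ifgo"
        }

    def step(self, x, h, c):
        z = list(x) + list(h)  # current input joined with the previous hidden state
        i = [sigmoid(dot(row, z)) for row in self.w["i"]]
        f = [sigmoid(dot(row, z)) for row in self.w["f"]]
        g = [math.tanh(dot(row, z)) for row in self.w["g"]]
        o = [sigmoid(dot(row, z)) for row in self.w["o"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_encoder_decoder(hindcast_rows, forecast_rows, hidden_size=8, seed=0):
    rng = random.Random(seed)
    encoder = TinyLSTM(len(hindcast_rows[0]), hidden_size, rng)  # hindcast LSTM
    decoder = TinyLSTM(len(forecast_rows[0]), hidden_size, rng)  # forecast LSTM
    head = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(2)]

    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for row in hindcast_rows:  # encode the historical weather
        h, c = encoder.step(row, h, c)

    # Assumption: the state is handed over unchanged; the study describes
    # dedicated transfer networks between encoder and decoder.
    params = []
    for row in forecast_rows:  # decode one forecast day at a time
        h, c = decoder.step(row, h, c)
        location = dot(head[0], h)
        scale = math.exp(dot(head[1], h))  # exp keeps the scale positive
        params.append((location, scale))
    return params


history = [[0.1 * d, 1.0] for d in range(30)]  # 30 past days, 2 variables
outlook = [[0.5, 1.0] for _ in range(7)]       # 7 forecast days, 2 variables
distribution_params = run_encoder_decoder(history, outlook)
```

The point of the sketch is the data flow: the encoder summarizes the past into a state, and the decoder starts from that state and emits one set of distribution parameters per forecast day.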
Model optimization: Cross-validation reduces prediction error
The researchers used cross-validation to train and test the river forecast model out-of-sample on 5,680 stream gauges, so that the evaluation reflects how well the model generalizes and the reported reliability can be trusted.
First, in the time dimension, the cross-validation folds are designed so that no test data from any monitoring station falls within one year of the training data used. In the spatial dimension, k-fold cross-validation (k = 10) divides the stations evenly at random. The two splits are applied together to avoid data leakage between training and testing.
Second, to examine the model's performance across geographical regions and environmental conditions, the researchers also ran further cross-validation experiments with non-random spatial splits, including splits by continent (k = 6), by climate zone (k = 13), and by hydrologically separated groups of watersheds (k = 8).
* k-fold cross-validation: The dataset is divided into k subsets. One subset is used for validation and the remaining k-1 subsets are used for training. This is repeated k times so that each subset is used for validation once, and the k results are averaged to give the final evaluation of the model.
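The random spatial split can be illustrated as follows. This is a minimal sketch assuming gauges are identified by integers and shuffled with a fixed seed; the function names are hypothetical, and the temporal separation between training and test data is not modelled here.

```python
import random


def spatial_k_folds(gauge_ids, k=10, seed=0):
    """Shuffle the gauges and deal them into k folds of near-equal size."""
    ids = list(gauge_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) pairs in which each fold is held out exactly once."""
    for i, test in enumerate(folds):
        train = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield train, test


folds = spatial_k_folds(range(5680), k=10)
splits = list(train_test_splits(folds))
```

Because every gauge appears in exactly one test fold, each gauge is evaluated by a model that never saw it during training, which is what makes the results relevant to ungauged basins.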
Experimental conclusion: The performance is better than the most advanced flood forecasting system in the world
In order to evaluate the reliability of flood event predictions, the researchers compared the river forecast model with the world's most advanced flood forecasting system, GloFAS (Global Flood Awareness System).

Difference in F1 scores for predicting events with a 2-year return period
* Red indicates the difference is between -0.2 and 0
* Green indicates the difference is between 0 and 0.2
First, the researchers analyzed the distribution of differences in F1 scores between the river forecast model and GloFAS when nowcasting events with a 2-year return period from 1984 to 2021.
The results show that the river forecast model outperforms GloFAS at 70% of the monitoring stations (3,673 stations in total).
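For reference, the F1 score combines precision (the share of predicted events that actually occurred) and recall (the share of observed events that were predicted). The sketch below computes the F1 difference between two models at a single station. It assumes, for simplicity, that an event counts as a hit only when it is predicted on exactly the day it was observed; the event days and model names are invented for illustration.

```python
def precision_recall_f1(observed_days, predicted_days):
    """Score predicted event days against observed event days at one station."""
    observed, predicted = set(observed_days), set(predicted_days)
    hits = len(observed & predicted)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(observed) if observed else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


observed = [10, 50, 90, 130]   # days with an observed 2-year event
model_a = [10, 50, 90, 200]    # 3 hits, 1 false alarm, 1 miss
model_b = [10, 300]            # 1 hit, 1 false alarm, 3 misses

_, _, f1_a = precision_recall_f1(observed, model_a)
_, _, f1_b = precision_recall_f1(observed, model_b)
f1_difference = f1_a - f1_b    # positive means model_a scores higher
```

Repeating this per station and plotting the differences gives a distribution like the one summarized above.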

Distribution of precision and recall for events with different return periods
* The blue dotted line is the reference line
* N is the number of monitoring stations
Second, the researchers analyzed the distribution of precision and recall for events with different return periods under nowcasting.
The results show that the river forecast model is more reliable at every return period. For extreme events, the precision of the river forecast model on 5-year return period events is not significantly different from that of GloFAS on 1-year return period events, while its recall is higher. In other words, the river forecast model predicts 5-year events as well as or better than GloFAS predicts 1-year events: its reliability on rarer, larger floods matches or exceeds that of the current state-of-the-art model on 1-year floods.
* Return period: The average interval, in years, between floods of at least a given magnitude. The longer the return period, the larger the flood; the shorter the return period, the smaller the flood.
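To make the return period concrete: a T-year event has a probability of roughly 1/T of being exceeded in any given year. The sketch below uses that standard relation, together with the Weibull plotting position (n + 1) / rank, to estimate return periods from annual peak flows. The flow values are invented, and this is a textbook estimate, not necessarily the procedure used in the study.

```python
def prob_at_least_one(return_period_years, horizon_years):
    """Chance of at least one exceedance within the horizon, assuming independent years."""
    p = 1.0 / return_period_years
    return 1.0 - (1.0 - p) ** horizon_years


def empirical_return_periods(annual_maxima):
    """Weibull plotting position: the rank-m largest of n maxima gets (n + 1) / m years."""
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)
    return [(flow, (n + 1) / rank) for rank, flow in enumerate(ranked, start=1)]


annual_peaks = [120.0, 340.0, 210.0, 95.0]   # invented annual peak flows
periods = empirical_return_periods(annual_peaks)
chance_5yr_in_5 = prob_at_least_one(5, 5)    # about 0.67
```

Note that a 5-year event is not guaranteed to occur within any 5-year window; under these assumptions the chance is about two in three.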

The blue dotted line is the reference line
Third, the researchers analyzed the distribution of F1 scores for events with different return periods when forecasting 0-7 days in advance.
The results show that for events with return periods of 1 year (a), 2 years (b), 5 years (c), and 10 years (d), the F1 scores of the river forecast model at lead times of up to 5 days are either higher than or not significantly different from those of GloFAS nowcasts. That is, the river forecast model's forecasts up to 5 days ahead are as good as or better than what GloFAS achieves with no lead time.

Fourth, the researchers analyzed the distribution of F1 scores by geographic location and return period.
The results show that the reliability of both models varies significantly across geographic locations. In addition, for events with return periods of 1 year (a), 2 years (b), 5 years (c), and 10 years (d), the F1 scores of the river forecast model at the different locations are either higher than or not significantly different from those of GloFAS.
From Europe's EFAS to China's Xin'anjiang model, AI has become an intelligent line of defense
In fact, as early as 2021, when Google demonstrated its AI technology research results at the "Inventors@Google" event, it mentioned the machine learning-based flood forecasting system Google Flood Hub. At that time, the system was mainly used in India, and it used visualization to let local people understand the flood situation. After three years of development, Google's latest flood forecasting system can be expanded to other basins without stations, covering more than 80 countries.
Similar to this is the European Flood Awareness System (EFAS), which uses advanced meteorological forecasts and hydrological models, combined with machine learning algorithms, to make reliable flood forecasts across Europe at least ten days in advance and send accurate early warnings to national and local flood centers in member states.
China is also among the countries hit by frequent floods: about two-thirds of its land is at risk of flooding to varying degrees. According to statistics, from 1991 to 2020 floods left more than 2,000 people dead or missing in China each year on average, with cumulative deaths exceeding 60,000 and average annual direct economic losses of about 160.4 billion yuan.

In the face of flood hazards, China's independently developed Xin'anjiang model, built on long-term practical experience and in-depth study of hydrological processes, divides a river basin into multiple sub-basin units and accounts for the effects of topography, soil, and vegetation on hydrological processes. It provides accurate hydrological predictions and is widely used in flood prevention and disaster reduction.
In fact, humans have never stopped exploring more effective flood prevention measures. Although floods cannot be fundamentally eliminated, advanced flood forecasting systems can be used to predict disasters in advance and take measures to minimize the negative impact of floods on human society. Today, flood forecasting systems based on AI technology are no longer limited to a specific area, and may also cover the world in the future to protect more citizens from flood hazards.
References:
1. http://bj.people.com.cn/n2/2023/0809/c14540-40525241.html
2. https://www.sohu.com/a/766008856_473283
3. https://www.sohu.com/a/745381603_121687414
4. https://european-flood.emergency.copernicus.eu/en/european-flood-awareness-system-efas
5. https://developer.baidu.com/article/details/3096974
6. https://blog.research.google/2024/03/using-ai-to-expand-global-access-to.html
7. https://m.jiemian.com/article/6809946.html