Analyzing and Training Data From 2k+ Hydrological Stations Around the World, the Chinese Academy of Sciences Team Released ED-DLSTM to Achieve Flood Prediction in Areas Without Monitoring Data

As global climate change continues, flood disasters are becoming more frequent. A report jointly released by the United Nations Office for Disaster Risk Reduction and the Centre for Research on the Epidemiology of Disasters (CRED) at the University of Louvain in Belgium pointed out that over the past 20 years the number of flood disasters worldwide has more than doubled, from 1,389 to 3,254, accounting for about 40% of all disasters and affecting 1.65 billion people.
Floods can cause huge casualties and property losses. In April this year, floods and geological disasters affected 1.598 million people to varying degrees in 17 provinces (regions and cities), including Jiangxi and Guangdong; 24 people died or went missing, 140,300 hectares of crops were affected, and direct economic losses reached 11.98 billion yuan, the heaviest losses in the past 10 years.
Effective prediction of flood flow is crucial to reducing the risk of flood disasters. In the past few decades, flood flow prediction based on hydrological processes has made significant progress, but the results of current methods still rely heavily on monitoring data and parameter calibration. In fact, more than 95% of the world's river basins have no monitoring data. Predicting runoff and floods in areas with little or no monitoring data has long been an open problem in hydrology.
In April 2024, Ouyang Chaojun's team from the Chengdu Institute of Mountain Hazards and Environment, Chinese Academy of Sciences, published a paper titled "Deep learning for cross-region streamflow and flood forecasting at a global scale" in The Innovation. The paper proposes ED-DLSTM, an AI-based runoff and flood prediction model. By encoding the static attributes of each watershed together with meteorological drivers, and by training on data from more than 2,000 hydrological stations around the world, the model attempts to solve runoff prediction in watersheds with and without monitoring data worldwide.
Research highlights:
- The ED-DLSTM model performs well in flood forecasting in both basins with and without monitoring data.
- For the first time, multiple hydrological AI models were trained and compared at a global scale.
- The encoding of spatial attributes significantly improves the predictive power of time series and explains the transferability well.

Paper address:
https://doi.org/10.1016/j.xinn.2024.100617
Dataset: Basin data with significant distribution differences
The training data set used in this study covers 2,089 river basins in the United States (482 river basins), the United Kingdom (406 river basins), Central Europe (461 river basins), and Canada (740 river basins), as shown in the figure below:

Dataset download address:
- CAMELS (United States): https://go.hyper.ai/nCkDT
- CAMELS-GB (United Kingdom): https://go.hyper.ai/DdUEf
- LamaH-CE (Central Europe): https://go.hyper.ai/rMHSO
- CAMELS-CL (Chile): https://camels.cr2.cl/
- HYSETS (Canada): https://go.hyper.ai/l4etG
In the United States and Canada, precipitation and soil moisture are generally higher in the east than in the west. In the United Kingdom, the west and the northern Scottish Highlands show higher annual average soil moisture and precipitation, while other variables vary relatively little. In Central Europe, most of the river basins in Austria lie at high elevation, with high precipitation and low temperatures. The Rocky Mountains run through the United States and Canada, and the nearby basins are also high in elevation, with high precipitation and soil moisture and low temperatures; complex evaporation and snowmelt effects make the coefficient of variation of runoff there even greater.
In the researchers' view, the distributions of these regional watersheds differ significantly, and the spatial variability is large enough to ensure data diversity and to verify the cross-region streamflow forecasting (CSF) capability of ED-DLSTM.
Model architecture: novel cross-regional spatiotemporal integrated model ED-DLSTM
In this paper, the researchers propose a novel cross-regional spatiotemporal integration model, ED-DLSTM. The model combines static spatial attributes with temporal forcing attributes to achieve cross-regional streamflow prediction. The following figure shows the overall architecture of the ED-DLSTM model:

The ED-DLSTM model uses an encoder-decoder structure. It includes two sub-models that operate in a symbiotic fashion; modeling them jointly is better suited to capturing global and local basin relationships. As shown in the figure above, the model takes multimodal data as input, and the static spatial grid attributes form a relatively sparse matrix.
Specifically, the encoder combines static attributes with forcing data. Static data include digital elevation models (DEMs), snow cover extent, soil moisture content, groundwater depth, potential evapotranspiration, drought index and river channel geometry. These attributes guide the model to distinguish the hydrological behavior of different regions. Forcing data include precipitation, solar radiation, air temperature, dew point temperature, surface pressure, and easterly and northerly wind speeds, all at a temporal resolution of 24 hours.
For the static information, an ordinary convolution integrates the channels and residual convolutions extract the spatial static properties. Spatial pyramid pooling (SPP) then maps the matrix information of different regions into a fixed high-dimensional space, thereby spatially encoding each region. The encoded vector is subsequently used as the initial state of the LSTM unit.
The decoder maps high-level features to predicted streamflow values using a reverse LSTM layer. The researchers chose to perform the flow mapping in the last LSTM unit because the complete information of a Seq2Seq model should be decoded at the end, and this decoding layer can capture the information trend in reverse. In this way, the hydrological response behaviors of different basins can be encoded and decoded separately.
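The role of spatial pyramid pooling can be illustrated with a small sketch: grids of different sizes are pooled at several resolutions and concatenated, so every region ends up with a vector of the same length. This is only an illustration under stated assumptions, not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, the single-channel input, and the function names `adaptive_max_pool` and `spatial_pyramid_pool` are all hypothetical choices that the article does not specify.

```python
def adaptive_max_pool(grid, bins):
    """Max-pool a 2D grid (list of rows) into bins x bins cells, flattened row by row."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for i in range(bins):
        # Cell edges: floor for the start index, ceiling for the end index.
        r0 = (i * rows) // bins
        r1 = -((-(i + 1) * rows) // bins)
        for j in range(bins):
            c0 = (j * cols) // bins
            c1 = -((-(j + 1) * cols) // bins)
            pooled.append(max(grid[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(grid, levels=(1, 2, 4)):
    """Concatenate pooled features from every pyramid level (assumed levels)."""
    features = []
    for bins in levels:
        features.extend(adaptive_max_pool(grid, bins))
    return features


# Two hypothetical basins whose static attribute grids have different sizes.
small_basin = [[float(r * 5 + c) for c in range(5)] for r in range(3)]   # 3 x 5 grid
large_basin = [[float(r * 9 + c) for c in range(9)] for r in range(7)]   # 7 x 9 grid

vec_small = spatial_pyramid_pool(small_basin)
vec_large = spatial_pyramid_pool(large_basin)
```

Both vectors have 1 + 4 + 16 = 21 entries regardless of the input grid size, which is what allows basins of different extents to share one downstream network.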
Ultimately, the network learns the mapping relationship from dynamic time series to observed flow under regional static attributes, thereby providing consistent CSF capabilities, allowing the model to abstractly "recognize" the hydrological response characteristics of different basins.
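To make the data flow concrete, the sketch below runs a tiny LSTM whose initial hidden state is a basin's static encoding and maps only the final hidden state to one flow value. It is a toy under loud assumptions rather than ED-DLSTM itself: the weights are random and untrained, there are no bias terms, the cell state starts at zero, the reverse-direction decoding layer is omitted, and the class name `TinyLSTM` and all sizes are hypothetical. The seven inputs per day mirror the seven forcing variables listed above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyLSTM:
    """Single-layer LSTM with random weights, for illustration only."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        width = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.head = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def step(self, x, h, c):
        z = x + h  # list concatenation
        i = [sigmoid(v) for v in matvec(self.weights["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(self.weights["f"], z)]    # forget gate
        g = [math.tanh(v) for v in matvec(self.weights["g"], z)]  # candidate state
        o = [sigmoid(v) for v in matvec(self.weights["o"], z)]    # output gate
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def predict(self, forcing_series, basin_encoding):
        if len(basin_encoding) != self.hidden_size:
            raise ValueError("basin_encoding must have hidden_size entries")
        h = list(basin_encoding)       # static encoding seeds the hidden state
        c = [0.0] * self.hidden_size   # assumption: cell state starts at zero
        for x in forcing_series:
            h, c = self.step(list(x), h, c)
        # Only the final hidden state is mapped to a single flow value.
        return sum(w * v for w, v in zip(self.head, h))


model = TinyLSTM(input_size=7, hidden_size=4)
forcing = [[0.1 * day] * 7 for day in range(5)]   # 5 days x 7 forcing variables
flow_a = model.predict(forcing, [0.5, -0.2, 0.1, 0.0])
flow_b = model.predict(forcing, [-0.5, 0.3, 0.0, 0.2])
```

The same forcing series yields different outputs for the two hypothetical basin encodings, which is the mechanism that lets one network respond differently to different basins.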
Research results: ED-DLSTM model has excellent predictive and generalization capabilities
First, the researchers conducted a comparative assessment of the prediction credibility of the ED-DLSTM model from January 1, 2010 to January 1, 2012, and quantitatively evaluated it using the Nash-Sutcliffe efficiency (NSE).
- NSE (value range (-∞, 1]) is used to evaluate the simulation results of hydrological models. The closer the NSE is to 1, the more closely the simulation matches the observations; an NSE below 0 means the model performs worse than simply predicting the mean of the observed flow.
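For reference, NSE is one minus the ratio of the squared prediction error to the variance of the observations around their mean. The sketch below is a plain-Python version of that standard definition; the function name and the sample values are made up for illustration and are not taken from the paper.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when the observed flow is constant")
    return 1.0 - squared_error / variance


observed = [1.0, 2.0, 3.0, 4.0]                                      # made-up daily flows
perfect = nash_sutcliffe_efficiency(observed, observed)              # simulation equals observation
mean_only = nash_sutcliffe_efficiency(observed, [2.5] * 4)           # always predict the observed mean
reversed_fit = nash_sutcliffe_efficiency(observed, [4.0, 3.0, 2.0, 1.0])
```

Here `perfect` is 1.0, `mean_only` is 0.0, and `reversed_fit` is -3.0, which matches the interpretation of the value range given above.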

As shown in the figure above:
- In the United States, 438 of the 482 watersheds analyzed had NSEs greater than 0, with an average NSE of 0.78 and a median NSE of 0.80.
- In the Canadian region, 695 of the 740 watersheds analyzed had NSEs greater than 0, with an average NSE of 0.80 and a median NSE of 0.82.
- In the UK, 391 of the 406 catchments analysed had NSEs above 0, with an average NSE of 0.68 and a median NSE of 0.70.
- In Central Europe, 433 of the 461 river basins studied had NSE above 0, with an average NSE of 0.73 and a median NSE of 0.79.
Overall, watersheds with more rainfall or a larger runoff coefficient usually produce better prediction results. It is worth noting that 81.8% of the watersheds have an NSE above 0.6, which highlights the excellent prediction and generalization ability of the ED-DLSTM model.
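The per-region counts listed above can be turned into shares with a few lines of arithmetic. The snippet only restates the numbers reported in this article; the variable and function names are illustrative.

```python
# (watersheds with NSE > 0, watersheds analyzed), as reported above.
region_counts = {
    "United States": (438, 482),
    "Canada": (695, 740),
    "United Kingdom": (391, 406),
    "Central Europe": (433, 461),
}


def share_percent(positive, total):
    """Share of watersheds with NSE > 0, in percent, rounded to one decimal."""
    return round(100.0 * positive / total, 1)


shares = {region: share_percent(p, t) for region, (p, t) in region_counts.items()}
total_positive = sum(p for p, _ in region_counts.values())
total_basins = sum(t for _, t in region_counts.values())
overall_share = share_percent(total_positive, total_basins)
```

The four regions add up to the 2,089 training basins, of which 1,957 (about 93.7%) have an NSE above 0.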
Based on the pre-trained models of the above four regions (Northern Hemisphere), the researchers made predictions for 160 new and unfamiliar basins in Chile (Southern Hemisphere) (without any historical monitoring data training) to test the model's prediction ability in basins without monitoring data. The results are shown in the following figure:

When ED-DLSTM was deployed directly in new regions of Chile, the model pre-trained in the United States showed that NSE was greater than 0 in 76.9% of watersheds; the model pre-trained in Canada achieved NSE greater than 0 in 66.2% of watersheds; the model pre-trained in Central Europe achieved NSE greater than 0 in 53.1% of watersheds; and the model pre-trained in the United Kingdom performed the worst, with only 42.5% of watersheds having NSE greater than 0.
The prediction results of the different pre-trained models showed strong consistency in spatial distribution, demonstrating the great potential of AI for streamflow and flood prediction in ungauged river basins.
When the pre-trained models were used to predict the 160 Chilean watersheds without monitoring data, the ED-DLSTM encoder's representation of each watershed was visualized (left side of the figure below) and analyzed for similarity (right side of the figure below). The average encoding similarity between the pre-trained models was 38.4% higher than that of random noise, indicating that the embedding layer of ED-DLSTM is not a disordered random signal but high-dimensional feature information that the model recognizes and uses. This shows that AI can learn "hydrological knowledge" across different river basins.
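One way to run this kind of comparison is to average a similarity score over matching basins from two encoders and compare it with the same score against random vectors. The article does not say which similarity measure was used, so cosine similarity is an assumption here, and the embeddings below are synthetic stand-ins rather than ED-DLSTM outputs.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_matched_similarity(embeddings_a, embeddings_b):
    """Average cosine similarity between the same basin's vectors from two sources."""
    scores = [cosine_similarity(a, b) for a, b in zip(embeddings_a, embeddings_b)]
    return sum(scores) / len(scores)


rng = random.Random(0)
dim, n_basins = 16, 160   # 160 basins as in the Chile test; dim is arbitrary

# Synthetic stand-ins: "model B" is "model A" plus small perturbations.
model_a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_basins)]
model_b = [[x + 0.1 * rng.gauss(0, 1) for x in vec] for vec in model_a]
noise = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_basins)]

structured = mean_matched_similarity(model_a, model_b)
baseline = mean_matched_similarity(model_a, noise)
```

A `structured` score well above `baseline` indicates that the two sets of encodings share structure rather than behaving like unrelated random signals.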

AI + Hydrology, promoting the development of smart water conservancy
Flood prediction is an important branch of hydrology. China was already measuring rainfall and water levels before the Qin Dynasty: during the Warring States Period, the Qin State's "Land Law" required local officials to report promptly on rainfall and on the acreage of land that benefited from it or was damaged by it. Every dynasty since has had a flood reporting system.
Hydrological forecasting is an important basis for flood control and drought relief decision-making, rational use of water resources, ecological environmental protection, and the operation and management of water conservancy and hydropower projects. Traditional hydrological forecasting methods mostly use process-driven hydrological models combined with hydraulics to simulate complex physical processes. However, their reliance on high-quality physical data, complex mathematical tools and a large number of simplifying assumptions makes calibration and verification challenging. With the development of artificial intelligence and interdisciplinary research, many researchers have studied AI-based hydrological forecasting models in depth.
In 2019, a research team from the State Key Laboratory of Water Resources and Hydropower Engineering Science of Wuhan University proposed a deep learning network that combines a long short-term memory (LSTM) neural network with batch learning, regularization, and dropout, and applied it to flood forecasting for the Three Gorges Reservoir. A comprehensive evaluation on four indicators (forecast qualification rate, flood peak relative error, root mean square error, and benchmark fit) showed that, compared with the BPNN static neural network and the NARX dynamic neural network, the LSTM network combined with the three auxiliary deep learning techniques effectively improved the accuracy of flood forecasts for the Three Gorges Reservoir.
In 2020, a research team from Northwestern Polytechnical University worked with the Yellow River Conservancy Research Institute to digitize the Yellow River Hydrological Yearbook and compile a variety of factors including soil, climate, topography and geology, establishing the first systematic hydrological big data set for the Yellow River Basin in China. In terms of model algorithms, they made breakthroughs in single-site intelligent prediction, pioneered an intelligent prediction model for groups of sites, and addressed flood prediction in areas with missing historical data, one of the ten major problems in the hydrological field, which significantly improved the accuracy of flood prediction and extended the forecast period. The intelligent prediction algorithm has been successfully applied to the main sand-producing areas of the Loess Plateau, the uncontrolled areas between Sanmenxia and Huayuankou in the middle and lower reaches of the Yellow River, and Tangnaihai in the upper reaches of the Yellow River, significantly improving flood forecasting capabilities.
In March 2024, Grey Nearing and his colleagues from the Google Research flood prediction team developed an AI model, trained on 5,680 existing gauges, that can predict daily runoff in ungauged basins over a 7-day forecast period. They then tested the AI model against the Global Flood Awareness System (GloFAS), the world's leading short- and long-term flood prediction software.
The results show that the model's same-day prediction accuracy is comparable to or better than that of the current system. In addition, the model's accuracy in predicting extreme events with a return period of five years is comparable to or better than GloFAS's accuracy in predicting events with a return period of one year. The related research paper, titled "Global prediction of extreme floods in ungauged watersheds," has been published in the scientific journal Nature.

Nowadays, smart water conservancy has been upgraded from the initial Internet of Things to the Intelligent Internet: IoT devices collect data, AI analyzes the data and makes predictions, and the results are fed back to the relevant personnel in real time, so that residents can be evacuated and public property protected before a flood arrives. In the future, smart water conservancy built on AI technology will continue to make water conservancy planning, project construction, operation management and social services more intelligent, improve the efficiency of water resource utilization and the ability to prevent floods and droughts, and improve the water environment and water ecology.
References:
1.https://mp.weixin.qq.com/s/sKPl55AEVf9GoXsLv0-8Hg
2.https://www.hanspub.org/journal/PaperInformation?paperID=28786