HyperAI

Multi-field Geoscience Applications: A Zhejiang University Team Proposes a Series of GeoAI Methods for Spatiotemporal Modeling and Prediction in Geography, Oceanography, Geology, and Atmospheric Science

As a highly interdisciplinary field, earth science is undergoing a major transformation led by AI. By mining potential information and discovering hidden patterns in massive earth science data, AI can not only deepen people's understanding of the earth's natural phenomena, but also optimize researchers' modeling and prediction of the spatiotemporal nonlinear relationship between different factors in earth science, and promote the formation of a new research paradigm.

Recently, at the COSCon'24 AI for Science forum co-produced by HyperAI, Qi Jin, a researcher from the School of Earth Sciences at Zhejiang University, gave a talk titled "GeoAI and its interdisciplinary geoscience applications". He discussed the limitations of traditional geographical modeling and how AI-enhanced versions of traditional methods are being applied to housing price analysis, ocean remote sensing, air pollution, mineralization prediction and other fields.

Qi Jin delivering his talk

HyperAI has compiled and summarized Qi Jin's talk without altering its original meaning. The transcript follows.

GeoAI's interdisciplinary applications cover housing price analysis, ocean remote sensing, air pollution, and mineralization prediction

With the continuous advancement of observation technology, the volume of spatiotemporal data in earth science has exploded. These data are widely used in research such as marine environmental modeling, analysis of housing price drivers, exploration of the spatial distribution of minerals, and PM2.5 air pollution simulation.

In the past, we used the traditional Geographically Weighted Regression (GWR) model to analyze how geographic location affects the relationship between variables, and thereby to analyze or predict the spatial heterogeneity of the target object. However, different data interact in complex ways. Building more sophisticated model structures that can handle modeling objects at more scales has become an important challenge.

To adapt to the development of artificial intelligence and big data and to handle complex real-world modeling problems, we combined the concept of traditional geographically weighted regression with neural network technology and proposed a new family of models, including geographically neural network weighted regression (GNNWR) and geographically and temporally neural network weighted regression (GTNNWR).

Since the first paper was published, GNNWR, GTNNWR and the other methods in the series have attracted considerable attention and have been widely used in oceanography, geography, atmospheric science and geology. More than 30 related papers have been published. These results are not limited to the methodological and applied research published by our team: many external teams also use similar modeling ideas or technical architectures in their research. GNNWR is now open source on GitHub and can be installed directly with pip install gnnwr (Python ≥ 3.9).

GNNWR open source address:
https://github.com/zjuwss/gnnwr

Take housing price analysis as an example. As we all know, housing prices are strongly affected by geographical location: proximity to tourist attractions, school districts and so on directly affects prices. Geographers use statistical analysis to reveal which factors affect housing prices. Compared with traditional regression models, the GNNWR model not only fits the data more accurately but is also more interpretable, and it can reveal in depth how the influencing factors act and how their effects differ across space. This study is described in detail later.

Original paper:
https://www.mdpi.com/2220-9964/11/8/450

https://www.tandfonline.com/doi/full/10.1080/13658816.2024.2343771

In marine ecological environment modeling, the ocean images that remote sensing satellites acquire from space contain rich band information. Based on how this band information is distributed in space, we can estimate the content of marine ecological elements such as chlorophyll and suspended sediment.

In recent years, the temporal and spatial distribution of silicate, an important ocean nutrient, has also been estimated with the GTNNWR model. A reduction in silicate can lead to coastal red tides. The GTNNWR model can capture the fine-scale spatiotemporal dynamics of dissolved silicate in coastal waters, thereby providing remote sensing early warning signals for coastal red tides. This study is described in detail later.

Another example is PM2.5 pollution. Some heavy industrial cities in the north may be the main sources of pollution. The GNNWR model can establish spatially non-stationary regression relationships, estimate PM2.5 concentrations, and provide a high-precision, fine-grained PM2.5 distribution across the country. For example, through geospatial modeling, we found that PM2.5 concentrations are generally high from Beijing to Lianyungang, which may be affected by factors such as wind direction and wind speed. In addition, shelterbelts in specific areas may inhibit the spread of PM2.5.

Original paper:
https://www.mdpi.com/2072-4292/13/10/1979

In geology, especially in predicting the spatial distribution of gold deposits, we have conducted a series of studies to reveal how geological factors affect the probability of gold deposit formation. In the model we built, we introduced the Shapley method to enhance interpretability and to achieve accurate prediction and interpretation of mineralization in complex spatial environments.

More details: Better than the five advanced models, the GNNWLR model proposed by Du Zhenhong's team at Zhejiang University: improving the accuracy of mineralization prediction
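The talk does not say how the Shapley values were computed, so the following is only a minimal sketch of the general technique: exact Shapley values obtained by enumerating every coalition of features. The scoring function `toy_score`, its three features and all of its numbers are hypothetical and are not taken from the gold deposit study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                without_i = value_fn(frozenset(coalition))
                with_i = value_fn(frozenset(coalition) | {i})
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical toy "mineralization score" (illustrative numbers only).
CONTRIBUTION = {0: 0.30, 1: 0.10, 2: 0.05}


def toy_score(present):
    score = sum(CONTRIBUTION[j] for j in present)
    if 0 in present and 1 in present:
        score += 0.20  # interaction between features 0 and 1
    return score


phi = shapley_values(toy_score, 3)
print(phi)
```

The attributions add up to the difference between the score with all features present and the score with none, and the interaction term is shared equally between the two features that produce it. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations.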

Using hamburger prices as an example to explore the limitations of traditional geographic modeling

In traditional statistics, if we want to explore the factors that affect PM2.5 concentration, we usually use multiple linear regression: x represents the independent variables, y represents the dependent variable, and we explore the relationship between y and x. However, in geographical research the relationships between variables differ with spatial location, and traditional statistical methods struggle to model such complex natural phenomena.

Take the price of hamburgers as an example. Let y be the price of a hamburger. A hamburger costs 25 yuan in Beijing and 15 yuan in Hangzhou. Jiangsu lies between Beijing and Hangzhou, so a simple linear model would predict a price of 20 yuan in Jiangsu. However, geographical effects are not such a simple linear relationship. The price of hamburgers is also affected by factors such as logistics costs, traffic conditions and raw material costs, and these factors are distributed differently across space. This means that the weight of each factor at each geographic location should be considered when modeling.
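The naive estimate in the hamburger example is plain linear interpolation. The sketch below reproduces that arithmetic, assuming a hypothetical one-dimensional axis on which Hangzhou sits at 0, Beijing at 1 and Jiangsu at the midpoint 0.5. The positions are illustrative and not real coordinates.

```python
import numpy as np

# Hypothetical positions along a Hangzhou-to-Beijing axis (not real coordinates).
positions = np.array([0.0, 1.0])   # Hangzhou, Beijing
prices = np.array([15.0, 25.0])    # hamburger prices in yuan, from the example

# Linear interpolation at the midpoint, standing in for Jiangsu.
jiangsu_price = float(np.interp(0.5, positions, prices))
print(jiangsu_price)
```

This estimate uses location alone and gives every intermediate place the same blend of its neighbours, which is exactly the simplification the talk argues against.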

To better model geographic relationships, geographers extended traditional multiple linear regression to geographically weighted regression (GWR). In GWR, the regression coefficient β in front of each independent variable is allowed to vary with geographical location. That is, each regression coefficient changes as the spatial position changes. This variation is what we call "spatial non-stationarity": the relationship between the independent variables and the dependent variable is not one fixed linear relationship but varies from place to place.
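For reference, the standard textbook form of GWR described above can be written as follows. The notation is an assumption, since the talk gives no formula: $(u_i, v_i)$ are the coordinates of sample $i$, $p$ is the number of independent variables, and $W_i$ is the diagonal matrix of kernel weights that the other samples receive relative to location $i$.

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i \, \mathbf{y}
```

Ordinary multiple linear regression is the special case in which every $\beta_k$ is the same at all locations, that is, $W_i$ is the identity matrix for every $i$.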

How are the geographically weighted regression coefficients calculated? There are two core steps. The first is to calculate an accurate spatial distance. The second is to select, among many kernel functions, the one that fits the data best.

For spatial distance, Manhattan distance can be used as well as Euclidean distance. Suppose Hangzhou is 200 kilometers from Nanjing and Beijing is also 200 kilometers from Nanjing. With Euclidean distance, the straight-line distance between two places can be calculated using the Pythagorean theorem. In practice, however, Dalian and Yantai may be only about 100 kilometers apart by boat, while the high-speed rail requires a long detour and the actual travel distance may exceed 300 kilometers. Therefore, in geospatial modeling, the choice of distance calculation method is crucial.
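The two distance measures named above can be compared with a minimal sketch. It assumes points given as planar (x, y) coordinates in kilometers. The sample points are hypothetical and do not correspond to the cities in the example.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance (Pythagorean theorem)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (grid-like travel)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))


# Hypothetical planar coordinates in kilometers.
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean_distance(p, q), manhattan_distance(p, q))
```

Neither measure captures a ferry route or a rail detour, which is why the distance itself becomes something to be learned later in the talk.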

Second, we introduce the concept of a "kernel function" and plot it as a hill-shaped surface, as shown in the figure below. The farther a point is from the analysis point (red dot), the lower its weight, but the decline is not a simple linear one: its shape depends on the spatial distance and on the kernel chosen. When geographers build models, they can choose among many weight kernel functions, such as Gaussian functions and exponential functions.

In summary, the uncertainty of spatial distance measurement and the choice of kernel function to best fit the data are the main issues affecting the accuracy of geographic modeling.
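To make the two choices concrete, here is a minimal GWR sketch under stated assumptions: Euclidean distance, a fixed bandwidth chosen by hand, and synthetic noise-free data whose true slope grows from west to east. The kernel forms are the common Gaussian and exponential ones. The function names and all numbers are illustrative.

```python
import numpy as np


def gaussian_kernel(d, bandwidth):
    """Gaussian distance decay: weight 1 at d = 0, falling smoothly."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def exponential_kernel(d, bandwidth):
    """Exponential distance decay, heavier-tailed than the Gaussian."""
    return np.exp(-d / bandwidth)


def gwr_coefficients(coords, X, y, target, bandwidth, kernel=gaussian_kernel):
    """Weighted least squares at one location: [intercept, slopes...]."""
    d = np.linalg.norm(coords - target, axis=1)    # distance to each sample
    w = kernel(d, bandwidth)                       # kernel weight per sample
    Xd = np.column_stack([np.ones(len(y)), X])     # add intercept column
    XtW = Xd.T * w                                 # X^T W without forming W
    return np.linalg.solve(XtW @ Xd, XtW @ y)


# Synthetic data: the true slope increases with the x-coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x

beta_west = gwr_coefficients(coords, x, y, np.array([1.0, 5.0]), bandwidth=1.0)
beta_east = gwr_coefficients(coords, x, y, np.array([9.0, 5.0]), bandwidth=1.0)
print(beta_west, beta_east)
```

The slope estimated in the east comes out larger than the one in the west, which is the spatial non-stationarity that a single global regression would average away. Swapping `kernel=exponential_kernel` or changing `bandwidth` changes the estimates, which illustrates why these choices affect accuracy.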

Merging Traditional Geographic Modeling with AI

Complex nonlinear relationships between factors are inherent to the real world. Machine learning and neural networks are well suited to such problems.

In geographic modeling, the effective spatial distance between two points is often nonlinear, and the weights described by the kernel function also change nonlinearly. Therefore, we combined the traditional Geographically Weighted Regression (GWR) concept with neural network technology and proposed a new family of models, including geographically neural network weighted regression (GNNWR) and geographically and temporally neural network weighted regression (GTNNWR).

Related papers:

https://www.tandfonline.com/doi/full/10.1080/13658816.2019.1707834

https://www.tandfonline.com/doi/full/10.1080/13658816.2020.1775836

https://www.tandfonline.com/doi/full/10.1080/13658816.2022.2100892

This method has two major features. First, a neural network is built specifically to calculate spatial distance. Whether the actual distance is 100 kilometers or 300 kilometers, the network learns from large amounts of data the distance between two points that is most suitable for modeling. Second, the method uses a spatiotemporal weight network, that is, a spatially weighted neural network, which computes the output weights from the input spatial distances. In this process, we do not need to decide in advance which kernel function to use. Instead, the neural network learns the features of the data by itself and automatically constructs the geographic weights from them. By nesting these two neural networks, the dependent variable y can be accurately predicted.
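The second feature can be sketched as a forward pass in plain NumPy. The sketch is not the API of the gnnwr package. The layer sizes, the random untrained parameters, the synthetic data and the use of plain Euclidean distances as input are assumptions made for illustration. The step that scales global OLS coefficients by the network's output follows the formulation in the published GNNWR papers rather than anything stated in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes, rng):
    """Random, untrained parameters for a small fully connected network."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, inputs):
    h = inputs
    for idx, (W, b) in enumerate(params):
        h = h @ W + b
        if idx < len(params) - 1:
            h = np.maximum(h, 0.0)   # ReLU on hidden layers only
    return h


# Synthetic stand-in data (hypothetical sizes).
n_train, n_features = 50, 2
coords = rng.uniform(0, 10, size=(n_train, 2))
X = rng.normal(size=(n_train, n_features))
y = rng.normal(size=n_train)

# Global OLS coefficients (intercept plus slopes) as the baseline.
Xd = np.column_stack([np.ones(n_train), X])
beta_ols, *_ = np.linalg.lstsq(Xd, y, rcond=None)

# Each sample is described by its distances to all training points.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# The weight network replaces the hand-picked kernel: distances in,
# one weight per coefficient out.
weight_net = init_mlp([n_train, 16, n_features + 1], rng)
spatial_weights = mlp_forward(weight_net, dist)   # (n_train, n_features + 1)

local_beta = spatial_weights * beta_ols           # location-specific coefficients
y_hat = np.sum(local_beta * Xd, axis=1)           # predicted dependent variable
print(local_beta.shape, y_hat.shape)
```

In a real model the network parameters would be fitted by minimizing the prediction error on the training set. Here they are random, so `y_hat` only demonstrates the data flow and the shapes involved.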

Unlike traditional methods, GNNWR can accurately estimate the coefficient β in front of each independent variable. For a more intuitive display, we visualize the regression coefficients β, as shown in the following figure. One weight distribution forms an orange diamond, another β shows a distinctive pattern with high weights at the top and bottom and low weights in the middle, and another β presents a central circular distribution.

As shown in the figure below, combining GWR with a neural network significantly improves accuracy on both the training set and the test set.

Application of GNNWR in housing price and marine ecological environment modeling

House prices are related not only to workplace location but also to factors such as transportation, school district and environment. For housing price modeling, we took Wuhan as an example and collected nearly 1,000 second-hand housing transaction records, which were divided into training and test sets at a ratio of 85:15. Second-hand housing was chosen because it is less affected by policy regulation and is closer to actual economic activity.
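An 85:15 split like the one described can be sketched as follows. The sample count of 1,000 is a placeholder for "nearly 1,000" records, and the random shuffling and the seed are assumptions, since the talk does not say how records were assigned to each set.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


# Placeholder count standing in for "nearly 1,000" transaction records.
train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

For spatial data, a purely random split can place test points very close to training points, so a spatially blocked split is sometimes preferred. Which approach the study used is not stated.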

During the research, we followed the conventional neural network modeling process, split the data into a training set and a test set, and collected a series of variables that may affect housing prices. What distinguishes this case is that it introduces a new notion of "spatial distance". In addition to the traditional Euclidean distance, we proposed a "commuting distance" based on actual traffic conditions. By establishing a distance fusion function, we feed the commuting distance and the Euclidean distance into the neural network together to obtain a fused nonlinear distance.
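The idea of a distance fusion function can be illustrated with a tiny network that takes both distances and returns one fused value. This is a hypothetical sketch and not the architecture from the paper: the layer sizes, the scaling by 100 km, the softplus output and the random untrained parameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained parameters for a 2 -> 8 -> 1 network (hypothetical sizes).
params = (rng.normal(scale=0.1, size=(2, 8)), np.zeros(8),
          rng.normal(scale=0.1, size=(8, 1)), np.zeros(1))


def fuse_distances(euclid_km, commute_km, params, scale_km=100.0):
    """Map (Euclidean, commuting) distance pairs to one fused distance."""
    W1, b1, W2, b2 = params
    pair = np.stack([euclid_km, commute_km], axis=-1) / scale_km   # (..., 2)
    hidden = np.tanh(pair @ W1 + b1)
    raw = hidden @ W2 + b2
    return np.log1p(np.exp(raw))[..., 0]   # softplus keeps the output positive


# Illustrative pairs: a long detour versus a nearly direct route.
euclid = np.array([100.0, 20.0])
commute = np.array([300.0, 25.0])
fused = fuse_distances(euclid, commute, params)
print(fused)
```

In a trained model these parameters would be learned jointly with the weight network, so the fused distance is whatever combination best explains housing prices rather than a fixed formula.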

The overall structure of the model is largely unchanged: it still takes the weight w of each factor as input and outputs the final house price y. Comparative experiments show that when both Euclidean distance and commuting distance are considered, model performance is 12% higher than with traditional modeling, a larger improvement than when either distance alone is fed into the neural network.

The study also revealed the correlation between housing prices in Wuhan and the distribution of university towns, research institutes, technology companies and tourist attractions. In addition, the proposed model is particularly effective at predicting housing prices in areas far from the city center. Specifically, as the distance from the city center increases, the prediction accuracy of the model also increases. This shows that in urban fringe areas, purpose-built distance measures can capture the pattern of housing price changes more accurately.

In marine ecological environment modeling, take the Three Gorges Dam on the Yangtze River as an example. The dam traps silt and makes the water clearer, but it also blocks an important nutrient, silicate, from reaching the ocean. The reduction in silicate increases the proportion of toxic and harmful red tides along the coast. Traditional research methods roughly estimate the flow of nutrients by drawing contour maps. Today, however, the question is how to use remote sensing satellite images with high temporal and spatial resolution to explore the distribution of nutrients. To this end, we proposed a nonlinear modeling approach based on GeoAI, aiming to make full use of big data to analyze marine nutrients.

This study used the GNNWR method independently developed by the team. The characteristics of this method are shown in the figure below. In addition, we also performed operations such as data set matching, remote sensing spatiotemporal estimation, and missing data completion.

During the research, we cooperated with the Zhejiang Provincial Marine Monitoring Management Department, used the monitoring data it released, and used the Google Earth Engine API to download the required remote sensing images. We then defined the time, spatial location and resolution of the data and divided it into training, validation and test sets according to standard procedures. We ran 10-fold cross-validation and selected the best and most stable results for modeling.
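The 10-fold cross-validation step can be sketched as an index generator. The sample count of 95 and the random shuffling are assumptions for illustration. The talk does not describe how folds were formed.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# Hypothetical sample count, chosen so the folds are not all the same size.
splits = list(k_fold_indices(95, k=10))
for train_idx, val_idx in splits:
    # A model would be trained on train_idx and scored on val_idx here.
    pass
print(len(splits), [len(v) for _, v in splits])
```

Averaging the ten validation scores, and looking at their spread, gives the kind of stability check described above before a final model is chosen.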

Through modeling, we mapped the daily temporal and spatial distribution of silicate in Zhejiang's coastal waters over the past nine years. In August each year, the silicate content was low because of the intense activity of marine organisms and plants. In September and October, as water from the Yangtze River flowed into the coastal waters of Zhejiang, the nutrient content in the area increased significantly.

As shown in the figure below, the blue curve is the silicate content, and the orange curve is the flow direction and velocity of the Yangtze River. There is a significant correlation between the silicate content and the distribution of Yangtze River water flowing through Zhejiang, with a Pearson coefficient of 0.462. This indicates that the influence of Yangtze River water on Zhejiang waters is more pronounced in autumn and winter each year.
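For readers unfamiliar with the statistic, the Pearson coefficient can be computed as below. The two series are synthetic stand-ins generated for illustration. They are not the study's silicate or river-flow data, and the sketch does not attempt to reproduce the reported value of 0.462.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation: covariance divided by the product of spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2)))


# Synthetic series: one partly driven by the other, plus noise.
rng = np.random.default_rng(0)
river_flow = rng.normal(size=120)
silicate = 0.5 * river_flow + rng.normal(size=120)

r = pearson_r(river_flow, silicate)
print(r)
```

The coefficient ranges from -1 to 1 and measures only linear association, so a moderate value is compatible with a strong but seasonal or nonlinear relationship.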

In addition, we used data with high temporal and spatial resolution to analyze changes in marine biological activity. The study found that during a red tide in the coastal waters of Zhejiang, the relevant curve dropped twice within two weeks. This indicates that the AI method can not only improve model accuracy but also reveal subtle changes in time and space, and may provide important signals for real-time monitoring and early warning of diatom blooms.

Regarding the impact of coastal typhoons, we noticed that nutrient levels peaked on the day the typhoon reached the ocean and returned to their original levels three days later. This phenomenon is attributed to the disturbance of subsurface seawater by the typhoon, which brings nutrients from the deep up to the sea surface. After the typhoon, the nutrient content quickly returns to its original state, which confirms from a data-driven perspective the mechanism inferred in traditional oceanographic research.

In summary, this study provides a prediction signal for the early warning of offshore red tides and verifies the impact of typhoons on spatiotemporal changes in the ocean. The team has published a series of papers in oceanography exploring changes in the temporal and spatial distribution of ocean water quality, which may form a new research direction in the future.

About Zhejiang University School of Earth Sciences

The speaker, Qi Jin, is from the School of Earth Sciences, Zhejiang University. His research interests include artificial intelligence oceanography and the development of geoscience big data analysis platforms. He has led a number of important research projects, including sub-projects of the "14th Five-Year Plan" National Key R&D Program and National Natural Science Foundation projects. He has served as technical director of the Zhejiang Coastal Ecological Environment Multi-source Information Intelligent Service Platform and won the first prize of the Marine Engineering Science and Technology Award.

Qi Jin's personal homepage:

https://person.zju.edu.cn/qijin

His team, led by Professor Du Zhenhong and Professor Wu Sensen, has achieved a series of results in geoscience and information science in recent years. The GNNWR series of models proposed by the team is widely used by practitioners, and the combined number of downloads, calls and citations of the models has exceeded 10,000. Going forward, the team is committed to further developing GIS theory and methods and geoscientific intelligent analysis platform technology, and to continuing to explore the development of GeoAI.

GNNWR research team leader Wu Sensen's personal homepage and a brief introduction to the spatiotemporal intelligent regression model:

https://mypage.zju.edu.cn/wusensen/#977161

The team is recruiting postdoctoral fellows and research assistants. We welcome researchers with backgrounds in GIS, remote sensing, geography, oceanography, geology, computer science and technology to join us. We also welcome outstanding young people from overseas and all kinds of high-level talents to join us!