HyperAI

Cities are where people live and work, the cornerstone of economic development, and carriers of both everyday human experience and the broader course of national development. City managers have long sought more efficient and scientific approaches to urban governance to address problems such as unbalanced resource supply, traffic congestion, and population loss in different regions. With the rapid iteration of technologies such as the Internet of Things, AI, and big data, smart cities have emerged, and more and more countries have begun to innovate in ways suited to local conditions.

If building a smart city is likened to "building a house", then spatiotemporal data are the indispensable "bricks and tiles", and prediction models built on that data are an important foundation of the smart city framework. Spatiotemporal data, as the name suggests, record the occurrence and evolution of events across both time and space, and include geographic information, meteorological data, traffic data, population data, satellite remote sensing data, and more.

However, because cities differ in their level of development and in their data collection policies, some cities lack enough spatiotemporal data to support the construction of prediction models. Existing methods mainly respond by training models on data from data-rich source cities and applying them to target cities where data is scarce. This process often relies on complex matching designs, and achieving more general knowledge transfer between source and target cities remains an important challenge.

In response to the widespread data scarcity problem in urban computing, the Center for Urban Science and Computing Research in the Department of Electronic Engineering at Tsinghua University released its latest research, "Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation", which proposes the GPD (Generative Pre-Trained Diffusion) model. GPD uses a diffusion model to generate neural network parameters, transforming spatiotemporal few-shot learning into a pre-training problem for the diffusion model. The work has been accepted at ICLR 2024, and the data and code have been open-sourced.

Its advantage is that, by pre-training a diffusion model, it learns knowledge about optimizing neural network parameters from source-city data and then generates a neural network adapted to the target city based on prompts.

Paper link:
https://openreview.net/forum?id=QyFm3D3Tzi
Dataset download link:
https://hyper.ai/datasets/30453


Crowd and traffic dataset covering multiple cities

The researchers conducted experiments on two types of spatiotemporal forecasting tasks: crowd flow prediction and traffic speed prediction.

For crowd flow prediction, the researchers used three real-world datasets covering New York City, Washington, D.C., and Baltimore. Each dataset contains hourly crowd flows for every region of the city.

3 Real-World Datasets for Crowd Flow Prediction

For traffic speed prediction, the researchers used four real-world datasets: METR-LA, PEMS-Bay, Didi Chengdu, and Didi Shenzhen.

4 Real-World Datasets for Traffic Speed Prediction

In both tasks, the researchers divided the datasets into source cities and target cities. When a city is chosen as the target, only a limited amount of its data, such as 3 days, is assumed to be accessible (existing models usually require several months of training data), while the diffusion model is trained on the rich data provided by the source cities.
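As a rough illustration of the few-shot setting, the sketch below keeps only the first 3 days of an hourly series for a target-city region. The function name `few_shot_split`, the synthetic series, and the choice to hold out the remaining data for evaluation are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

HOURS_PER_DAY = 24


def few_shot_split(series: List[float], train_days: int = 3) -> Tuple[List[float], List[float]]:
    """Keep the first `train_days` days of an hourly series as the accessible data.

    Everything after the cut is returned separately (assumed here to be
    held out for evaluation).
    """
    cut = train_days * HOURS_PER_DAY
    if len(series) <= cut:
        raise ValueError("series is too short to leave any held-out data")
    return series[:cut], series[cut:]


# Hypothetical target-city region with 30 days of hourly readings.
target_series = [float(hour % HOURS_PER_DAY) for hour in range(30 * HOURS_PER_DAY)]
few_shot_train, held_out = few_shot_split(target_series, train_days=3)
```

In this toy setup the target city contributes 72 hourly points, while source cities would contribute their full history.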

Double buff: pre-training + prompt fine-tuning

As shown in the figure below, GPD, as a conditional generation framework, is divided into three key stages:

(a) Neural network preparation stage

The researchers trained a separate spatiotemporal prediction model for each source city region and saved its optimized network parameters. The model parameters for each region were independently optimized and converted into a vector-based format without parameter sharing to ensure that the model can best adapt to the characteristics of the respective region.
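The conversion of each region's optimized parameters into a vector can be pictured with the minimal sketch below. It assumes, purely for illustration, that a model is a dictionary of named weight matrices stored as nested lists; the helper names `flatten_params` and `unflatten_params` are hypothetical and do not come from the released code.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Layout = List[Tuple[str, int, int]]  # (layer name, rows, cols)


def flatten_params(params: Dict[str, Matrix]) -> Tuple[List[float], Layout]:
    """Concatenate all layer weights into one flat vector and record the shapes."""
    vector: List[float] = []
    layout: Layout = []
    for name in sorted(params):  # fixed order so every region shares one layout
        matrix = params[name]
        rows, cols = len(matrix), len(matrix[0])
        layout.append((name, rows, cols))
        for row in matrix:
            vector.extend(row)
    return vector, layout


def unflatten_params(vector: List[float], layout: Layout) -> Dict[str, Matrix]:
    """Rebuild the named weight matrices from a flat vector."""
    params: Dict[str, Matrix] = {}
    offset = 0
    for name, rows, cols in layout:
        params[name] = [
            vector[offset + r * cols: offset + (r + 1) * cols] for r in range(rows)
        ]
        offset += rows * cols
    return params


# Toy "model" for one source-city region.
region_model = {"layer1": [[0.1, 0.2], [0.3, 0.4]], "layer2": [[0.5, 0.6]]}
region_vector, region_layout = flatten_params(region_model)
```

Because the layout is identical for every region, the resulting vectors form a uniform training set, and a generated vector can be mapped back into a usable model with `unflatten_params`.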

(b) Diffusion model pre-training

The framework uses the collected pre-trained model parameters as training data to train the diffusion model to learn the process of generating model parameters. The diffusion model generates parameters by stepwise denoising, which can generate neural network parameters from noise given a prompt. This process is similar to the parameter optimization process starting from random initialization, so it can better adapt to the data distribution of the target city.
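To make the pre-training step more concrete, here is a minimal sketch of how one training example could be built, assuming a standard DDPM-style formulation with a linear noise schedule. The article does not specify the schedule or the loss, so the schedule values and the function names `alpha_bars`, `add_noise`, and `noise_prediction_loss` are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple


def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_k) for a linear noise schedule."""
    bars, running = [], 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def add_noise(x0: List[float], alpha_bar: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward process: x_k = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xk = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e for x, e in zip(x0, eps)]
    return xk, eps


def noise_prediction_loss(predicted: List[float], true: List[float]) -> float:
    """Mean squared error between the predicted noise and the noise actually added."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


rng = random.Random(0)
bars = alpha_bars(1000)
x0 = [0.5, -1.0, 2.0]  # a toy flattened parameter vector from a source region
xk, eps = add_noise(x0, bars[499], rng)  # noisy parameters at step 500
```

During training, a denoising network would receive `xk`, the step index, and the region prompt, and be optimized to make `noise_prediction_loss` small against `eps`.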

(c) Neural network parameter generation

After pre-training, parameters for the target city can be generated from its region prompts. The prompts guide knowledge transfer and precise parameter matching, making full use of the similarities between regions in different cities.
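The generation stage can be sketched as a reverse denoising loop that starts from noise and is guided by a prompt at every step. The code below assumes standard DDPM ancestral sampling, which the article does not confirm. The trained Transformer is replaced by a hypothetical stand-in, `make_oracle_denoiser`, that simply treats the prompt as if it were the clean parameter vector, only so that the loop can run end to end.

```python
import math
import random
from typing import Callable, List, Sequence

Denoiser = Callable[[List[float], int, Sequence[float]], List[float]]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """Cumulative products of (1 - beta_k)."""
    bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        bars.append(running)
    return bars


def generate_parameters(denoiser: Denoiser, prompt: Sequence[float], dim: int,
                        betas: Sequence[float], rng: random.Random) -> List[float]:
    """Start from Gaussian noise and denoise step by step, conditioned on the prompt."""
    bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k in reversed(range(len(betas))):
        eps_hat = denoiser(x, k, prompt)  # predicted noise at step k
        coef = betas[k] / math.sqrt(1.0 - bars[k])
        mean = [(xi - coef * e) / math.sqrt(1.0 - betas[k]) for xi, e in zip(x, eps_hat)]
        if k > 0:
            sigma = math.sqrt(betas[k])  # fresh noise on every step except the last
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


def make_oracle_denoiser(betas: Sequence[float]) -> Denoiser:
    """Stand-in for a trained network: pretends the prompt is the clean vector."""
    bars = cumulative_alphas(betas)

    def denoiser(x: List[float], k: int, prompt: Sequence[float]) -> List[float]:
        scale = math.sqrt(bars[k])
        return [(xi - scale * p) / math.sqrt(1.0 - bars[k]) for xi, p in zip(x, prompt)]

    return denoiser


betas = [1e-4 + (0.02 - 1e-4) * k / 99 for k in range(100)]
prompt = [0.3, -0.7, 1.2]
generated = generate_parameters(make_oracle_denoiser(betas), prompt, dim=3,
                                betas=betas, rng=random.Random(0))
```

In a real system the denoiser would be the pre-trained prompt-conditioned Transformer, the prompt would be a learned region embedding rather than the answer itself, and the output vector would be reshaped back into the weights of a prediction model.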

Among them, the network structure of the denoising network is shown in the following figure:

As shown in panel (a) of the figure above, the denoising network in this study is a prompt-based Transformer diffusion model. The parameters are first segmented by layer and then reorganized into a token sequence.
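One way to picture the reorganization of layer-wise parameters into a sequence a Transformer can consume is shown below. The fixed chunk size and the zero padding of the last chunk in each layer are assumptions for this sketch; the article does not describe the exact segmentation rule, and `tokenize_layers` is a hypothetical name.

```python
from typing import List


def tokenize_layers(layers: List[List[float]], token_dim: int) -> List[List[float]]:
    """Split each layer's flattened weights into fixed-size chunks, zero-padding the last one."""
    tokens: List[List[float]] = []
    for flat in layers:
        for start in range(0, len(flat), token_dim):
            chunk = flat[start:start + token_dim]
            chunk = chunk + [0.0] * (token_dim - len(chunk))  # pad a short final chunk
            tokens.append(chunk)
    return tokens


# Two toy layers with 5 and 2 weights, chunked into tokens of width 4.
layer_weights = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0]]
token_sequence = tokenize_layers(layer_weights, token_dim=4)
```

Chunking per layer keeps weights from different layers in separate positions of the sequence, so the layer boundaries are preserved.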

During denoising, the Transformer diffusion model takes the time step k and the region prompt p as inputs in addition to the noisy sequence. The researchers explored several conditioning strategies, such as post-adaptive conditioning and adaptive norm conditioning, and made minor but important modifications to the Transformer layer design. These conditioning strategies are shown in panels (b) and (c) of the figure above.
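For intuition about the adaptive norm style of conditioning, the sketch below shows one common form: the token is layer-normalized, then scaled and shifted by values computed from a condition vector. It assumes that the condition vector already combines the step k and the prompt p, and that scale and shift come from simple linear maps; these choices and the names `adaptive_norm`, `w_scale`, and `w_shift` are illustrative assumptions, not the authors' exact design.

```python
import math
from typing import List

Matrix = List[List[float]]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize a token to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights: Matrix, bias: List[float], vec: List[float]) -> List[float]:
    """Plain matrix-vector product plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def adaptive_norm(token: List[float], condition: List[float],
                  w_scale: Matrix, b_scale: List[float],
                  w_shift: Matrix, b_shift: List[float]) -> List[float]:
    """Layer norm whose scale and shift are predicted from the condition vector."""
    scale = linear(w_scale, b_scale, condition)
    shift = linear(w_shift, b_shift, condition)
    normed = layer_norm(token)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


token = [1.0, 2.0, 3.0, 4.0]
condition = [0.5, -0.5]  # assumed to encode both step k and prompt p
zero_w = [[0.0, 0.0] for _ in token]
zero_b = [0.0] * len(token)
# With zero weights and biases the condition has no effect.
unconditioned = adaptive_norm(token, condition, zero_w, zero_b, zero_w, zero_b)
# A constant shift of 1.0 moves every normalized value up by one.
shifted = adaptive_norm(token, condition, zero_w, zero_b, zero_w, [1.0] * len(token))
```

With learned, non-zero projections, different prompts produce different scales and shifts, which is how the same denoising network can be steered toward parameters suited to different regions.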

It is worth mentioning that in the pre-training and prompt fine-tuning framework, the choice of prompts is highly flexible, as long as they capture the characteristics of a specific region. For example, various static features can be used, such as population, regional area, functions, and the distribution of points of interest (POIs).

This work builds region prompts from both spatial and temporal perspectives:

* Spatial prompts come from node representations in the city knowledge graph, using only relationships such as regional adjacency and functional similarity, which are easy to obtain in all cities;

* Temporal prompts come from the encoder of a self-supervised learning model.

GPD performs well in data-scarce scenarios, with a performance improvement of 7.87%

To evaluate the effectiveness of the proposed framework, the study conducted experiments on two classic spatiotemporal prediction tasks: crowd flow prediction and traffic speed prediction. The results show that GPD performs well in data-scarce scenarios, improving on the best baseline by an average of 7.87% across four datasets.

Comparison of GPD against state-of-the-art baselines on 4 datasets

On the Washington, D.C., Baltimore, Los Angeles, and Chengdu datasets, GPD reduces the mean absolute error (MAE) by 4.31%, 17.1%, 2.1%, and 8.17%, respectively, compared with the best baseline method (underlined in the table above). This shows that GPD performs consistently well across different data scenarios and achieves effective knowledge transfer at the level of neural network parameters.
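For readers unfamiliar with the metric, the sketch below shows how MAE and a relative MAE reduction over a baseline are typically computed. The numbers are made up for illustration and are not taken from the paper's tables; the function names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Average absolute difference between predictions and ground truth."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def relative_reduction(baseline_mae: float, model_mae: float) -> float:
    """Percentage by which the model's MAE is lower than the baseline's."""
    return (baseline_mae - model_mae) / baseline_mae * 100.0


# Illustrative values only.
example_mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
example_gain = relative_reduction(baseline_mae=2.0, model_mae=1.5)
```

Here `example_mae` is 1.0 and `example_gain` is 25.0, meaning the hypothetical model's error is 25% lower than the hypothetical baseline's.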

Performance comparison of different spatiotemporal prediction models

The study also verified the flexibility of the GPD framework in adapting to different spatiotemporal prediction models. Besides the classic spatiotemporal graph method STGCN, it used GWN and STID as prediction models and generated their network parameters with the diffusion model. Experimental results show that the framework's advantage does not depend on the choice of prediction model, so it can be adapted to a variety of advanced models.

Accelerate the creation of "Realistic 3D China"

In recent years, as the construction of new infrastructure has accelerated, the difficulty of collecting spatiotemporal data has eased considerably. In addition, with the successful application of few-shot learning methods such as the one described above, urban spatiotemporal big data platforms adapted to local conditions have been deployed in more and more cities.

According to a People's Daily report in May 2023, the self-sufficiency rate of domestic satellite remote sensing imagery has exceeded 90%; the 1:50,000 basic geographic information database is dynamically updated every year, and 1:10,000 basic geographic information data covers 65% of the country's land area.

According to reports, Realistic 3D China has been incorporated into the overall layout plan for building Digital China and is now fully under way. National, provincial, municipal, and county authorities are jointly advancing realistic 3D construction at the terrain, city, and component levels, and product coverage has extended from the land surface to the ocean, underwater, and underground. At present, the results of Realistic 3D China are connected in real time to the national basic information platform for territorial space, and are used to verify data reported in the third national land survey, to extract changed parcels in land change surveys, and to support the justification and scenario analysis of territorial spatial planning.

As of May 2023, 40 smart city spatiotemporal big data platforms had been completed, and more than 400 industry application systems had been developed on them for natural resource monitoring and management, refined urban management, transportation, market supervision, and other areas, providing real-time, rich, comprehensive, and authoritative spatiotemporal infrastructure support for refined urban management, economic development, and public life.

What is certain is that, against the broader backdrop of "Digital China", the construction of smart cities aimed at livability and sustainable development will continue to deepen, and the importance of spatiotemporal data and spatiotemporal models as the foundation of the city brain is self-evident. As data collection capabilities improve and few-shot learning methods continue to iterate, spatiotemporal predictions are expected to become more accurate.

References:
https://www.gov.cn/lianbo/bumen/202305/content_6874554.htm

Based on Real-Life Data from Seven Major Cities, the Tsinghua University Team Open-Sourced the GPD Model
