HyperAI超神经

Crowd Simulation is the process of simulating the movement of a large number of people in a specific situation. This technology is mainly used in computer games, urban planning, architectural design, and traffic organization. For example, simulating the movement of people in a building under different conditions (such as crowd density, flow, etc.) helps decision makers evaluate and optimize building design to improve emergency response and evacuation efficiency.

Although the field is developing rapidly and has made substantial research progress, individual behavior remains complicated across situations because it is shaped by movement, sensory abilities, and a range of psychological factors. The high computational complexity of such heterogeneous crowds poses many challenges that limit the realism of crowd simulation.

Researchers from the Center for Urban Science and Computational Research, Department of Electronic Engineering, Tsinghua University, Shenzhen Key Laboratory of Ubiquitous Data Empowerment, Tsinghua University Shenzhen International Graduate School, and Pengcheng Laboratory recently published a paper titled "Social Physics Informed Diffusion Model for Crowd Simulation" at AAAI 2024. The paper proposes SPDiff, a novel conditional denoising diffusion model that exploits interaction dynamics to simulate crowd behavior through a diffusion process guided by social forces.

Inspired by the motion characteristics of multi-particle dynamical systems, the model also integrates a strong inductive bias of equivariance, which improves its generalization under transformations and thereby its performance. In addition, the researchers develop a long-range training algorithm suited to diffusion models to ensure that the results stay physically consistent over long horizons. The method embeds social physics knowledge, such as the social force model that describes the nature of human flow, into the design of a deep learning model, realizing a research paradigm driven jointly by knowledge and data.

Paper link:

https://arxiv.org/abs/2402.06680

Code link:

https://github.com/tsinghua-fib-lab/SPDiff


Heterogeneity and multimodality of crowd movement

Pedestrian mobility simulation is the process of microscopically simulating the movement of a large number of people in a specific scenario, focusing on the impact of group interaction on crowd movement. This technology has major applications in urban planning, architectural design, and traffic management. For example, realistic simulation of the movement of people at public transportation transfer stations (such as airports and train stations) helps analyze the efficiency and safety of transfer stations when facing large passenger flows, and further promotes the optimization of architectural space design.

Formula of the social force model
Destination driving force f_dest, pedestrian repulsion f_ped, and repulsive force from the environment and obstacles f_env
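The formula image itself is not reproduced here. As a rough illustration only, the sketch below follows the widely used form of the social force model, in which a pedestrian's acceleration (for unit mass) is the sum of the three terms named in the caption. The function names, the exponential repulsion, and all parameter values (`desired_speed`, `tau`, `strength`, `length_scale`) are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def destination_force(pos, vel, dest, desired_speed=1.3, tau=0.5):
    """f_dest: relax the current velocity toward the desired velocity pointing at the destination."""
    direction = dest - pos
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return -vel / tau  # already at the destination: just brake
    return (desired_speed * direction / dist - vel) / tau

def repulsion(pos, other, strength=2.0, length_scale=0.3):
    """Exponentially decaying push away from another pedestrian or an obstacle point."""
    offset = pos - other
    dist = np.linalg.norm(offset)
    if dist < 1e-9:
        return np.zeros_like(pos)
    return strength * np.exp(-dist / length_scale) * offset / dist

def social_force(pos, vel, dest, pedestrians, obstacles):
    """Total force: f_dest plus the summed f_ped and f_env contributions."""
    f_dest = destination_force(pos, vel, dest)
    f_ped = sum((repulsion(pos, p) for p in pedestrians), np.zeros_like(pos))
    f_env = sum((repulsion(pos, o) for o in obstacles), np.zeros_like(pos))
    return f_dest + f_ped + f_env

# A pedestrian at rest heading right, with one neighbor 1 m ahead.
force = social_force(
    pos=np.array([0.0, 0.0]),
    vel=np.array([0.0, 0.0]),
    dest=np.array([10.0, 0.0]),
    pedestrians=[np.array([1.0, 0.0])],
    obstacles=[],
)
```

In this example the neighbor slightly weakens the forward pull without reversing it, which is the qualitative behavior the three-term decomposition is meant to capture.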

Crowd movement has two core characteristics: heterogeneity and multi-modality.

First, individual behaviors in a crowd are heterogeneous: influenced by individual preferences and the surrounding environment, people produce complex spatiotemporal trajectories. For example, in a shopping mall, pedestrians move at different speeds and follow different paths depending on their personal interests and the layout of the mall. Real trajectories therefore show diverse and complex movement patterns that change over time.

Early methods tried to explain the mechanism behind pedestrian movement with physics-rule-based models from social physics, such as the social force model, which distill the essential characteristics of pedestrian movement from heterogeneous behavior. The trajectories these methods simulate, however, are not realistic or natural enough.

Second, the inherent uncertainty of human behavior leads to uncertainty in pedestrian trajectories, which is often referred to as the multimodality of human mobility. Early studies made simplifying assumptions about the random distribution of trajectories, such as modeling multimodality with a Gaussian distribution, and subsequent methods used generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs) to generate multimodal samples.

In recent years, the diffusion model has become a popular generative model and has demonstrated state-of-the-art performance in many generative tasks. To achieve realistic simulation, this study addresses both characteristics discussed above: it relies on the diffusion model's strength in modeling complex multimodal distributions, and it uses social physics knowledge, represented by the social force model, to guide the design of the diffusion framework.

Diffusion model + multi-frame deduction training algorithm: Realizing long-range movement simulation

Unlike the diffusion model, which gradually reconstructs the distribution of the observed data, the social force model treats crowd movement as a multi-particle dynamical system and directly imposes physical constraints on the observed data of each pedestrian in each time frame. It is therefore difficult to incorporate this knowledge into operations on noisy data during the denoising process.

Meanwhile, pedestrian mobility simulation requires generating data for multiple pedestrians over multiple time frames. Existing methods usually use diffusion models to generate the entire sequence at once. In this setting, however, generating the whole simulated trajectory at once leaves no way for the social force model to guide each pedestrian in each time frame.

Furthermore, because the generated data is high-dimensional, one-shot generation may suffer in both efficiency and effectiveness. For existing diffusion model frameworks, achieving long-term simulation while keeping the results stable is a challenging problem.

To address the above challenges, this study proposes a conditional denoising diffusion model for pedestrian mobility simulation. The model has the following features:

* Includes a crowd interaction module to gain insights from social force models to guide the denoising process;

* Integrates equivariant properties derived from multi-particle dynamical systems, enhancing the generalization of the model across transformations and optimizing data efficiency.
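To make "conditional denoising" concrete, here is a minimal sketch of the standard DDPM reverse sampling chain applied to per-pedestrian accelerations. It is not the SPDiff network: `predict_noise` stands in for the trained denoiser, the schedule values are illustrative, and `oracle_noise` is a hypothetical placeholder that returns the exact noise for data concentrated at a single known point, so the chain can be run end to end without a trained model.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (illustrative values, not the paper's)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def sample_acceleration(predict_noise, condition, shape, rng, num_steps=50):
    """Run the DDPM reverse chain from pure noise down to one acceleration sample."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(shape)
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, condition)  # the conditioned denoiser goes here
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x

def oracle_noise(x, t, target):
    """Hypothetical stand-in for the trained network: exact noise for point-mass data at `target`."""
    _, _, alpha_bars = make_schedule()
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

rng = np.random.default_rng(0)
target = np.array([[0.5, -0.2], [0.0, 1.0]])  # one 2-D acceleration per pedestrian
sample = sample_acceleration(oracle_noise, target, target.shape, rng)
```

In the model described in this article, the placeholder would be replaced by a learned denoiser whose condition carries the scene graph, the movement history, and the destination.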

As shown in the figure, SPDiff uses a graph network to model the scene. In the graph, each pedestrian is connected by directed edges to the nearby pedestrians and obstacles within its field of view. The diffusion model takes the node and edge information of the graph, the historical state, and the pedestrian's destination as conditional inputs, samples each pedestrian's acceleration for the next time frame, and then updates the state of all pedestrians at the next moment. This process can be iterated to simulate behavior over any duration.
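The frame-by-frame loop described above can be sketched as follows. This is an assumption-laden toy: `make_toy_sampler` is a hypothetical stand-in for the diffusion sampler (it draws a noisy destination-seeking acceleration), the time step `dt` is arbitrary, and the semi-implicit Euler update is just one reasonable choice of integrator.

```python
import numpy as np

def make_toy_sampler(seed=0, desired_speed=1.3, tau=0.5, noise_std=0.05):
    """Hypothetical stand-in for the diffusion sampler: noisy pull toward the destination."""
    rng = np.random.default_rng(seed)

    def sample_accel(positions, velocities, destinations):
        offset = destinations - positions
        dist = np.linalg.norm(offset, axis=1, keepdims=True)
        direction = offset / np.maximum(dist, 1e-9)
        drift = (desired_speed * direction - velocities) / tau
        return drift + noise_std * rng.standard_normal(positions.shape)

    return sample_accel

def rollout(positions, velocities, destinations, sample_accel, num_frames, dt=0.08):
    """Sample an acceleration for every pedestrian, update the state, and repeat."""
    trajectory = [positions.copy()]
    for _ in range(num_frames):
        accel = sample_accel(positions, velocities, destinations)
        velocities = velocities + accel * dt   # update velocity first
        positions = positions + velocities * dt  # then position
        trajectory.append(positions.copy())
    return np.stack(trajectory)  # shape: (num_frames + 1, num_pedestrians, 2)

start = np.array([[0.0, 0.0], [5.0, 5.0]])
goals = np.array([[10.0, 0.0], [0.0, 0.0]])
traj = rollout(start, np.zeros_like(start), goals, make_toy_sampler(), num_frames=50)
```

Because each frame only needs the current state, the loop can run for any number of frames, which is what allows simulation of arbitrary duration.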

To integrate physical knowledge of human movement into the denoising network of the diffusion model, the researchers built the neural network on top of the original social force model and replaced its core interaction terms, f_ped and f_env, with learned components. The traction force toward the destination, f_dest, can be calculated directly from the formula. On this basis, a Graph Network (GN) maps pedestrian states to predicted social forces.

In addition, pedestrian interaction is equivariant: when the particle-like system formed by the pedestrians is transformed (for example, translated or rotated), the interaction either undergoes the same transformation or remains unchanged. To incorporate this physical property, the interaction information is processed through a series of equivariant graph convolutional layers (EGCL), which improves the training efficiency and physical consistency of the model.
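The property can be checked numerically. The sketch below does not implement EGCL; it uses a hand-written pairwise repulsion (an assumed exponential form with illustrative parameters) and verifies that rotating and translating the whole scene rotates the resulting forces in the same way and leaves them unaffected by the translation.

```python
import numpy as np

def pairwise_repulsion(positions, strength=2.0, length_scale=0.3):
    """Net repulsive force on each pedestrian from all others; depends only on relative positions."""
    offsets = positions[:, None, :] - positions[None, :, :]  # (N, N, 2), offsets[i, j] = p_i - p_j
    dists = np.linalg.norm(offsets, axis=-1, keepdims=True)
    safe = np.where(dists > 0, dists, 1.0)  # avoid 0/0 on the diagonal (zero offset, zero force)
    forces = strength * np.exp(-dists / length_scale) * offsets / safe
    return forces.sum(axis=1)

rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 2.0, size=(5, 2))

theta = 0.7  # arbitrary rotation angle in radians
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shift = np.array([3.0, -1.5])

forces = pairwise_repulsion(positions)
forces_of_moved_scene = pairwise_repulsion(positions @ rotation.T + shift)

# Equivariance: transforming the input rotates the output, and the shift drops out.
is_equivariant = np.allclose(forces_of_moved_scene, forces @ rotation.T)
```

A network that has this property by construction does not have to learn it from rotated or shifted copies of the data, which is one intuition for the data-efficiency result reported later in the article.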

Finally, the historical movement state of each pedestrian is processed by a long short-term memory network (LSTM). The history processing module is motivated by the prior knowledge that humans tend to avoid abrupt changes in their movement state in order to save energy.

Design of parameterized denoising neural network for diffusion model

To achieve physically consistent long-range motion simulation, this work further designs a multi-frame deduction (rollout) training algorithm. As shown in the figure below, during training the diffusion model simulates trajectories within a defined time window, and the cumulative error over that window serves as the loss function for updating the model parameters by gradient descent. This penalizes short-sighted behavior that ignores physical consistency over long horizons, so that the model generalizes to long-range simulation.
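The idea of training on a cumulative multi-frame error can be illustrated with a deliberately tiny example. Everything here is an assumption for illustration: the "model" is a single learnable relaxation rate `theta` for a 1-D pedestrian rather than a diffusion network, and the gradient is estimated by central finite differences instead of backpropagation.

```python
import numpy as np

def simulate(theta, x0, v0, num_frames, dt=0.1, desired_speed=1.3):
    """Roll a 1-D pedestrian forward; theta is the learnable relaxation rate (1 / tau)."""
    x, v = x0, v0
    xs = []
    for _ in range(num_frames):
        a = theta * (desired_speed - v)
        v = v + a * dt
        x = x + v * dt
        xs.append(x)
    return np.array(xs)

def rollout_loss(theta, observed, x0, v0):
    """Cumulative squared position error over the whole training window."""
    predicted = simulate(theta, x0, v0, len(observed))
    return float(np.sum((predicted - observed) ** 2))

def train(observed, x0, v0, theta=0.5, lr=0.03, steps=300, eps=1e-4):
    """Gradient descent on the rollout loss, using a finite-difference gradient."""
    for _ in range(steps):
        grad = (rollout_loss(theta + eps, observed, x0, v0)
                - rollout_loss(theta - eps, observed, x0, v0)) / (2 * eps)
        theta -= lr * grad
    return theta

true_theta = 2.0
observed = simulate(true_theta, x0=0.0, v0=0.0, num_frames=20)  # synthetic "ground truth"
loss_before = rollout_loss(0.5, observed, 0.0, 0.0)
learned_theta = train(observed, 0.0, 0.0)
loss_after = rollout_loss(learned_theta, observed, 0.0, 0.0)
```

Because the error is summed over all 20 simulated frames, a parameter that looks acceptable for a single step but drifts over time is penalized, which is the effect the multi-frame algorithm is designed to achieve.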

Schematic diagram of the proposed multi-frame deduction training algorithm

Experimental results: Only 5% training data is needed to achieve optimal performance

To evaluate the effectiveness of the model, this study uses two real-world datasets: the GC dataset and the UCY dataset. The two datasets differ in scene, scale, duration, and pedestrian density, and can be used to validate the generalization performance of the model.

The study classified the baseline methods into three categories:

* Physics-based methods (Social Force Model SFM, Cellular Automata CA)

* Purely data-driven methods (STGCNN, PECNet, MID)

* Physics-informed methods that integrate physical knowledge (PCS, NSP)

Comparative experiments verify that the proposed method significantly outperforms the most advanced baselines. Across the microscopic metrics (MAE, DTW) and the macroscopic realism metrics (OT, MMD), the improvement ranges from 6% to 37%.
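For readers unfamiliar with the two microscopic metrics, the sketch below gives common textbook definitions. The exact formulations used in the paper may differ: here MAE is assumed to be the mean Euclidean distance between matching points, and DTW is the classic dynamic-programming alignment cost. OT and MMD are not shown.

```python
import numpy as np

def mae(pred, truth):
    """Mean Euclidean distance between matching points of two (T, N, 2) trajectory arrays."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=-1)))

def dtw(a, b):
    """Dynamic time warping cost between two single-pedestrian trajectories of shape (T, 2)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
offset_line = line + np.array([0.0, 0.5])   # same path, 0.5 m to the side
delayed_line = np.vstack([line[:1], line])  # same path, one frame of waiting at the start

mae_value = mae(offset_line[:, None, :], line[:, None, :])
dtw_offset = dtw(line, offset_line)
dtw_delayed = dtw(line, delayed_line)
```

The delayed trajectory has zero DTW cost because warping absorbs the timing difference, whereas the laterally shifted one does not; this is why DTW complements a point-by-point error such as MAE.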

Performance comparison of the main experiment UCY dataset
The bolded part is the best performance, and the underlined part is the second best performance

To examine simulation accuracy at each time frame, this experiment tracks how the indicators change as the simulation progresses. The indicators oscillate over time, showing multiple peaks. The rises can be attributed to errors accumulating during long-term simulation, and the declines to the fact that all three models "pull" pedestrians toward their end points.

Overall, the proposed method can maintain a lower error for a long time compared to the other two baselines, which reflects the simulation accuracy of this method.

Indicator evolution on simulated timeframe

Using the UCY and GC datasets, using OT and MMD as indicators

This study further explored how much each key design contributes to the performance improvement by evaluating the model without social-physics knowledge fusion, without the history processing module, and without the multi-frame deduction training algorithm.

The experimental results in the figure below show that removing any one component degrades model performance to some degree, which demonstrates the effectiveness of each design. The performance loss is largest when the design related to social physics guidance is removed, reflecting the necessity of incorporating social physics knowledge in crowd simulation.

Ablation experiments of different modules of the model. NC means non-convergence

Finally, the paper studies how the inductive bias introduced by the equivariant design in the crowd interaction module affects performance. The equivariant graph convolutional layer is replaced with a non-equivariant network, and the model's performance is compared under different amounts of training data and numbers of training epochs. As shown in the figure, the model using the equivariant graph neural network outperforms the non-equivariant model at almost all training sample ratios. Even when using only 5% of the training data, the original model still maintains excellent performance.

Specifically, when the training sample ratio is 5%, the MAE of SPDiff shows almost no degradation compared with training on 100% of the samples; the largest drop in performance is only 2.5%. Compared with the non-equivariant design, the equivariant design improves MAE by up to 13.2% and OT by up to 22%. This shows that, thanks to the equivariant design, the proposed model can reach with only a small number of samples a generalization ability comparable to that obtained from training on a large amount of data.

Changes in MAE under different training sample ratios

Conclusion

This paper proposes a new method for simulating pedestrian movement based on a conditional denoising diffusion model. Through a physically guided conditional diffusion process, the model can effectively utilize the known state information of crowd movement to simulate pedestrian movement.

Inspired by the well-known social force model, the proposed equivariant crowd interaction design and multi-frame deduction training algorithm address the challenges of macroscopic and microscopic simulation realism and of long-range simulation stability. The method introduces generative modeling into research on human flow and explores the combination of social physics knowledge with deep generative models.

Call to action

HyperAI is one of the earliest open communities focusing on AI for Science. It continues to share and promote the latest research results by interpreting cutting-edge papers at home and abroad.

Research groups and teams that are conducting research and exploration around AI for Science are welcome to contact us to share their latest research results, submit in-depth interpretation articles, etc. More ways to promote AI4S are waiting for us to explore together!


Only 5% Training Samples Are Needed to Achieve Optimal Performance. Tsinghua University Research Team Released the Conditional Denoising Diffusion Model SPDiff to Achieve Long-range Human Flow Simulation
