OST-Bench Spatiotemporal Scene Understanding Benchmark Dataset
License
Non-Commercial
OST-Bench, released in 2025 by the Shanghai Artificial Intelligence Laboratory in collaboration with Shanghai Jiao Tong University, the University of Hong Kong, and other institutions, is a benchmark dataset for evaluating the online spatiotemporal scene understanding capabilities of multimodal large language models (MLLMs). The related research paper is titled "OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding". The goal is to evaluate how well these models handle online scene exploration, visible information modeling, and spatiotemporal reasoning.
The dataset comprises approximately 1,400 real-world indoor 3D scenes, from which about 10,000 multi-turn, temporally ordered question-and-answer samples are generated along the scene exploration process. The scenes are sourced from ScanNet, ARKitScenes, and Matterport3D and are processed with unified 3D object and semantic annotations. A continuous viewpoint exploration trajectory is constructed within each scene, and question-and-answer content is generated from the information visible so far. The task design covers three core directions: agent state, visible information, and agent-object spatial relationships. These are refined into 15 sub-tasks presented in a multi-turn dialogue format, which requires the model to reason online about space and time using both its historical observations and its current field of view.
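To make the online, multi-turn setting more concrete, the following is a minimal Python sketch of how such an evaluation loop might look. It is an illustration only: the `Turn` and `Episode` classes, their field names, the category strings, and the exact-match scoring are all assumptions made for this example and do not reflect the actual OST-Bench file format or official metric. The point it shows is that the model sees only the frames observed up to the current turn, plus the earlier dialogue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical schema: names and fields are illustrative, not the real OST-Bench format.
@dataclass
class Turn:
    new_frames: List[str]  # frames observed since the previous turn
    question: str
    answer: str
    category: str          # e.g. "agent_state", "visible_info", "agent_object_spatial"


@dataclass
class Episode:
    scene_id: str
    turns: List[Turn] = field(default_factory=list)


# A model is any callable taking (frames so far, dialogue so far, question).
Model = Callable[[List[str], List[Tuple[str, str]], str], str]


def evaluate_online(episode: Episode, model: Model) -> float:
    """Return exact-match accuracy (an assumed metric) over one episode."""
    history_frames: List[str] = []
    dialogue: List[Tuple[str, str]] = []
    correct = 0
    for turn in episode.turns:
        # Only frames up to the current turn are visible; no future frames leak in.
        history_frames.extend(turn.new_frames)
        prediction = model(list(history_frames), list(dialogue), turn.question)
        correct += int(prediction.strip().lower() == turn.answer.strip().lower())
        dialogue.append((turn.question, prediction))
    return correct / len(episode.turns) if episode.turns else 0.0


# Toy usage with a stand-in "model" that just counts the frames it has seen.
def count_frames_model(frames, dialogue, question):
    return str(len(frames))


episode = Episode(
    scene_id="toy_scene",
    turns=[
        Turn(["f0", "f1"], "How many frames have you seen?", "2", "agent_state"),
        Turn(["f2"], "How many frames have you seen?", "3", "agent_state"),
    ],
)
accuracy = evaluate_online(episode, count_frames_model)
```

The frame history accumulates across turns rather than being reset, which mirrors the requirement that answers draw on historical observations as well as the current view. A real harness would load the benchmark's own annotation files and apply the scoring rules defined by its authors.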
