
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Özsoy, Ege, Pellegrini, Chantal, Czempiel, Tobias, Tristram, Felix, Yuan, Kun, Bani-Harouni, David, Eck, Ulrich, Busam, Benjamin, Keicher, Matthias, Navab, Nassir
Abstract

Operating rooms (ORs) are complex, high-stakes environments that require a precise understanding of the interactions among medical staff, tools, and equipment to enhance surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism, and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, we introduce MM-OR, a realistic, large-scale multimodal spatiotemporal OR dataset, and the first dataset to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. Further, we propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments demonstrate its ability to effectively leverage multimodal inputs. Together, MM-OR and MM2SG establish a new benchmark for holistic OR understanding and open the path toward multimodal scene analysis in complex, high-stakes environments. Our code and data are available at https://github.com/egeozsoy/MM-OR.
