CVPR2025

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab

Abstract

[Figure 1 panels: Multiview RGB-D Video (5 cameras); Detail RGB Videos (3 cameras); Low Exposure RGB Video; Point Cloud; Sound; Robot Screen, Tracker Data and Logs; Panoptic Segmentations; Scene Graphs; Downstream Tasks (example labels: Robot Phase: Robot Preparation Complete; Next Action: Hammering; Sterility Breach: No)]

Figure 1. Overview of a single timepoint in MM-OR, illustrating the multimodal data provided for each sample: RGB-D video from multiple angles, detailed RGB views, low-exposure video, point cloud data, robot screen and tracker logs, audio and speech transcripts, panoptic segmentations, semantic scene graphs, and downstream task annotations such as robot phase, next action, and sterility breach status.
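The Figure 1 caption lists everything bundled into a single MM-OR timepoint. As a rough sketch, assuming a Python workflow, one timepoint could be held in a record like the one below. The class name `TimepointSample`, every field name, the placeholder file paths, and the (subject, predicate, object) triplet encoding of the scene graph are hypothetical choices made for illustration; they are not the dataset's released format or loader API. Only the camera counts (5 RGB-D, 3 detail RGB) and the example downstream labels come from the figure, and the example triplets are illustrative rather than taken from the annotations.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Assumed scene-graph encoding: (subject, predicate, object). Not MM-OR's file format.
Triplet = Tuple[str, str, str]

NUM_RGBD_VIEWS = 5    # multiview RGB-D cameras, as stated in Figure 1
NUM_DETAIL_VIEWS = 3  # detail RGB cameras, as stated in Figure 1


@dataclass
class TimepointSample:
    """Hypothetical record for one MM-OR timepoint; all field names are illustrative."""
    rgbd_frames: List[str]                 # one path per RGB-D camera
    detail_rgb_frames: List[str]           # one path per detail RGB camera
    low_exposure_frame: Optional[str] = None
    point_cloud: Optional[str] = None
    audio_clip: Optional[str] = None
    transcript: str = ""                   # speech transcript for this timepoint
    robot_screen: Optional[str] = None
    tracker_log: Optional[str] = None
    panoptic_masks: List[str] = field(default_factory=list)
    scene_graph: List[Triplet] = field(default_factory=list)
    robot_phase: Optional[str] = None      # downstream label, e.g. "Robot Preparation Complete"
    next_action: Optional[str] = None      # downstream label, e.g. "Hammering"
    sterility_breach: Optional[bool] = None

    def __post_init__(self) -> None:
        # Reject samples whose camera counts differ from the setup shown in Figure 1.
        if len(self.rgbd_frames) != NUM_RGBD_VIEWS:
            raise ValueError(
                f"expected {NUM_RGBD_VIEWS} RGB-D frames, got {len(self.rgbd_frames)}")
        if len(self.detail_rgb_frames) != NUM_DETAIL_VIEWS:
            raise ValueError(
                f"expected {NUM_DETAIL_VIEWS} detail RGB frames, got {len(self.detail_rgb_frames)}")

    def entities(self) -> Set[str]:
        # Every node that appears as a subject or object in the scene graph.
        return {s for s, _, _ in self.scene_graph} | {o for _, _, o in self.scene_graph}

    def relations_of(self, subject: str) -> List[Tuple[str, str]]:
        # (predicate, object) pairs whose subject matches the given entity.
        return [(p, o) for s, p, o in self.scene_graph if s == subject]


if __name__ == "__main__":
    # Placeholder paths and illustrative triplets; not real MM-OR annotations.
    sample = TimepointSample(
        rgbd_frames=[f"rgbd_{i}.png" for i in range(NUM_RGBD_VIEWS)],
        detail_rgb_frames=[f"detail_{i}.png" for i in range(NUM_DETAIL_VIEWS)],
        scene_graph=[("head surgeon", "holding", "saw"),
                     ("patient", "lying on", "operating table")],
        robot_phase="Robot Preparation Complete",
        next_action="Hammering",
        sterility_breach=False,
    )
    print(sorted(sample.entities()))
    print(sample.relations_of("patient"))
```

Keeping the scene graph as a flat triplet list turns per-entity queries such as `relations_of` into a single filter; a graph library could replace it if multi-hop queries over entities were needed.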