CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out the fine-grained interaction details that are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of ⟨instrument, verb, target⟩ combinations delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021, an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.