Frontiers in Intelligent Colonoscopy

Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer. This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications. With this goal, we begin by assessing the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception, including classification, detection, segmentation, and vision-language understanding. This assessment enables us to identify domain-specific challenges and reveals that multimodal research in colonoscopy remains open for further exploration. To embrace the coming multimodal era, we establish three foundational initiatives: a large-scale multimodal instruction tuning dataset ColonINST, a colonoscopy-designed multimodal language model ColonGPT, and a multimodal benchmark. To facilitate ongoing monitoring of this rapidly evolving field, we provide a public website for the latest updates: https://github.com/ai4colonoscopy/IntelliScope.