YOLOv10: Real-Time End-to-End Object Detection

Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and more for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. This results in suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture. To this end, we first present consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce a holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of the YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8$\times$ faster than RT-DETR-R18 under a similar AP on COCO, while having 2.8$\times$ fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% less latency and 25\% fewer parameters for the same performance.
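For context on the post-processing bottleneck the abstract refers to, the following is a minimal sketch of greedy NMS, the sequential filtering step that NMS-free training eliminates. This is a generic illustration, not code from the paper; the box format (x1, y1, x2, y2) and function names are assumptions made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box
    and discard any box that overlaps it above the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Sequential, data-dependent filtering: this loop is what makes
        # NMS awkward to fuse into an end-to-end deployed graph.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping predictions plus one distant one: greedy NMS
# keeps the higher-scoring duplicate and the distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

Because this loop runs on a variable number of boxes and has data-dependent control flow, it typically executes outside the accelerated model graph, which is why removing it can reduce end-to-end inference latency.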