Online Tutorial | Object Detection Enters the Era of "Global Awareness": Tsinghua University and Others Release YOLOv13, Achieving Breakthroughs in Both Speed and Accuracy

In applications requiring "millisecond-level response," such as autonomous driving, industrial quality inspection, and security monitoring, real-time object detection remains an extremely challenging technological field. Over the past decade, the YOLO series has become the mainstream solution in this field due to its lightweight and efficient architecture. From the initial YOLO to the recent YOLOv11 and YOLOv12, the model has continuously sought new balance points between speed and accuracy.
However, even after multiple evolutions, the underlying mechanisms of the YOLO series still face common bottlenecks: convolution can only aggregate locally within a fixed receptive field, while self-attention can expand the receptive field but, because of its high computational cost, usually has to be applied region by region in actual deployment, losing a true global perspective. More importantly, self-attention still models relationships between pairs of pixels and can therefore only express "binary correlations", making it difficult to capture the more complex many-to-many semantic structures in a scene. These structures are crucial for models to understand crowded scenes, fine-grained objects, or highly complex visual relationships.
*Receptive field: In the visual pathway, photoreceptors (rod and cone cells) on the retina receive light signals and convert them into neural signals, which are relayed through retinal ganglion cells and the lateral geniculate nucleus toward the visual cortex. The region that, when stimulated, influences a given cell's response is called that cell's receptive field. Different types of sensory neurons have receptive fields with different properties and sizes.
This is why the traditional YOLO architecture often hits a performance ceiling in complex scenarios: it either fails to fully capture long-range dependencies or struggles to express deep semantic relationships across scales.
In response to this long-standing problem, a joint research team from Tsinghua University, Taiyuan University of Technology, Xi'an Jiaotong University, and other institutions has proposed a novel object detection model, YOLOv13, which extends correlation modeling from binary relationships to genuinely higher-order structures. The core component introduced by the team is HyperACE (Hypergraph-based Adaptive Correlation Enhancement). HyperACE treats the pixels of multi-scale feature maps as vertices and adaptively explores higher-order correlations between them through a learnable hyperedge construction module. A message-passing module with linear complexity then aggregates multi-scale features under the guidance of these higher-order correlations, enabling effective visual perception in complex scenes. HyperACE also integrates low-order correlation modeling for more comprehensive perception.
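To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of hypergraph-style message passing: pixels (vertices) are softly assigned to a small set of learnable hyperedges, each hyperedge aggregates the features of its members, and the aggregated context is distributed back to every vertex. All class and variable names are illustrative and are not taken from the official implementation.

```python
import torch
import torch.nn as nn

class HyperedgeMessagePassing(nn.Module):
    """Illustrative hypergraph-style correlation modeling: each pixel softly
    participates in several hyperedges; each hyperedge aggregates the features
    of its members and redistributes the pooled context back to them."""

    def __init__(self, channels: int, num_hyperedges: int = 8):
        super().__init__()
        # Soft participation score of every vertex in every hyperedge.
        self.participation = nn.Linear(channels, num_hyperedges)
        self.vertex_proj = nn.Linear(channels, channels)
        self.edge_proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) -- N vertices flattened from multi-scale feature maps.
        a = self.participation(x).softmax(dim=1)                          # (B, N, E) soft incidence matrix
        edge_feat = torch.einsum("bne,bnc->bec", a, self.vertex_proj(x))  # vertex -> hyperedge aggregation
        ctx = torch.einsum("bne,bec->bnc", a, self.edge_proj(edge_feat))  # hyperedge -> vertex distribution
        return x + ctx                                                    # residual enhancement


feats = torch.randn(2, 400, 64)                   # 2 images, 400 "pixels", 64 channels
print(HyperedgeMessagePassing(64)(feats).shape)   # torch.Size([2, 400, 64])
```

Because the number of hyperedges is a small constant, the cost grows linearly with the number of pixels, which is how this kind of modeling can reach global context without the quadratic cost of full self-attention.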
Building on HyperACE, YOLOv13 further proposes FullPAD (Full-Pipeline Aggregation-and-Distribution): the model first performs correlation enhancement at a global scale and then distributes the enhanced features to the various stages of the backbone, neck, and head, so that higher-order semantics run through the entire detection pipeline, improving gradient flow and overall performance. In addition, the authors replace large-kernel convolutions with lighter building blocks based on depthwise separable convolutions, reducing parameters and computation while maintaining accuracy.
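The aggregation-and-distribution pattern itself can be sketched in a few lines. The hypothetical PyTorch snippet below fuses several feature scales into one globally enhanced representation and then redistributes it to every scale; the real FullPAD routes features through dedicated tunnels into specific backbone, neck, and head stages, so treat this only as a schematic of the data flow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AggregateAndDistribute(nn.Module):
    """Schematic aggregation-and-distribution over multi-scale features:
    fuse all scales into one global representation, enhance it once,
    then send the enhanced context back to every scale."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        self.enhance = nn.Conv2d(channels, channels, kernel_size=1)  # stand-in for HyperACE
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, kernel_size=1) for _ in range(num_scales)
        )

    def forward(self, feats):
        # Aggregate: resize every scale to the coarsest map and sum them.
        target = feats[-1].shape[-2:]
        pooled = sum(F.adaptive_avg_pool2d(f, target) for f in feats)
        global_ctx = self.enhance(pooled)
        # Distribute: resize the enhanced context back to each scale and fuse.
        return [
            fuse(torch.cat([f, F.interpolate(global_ctx, size=f.shape[-2:])], dim=1))
            for f, fuse in zip(feats, self.fuse)
        ]


# Toy multi-scale input: P3/P4/P5-like maps with 64 channels each.
scales = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
print([o.shape for o in AggregateAndDistribute(64)(scales)])
```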
The final results show that, from the small models (N series) to the large ones, YOLOv13 achieves significant improvements on MS COCO, reaching state-of-the-art detection performance with fewer parameters and FLOPs. In particular, YOLOv13-N improves mAP by 3.0% over YOLOv11-N and by 1.5% over YOLOv12-N.
Currently, the "One-Click Deployment of Yolov13" tutorial is available on the HyperAI website's "Tutorials" section. Click the link below to experience the one-click deployment tutorial ⬇️
Tutorial Link:
View related papers:
Demo Run
1. After entering the hyper.ai homepage, select "One-click deployment of Yolov13", or go to the "Tutorials" page and select "Run this tutorial online".



2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.
Note: You can switch languages in the upper right corner of the page. Currently, Chinese and English are available. This tutorial will show the steps in English.

3. Select "NVIDIA GeForce RTX 5090" as the compute resource and "PyTorch" as the image, choose "Pay As You Go" or a "Daily Plan/Weekly Plan/Monthly Plan" as needed, then click "Continue job execution".


4. Wait for resource allocation. The first clone will take approximately 3 minutes. Once the status changes to "Running", click the jump arrow next to "API Address" to go to the Demo page.

Effect Demonstration
After entering the Demo running page, upload your image/video and click "Detect Objects" to run the demo.
Parameter Description (see the code sketch after this list for how these map onto local inference):
* Models: yolov13n.pt (nano), yolov13s.pt (small), yolov13l.pt (large), yolov13x.pt (extra large). Larger models generally have higher accuracy (mAP), but also higher parameter count, computational cost (FLOPs), and longer inference time.
* Confidence Threshold: the minimum confidence score a prediction must reach to be kept.
* IoU Threshold: Intersection over Union (IoU) threshold used by non-maximum suppression (NMS) to remove overlapping boxes.
* Max detections per image: The maximum number of detection boxes per image.
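If you would rather reproduce these settings locally than through the web demo, the parameters map naturally onto a short Python script. The sketch below assumes the YOLOv13 weights can be loaded through an Ultralytics-style `YOLO` interface (the official repository is derived from that code base); the file names and threshold values are illustrative.

```python
# Minimal local-inference sketch; paths and values are illustrative.
from ultralytics import YOLO

model = YOLO("yolov13s.pt")      # choose n / s / l / x to trade speed for accuracy

results = model.predict(
    source="example.jpg",        # image or video to detect on
    conf=0.25,                   # Confidence Threshold
    iou=0.45,                    # IoU Threshold used by NMS
    max_det=300,                 # Max detections per image
)

for r in results:
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)  # box coordinates, scores, class ids
```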
The editor used the "yolov13s.pt" model as an example for testing, and the results are shown below.

The above is the tutorial recommended by HyperAI this time. Everyone is welcome to come and experience it!
Tutorial Link: