One-click Deployment of YOLOv12
🚀 YOLOv12: A versatile choice for computer vision tasks, with speed and accuracy both at their peak! 🚀

This tutorial uses an RTX 4090 as the computing resource.
1. Tutorial Introduction 📖
YOLOv12 was launched in 2025 by researchers from the University at Buffalo and the University of Chinese Academy of Sciences, presented in the paper "YOLOv12: Attention-Centric Real-Time Object Detectors".
For a long time, improving the network architecture of the YOLO framework has been a core topic in computer vision. Although attention mechanisms offer strong modeling capability, CNN-based designs have remained mainstream because attention-based models struggle to match their speed. YOLOv12 changes this situation: it matches CNN-based frameworks in speed while fully exploiting the performance advantages of the attention mechanism, setting a new benchmark for real-time object detection.
YOLOv12’s Breakthrough Performance
- YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming YOLOv10-N / YOLOv11-N by 2.1% / 1.2% mAP.
- YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters.
📜 YOLO development history and related tutorials
YOLO (You Only Look Once) has been a leading family of models for object detection and image segmentation since its launch in 2015. The following is the evolution of the YOLO series:
- YOLOv2 (2016): Introduced batch normalization, anchor boxes, and dimension clustering.
- YOLOv3 (2018): Used a more efficient backbone network, multiple anchors, and spatial pyramid pooling.
- YOLOv4 (2020): Introduced Mosaic data augmentation, a new anchor-free detection head, and a new loss function. → Tutorial: DeepSOCIAL crowd distance monitoring based on YOLOv4 and SORT multi-object tracking
- YOLOv5: Added hyperparameter optimization, experiment tracking, and automatic export capabilities. → Tutorial: YOLOv5_deepsort real-time multi-object tracking model
- YOLOv6 (2022): Open-sourced by Meituan and widely used in its autonomous delivery robots.
- YOLOv7: Added support for pose estimation on the COCO keypoint dataset.
- YOLOv8 (2023): Released by Ultralytics, supporting the full range of vision AI tasks.
- YOLOv9: Introduced Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN).
- YOLOv10: Created by Tsinghua University, introducing an end-to-end head that eliminates the non-maximum suppression (NMS) requirement. → Tutorial: YOLOv10 real-time end-to-end object detection
- YOLOv11: Ultralytics' latest model, supporting detection, segmentation, pose estimation, tracking, and classification. → Tutorial: One-click deployment of YOLOv11
- YOLOv12 🚀 NEW: Pairs peak speed with peak accuracy by harnessing the performance advantages of the attention mechanism!
2. Operation Steps 🛠️
1. After starting the container, click the API address to enter the Web interface
The output of an object detector is a set of bounding boxes that enclose the objects in an image, along with a class label and a confidence score for each box. Object detection is a good choice when you need to identify objects of interest in a scene but don't need to know exactly where each object is or its exact shape (this output format is illustrated in the sketch after the list below).
The deployment is divided into the following two functions:
- Image detection
- Video detection
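As a quick illustration of this output format, here is a minimal Python sketch using the ultralytics package; the package, the `yolo12n.pt` weight name, and the image path are assumptions, and the web interface deployed in this tutorial wraps an equivalent pipeline:

```python
# Minimal sketch: inspect a detector's output (boxes, labels, confidences).
# Assumes the ultralytics package and YOLOv12 nano weights ("yolo12n.pt").
from ultralytics import YOLO

model = YOLO("yolo12n.pt")            # weights download on first use
results = model("path/to/image.jpg")  # placeholder input path

for result in results:
    for box in result.boxes:
        label = model.names[int(box.cls)]       # class label
        conf = float(box.conf)                  # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding-box corners
        print(f"{label} ({conf:.2f}): [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
```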
2. Image detection
The input is an image, and the output is the same image annotated with labeled bounding boxes.


Figure 1: Image detection
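For reference, the same image-in, labeled-image-out flow can be scripted directly; this is a minimal sketch under the same assumptions as above (ultralytics package, `yolo12n.pt` weights, placeholder paths):

```python
# Minimal image-detection sketch: read an image, save an annotated copy.
from ultralytics import YOLO

model = YOLO("yolo12n.pt")              # assumed YOLOv12 weight name
result = model("path/to/image.jpg")[0]  # single-image inference
result.save("labeled_image.jpg")        # image with boxes and labels drawn
```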
3. Video detection
The input is a video, and the output is the same video annotated with labeled bounding boxes.

Figure 2: Video detection
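The video function follows the same pattern, processed frame by frame; a minimal sketch under the same assumptions:

```python
# Minimal video-detection sketch: annotate a video frame by frame.
from ultralytics import YOLO

model = YOLO("yolo12n.pt")

# stream=True yields one result per frame to keep memory bounded;
# save=True writes the annotated video (by default under runs/detect/).
for result in model("path/to/video.mp4", stream=True, save=True):
    print(f"{len(result.boxes)} objects detected in this frame")
```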
🤝 Discussion
🖌️ If you come across a high-quality project, please leave a message in the background to recommend it! We have also set up a tutorial exchange group. Friends are welcome to scan the QR code and add the remark [SD Tutorial] to join the group, discuss technical issues, and share application results ↓

YOLOv12 is not only a technological leap, but also a revolution in the field of computer vision! Come and experience it! 🚀