
Training YOLOv8 With Custom Data

In this blog post, we will explore the new features of Ultralytics’ new model, YOLOv8, take a deeper look at the architectural changes compared to YOLOv5, and demonstrate the new model by testing its Python API capabilities for detection on our basketball dataset.

Object detection remains one of the most popular and straightforward use cases for AI technology. Since the first version was released in the groundbreaking 2016 work of Joseph Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", the YOLO family of models has led the field. These object detection models paved the way for research on using DL models to recognize, in real time, both the identity and the location of objects in images.

In this article, we will revisit the basics of these techniques, discuss the new features of Ultralytics' latest version, YOLOv8, and walk through the steps to fine-tune a custom YOLOv8 model using RoboFlow and Paperspace Gradient along with the new Ultralytics API. By the end of this tutorial, users should be able to quickly and easily fit a YOLOv8 model to any set of labeled images.

How does YOLO work?


First, let’s discuss the basics of how YOLO works. Here’s a short quote from the original YOLO paper breaking it down into the sum of the model’s capabilities:

“A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO is trained on full images and directly optimizes for detection performance. This unified model has several advantages over traditional object detection methods.” (source)

As mentioned above, the model is able to predict the locations of multiple objects in an image and identify their classes, provided it has been trained on those classes. It does this in a single stage by dividing the image into N grid regions of equal size S × S, which are parsed simultaneously to detect and localize any objects they contain. For each grid cell, the model predicts B bounding boxes along with a label and a confidence score for the object each box contains.
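To make the grid idea concrete, here is a minimal, purely illustrative sketch of how a YOLO-style output tensor is organized; the grid size, box count, and class count below are made-up values rather than YOLOv8's actual configuration.

import numpy as np

# Illustrative only: a YOLO-style head predicts, for every grid cell,
# B boxes (x, y, w, h, confidence) plus C shared class probabilities.
S, B, C = 7, 2, 3                        # grid size, boxes per cell, classes (made-up values)
preds = np.random.rand(S, S, B * 5 + C)  # stand-in for a network's output

cell = preds[3, 4]                       # predictions for one grid cell
boxes = cell[:B * 5].reshape(B, 5)       # each row: x, y, w, h, confidence
class_probs = cell[B * 5:]               # class probabilities shared by the cell

best_box = boxes[boxes[:, 4].argmax()]   # keep the most confident box in this cell
print(best_box, class_probs.argmax())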

Ultralytics YOLOv5, Classification, Object Detection, Segmentation

Putting all of this together, we get a technique that is capable of performing object classification, object detection, and image segmentation tasks. Since the underlying technology of YOLO remains the same, we can infer that this will also hold for YOLOv8. For a more complete breakdown of how YOLO works, be sure to check out our earlier posts on YOLOv5 and YOLOv7, our benchmarks with YOLOv6 and YOLOv7, and the original YOLO paper (here).

What’s new in YOLOv8?

Since YOLOv8 was just released, there is no published paper covering the model yet. The authors intend to publish it soon, but for now, we can only follow the official release post, infer the changes from the commit history, and try to determine the extent of the changes between YOLOv5 and YOLOv8 by ourselves.

Architecture

Image source: RangeKing

According to the official release, YOLOv8 uses a new backbone network, an anchor-free detection head, and a new loss function. GitHub user RangeKing shared an overview of the YOLOv8 model infrastructure, showing the updated backbone and head structure. By comparing this figure with a similar diagram of YOLOv5 in RangeKing's post, the following changes were identified:

The C2f module. Image source: RoboFlow (source)

  • The C3 module is replaced with the C2f module. In C2f, the outputs of all Bottleneck blocks (each consisting of two 3×3 convs with a residual connection) are concatenated together, whereas in C3 only the output of the last Bottleneck is used (source); see the simplified sketch after this list

The first Conv of each version. Image source: RangeKing

  • The first 6x6 Conv in the backbone is replaced with a 3x3 Conv block
  • Two Convs (the 10th and 14th in the YOLOv5 configuration) were removed

Comparison of the two model backbones. Image source: RangeKing

  • The first 1x1 Conv in the Bottleneck is replaced with a 3x3 Conv
  • The model switches to a decoupled head and removes the objectness branch
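
To make the C2f change concrete, below is a simplified PyTorch sketch of the idea, not the actual Ultralytics implementation (layer names, activations, and channel handling are reduced to the bare minimum): C2f concatenates the outputs of every Bottleneck before its final 1x1 convolution, whereas C3 only forwards the last Bottleneck's output.

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Two 3x3 convs with a residual connection (simplified).
    def __init__(self, c):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class C2fSketch(nn.Module):
    # Simplified C2f: split the channels, run n Bottlenecks, concatenate *all* intermediate outputs.
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * self.c, 1)
        self.cv2 = nn.Conv2d((2 + n) * self.c, c_out, 1)
        self.blocks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))  # split into two halves
        for m in self.blocks:
            y.append(m(y[-1]))                 # keep every Bottleneck output
        return self.cv2(torch.cat(y, dim=1))   # concatenate all of them

x = torch.randn(1, 64, 32, 32)
print(C2fSketch(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])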

Please check back once the YOLOv8 paper is released; we will update this section with more information. For a detailed analysis of the changes above, check out RoboFlow's post introducing the YOLOv8 release.

Accessibility

In addition to the old method of cloning the GitHub repository and setting up the environment manually, users can now access YOLOv8 for training and inference through the new Ultralytics API. See the Training the model section below for details on setting up the API.
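
In practice, access is as simple as a pip install. A minimal sketch, assuming the pretrained yolov8n.pt weights and a placeholder image path, looks like this:

!pip install ultralytics

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained nano model, downloaded automatically
results = model.predict(source="path/to/image.jpg", save=True)  # run inference and save the annotated image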

No anchor bounding box

According to a blog post on YOLOv8 by Ultralytics partner RoboFlow, YOLOv8 now uses anchor-free bounding boxes. In earlier versions of YOLO, users were required to identify these anchor boxes manually in order to facilitate the object detection process. These predefined bounding boxes have a predetermined height and width that capture the scale and aspect ratio of specific object classes in the dataset, and the offsets from these anchors to the predicted object help the model pin down the object's location.

In YOLOv8, boxes are instead predicted directly at the center of the object, with no predefined anchors.
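
The difference can be illustrated with a toy decoding step. This is purely conceptual and not the actual YOLOv5 or YOLOv8 decoding math; all of the numbers are made up.

import numpy as np

# Anchor-based (YOLOv5-style, conceptually): the network predicts offsets
# relative to a predefined anchor box attached to a grid cell.
anchor_w, anchor_h = 116, 90              # a predefined anchor shape (illustrative)
dx, dy, dw, dh = 0.3, 0.6, 0.1, -0.2      # raw network outputs (illustrative)
cell_x, cell_y, stride = 5, 7, 32
box_anchor = (
    (cell_x + dx) * stride,               # center x
    (cell_y + dy) * stride,               # center y
    anchor_w * np.exp(dw),                # width scaled from the anchor
    anchor_h * np.exp(dh),                # height scaled from the anchor
)

# Anchor-free (YOLOv8-style, conceptually): no predefined shapes -- the network
# predicts the box for each location directly, e.g. distances to the four sides.
left, top, right, bottom = 40.0, 22.0, 35.0, 60.0  # predicted distances (illustrative)
cx, cy = (cell_x + 0.5) * stride, (cell_y + 0.5) * stride
box_free = (cx - left, cy - top, cx + right, cy + bottom)  # x1, y1, x2, y2

print(box_anchor, box_free)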

Stop mosaic augmentation before training is finished

During each training epoch, YOLOv8 sees a slightly different version of each image it is fed. These changes are called augmentations. One of them, mosaic augmentation, combines four images into one, forcing the model to learn object identities in new locations, under partial occlusion, and against more varied surrounding pixels. It has been shown that applying this augmentation for the entire training run can hurt prediction accuracy, so YOLOv8 turns it off during the last few training epochs. This lets the model benefit from the augmentation without it being applied over the whole run.
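
As a rough illustration of what mosaic augmentation does (this is not the Ultralytics implementation, which also remaps labels and applies random scaling and cropping), four images can be tiled into a single training image:

import numpy as np

def simple_mosaic(imgs, size=640):
    # Tile four HxWx3 images into one 2x2 mosaic; labels are ignored for brevity.
    assert len(imgs) == 4
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    positions = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y, x) in zip(imgs, positions):
        canvas[y:y + half, x:x + half] = img[:half, :half]  # naive crop to fit the quadrant
    return canvas

imgs = [np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
print(simple_mosaic(imgs).shape)  # (640, 640, 3)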

Efficiency and accuracy

The main reason for a new release is improved accuracy and efficiency during inference and training. The authors at Ultralytics provide some useful sample data that lets us compare the new release against other versions of YOLO. As the charts above show, YOLOv8 outperforms YOLOv7, YOLOv6-2.0, and YOLOv5-7.0 in mean average precision, size, and latency during training.

On the project's GitHub page, we can find a comparison table for the YOLOv8 models of different sizes. As the table shows, mAP increases along with parameter count, speed, and FLOPs. The largest YOLOv5 model, YOLOv5x, achieved a maximum mAP value of 50.7, and an increase of 2.2 mAP points represents a significant improvement in capability. This holds across all model sizes, with the new YOLOv8 models consistently outperforming YOLOv5, as shown below.

Overall, we can see that YOLOv8 is a significant step forward from YOLOv5 and other competing frameworks.

Fine-tuning YOLOv8

To run this tutorial: start it on OpenBayes

The process of fine-tuning a YOLOv8 model can be broken down into three steps: creating and labeling a dataset, training the model, and deploying the model. In this tutorial, we will go through the first two steps in detail and show how to use our new model on any incoming video file or stream.

Setting up the dataset

We will recreate the YOLOv7 experiment that we used to compare the two models, so we will return to the basketball dataset on Roboflow. Please check out the "Setting up a custom dataset" section of the previous article for detailed instructions on setting up the dataset, labeling it, and pulling it from RoboFlow into our notebook.

Since we are using a dataset that we made previously, now we just need to pull the data in. Below are the commands used to pull the data into the Notebook environment. For your own labeled dataset, use the same process but replace the workspace and project values with your own to access your dataset in the same way.

Please make sure to change the API key to your own if you want to follow along with the demonstration in the notebook using the script below.

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="")  # paste your own Roboflow API key here
project = rf.workspace("james-skelton").project("ballhandler-basketball")
dataset = project.version(11).download("yolov8")
!mkdir datasets
!mv ballhandler-basketball-11/ datasets/
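
As a quick sanity check on the download, we can inspect the dataset configuration file. This assumes the Roboflow export follows the usual YOLO-format data.yaml layout, with train/val paths, a class count, and class names:

import yaml

with open("datasets/ballhandler-basketball-11/data.yaml") as f:
    data_cfg = yaml.safe_load(f)

print(data_cfg.get("nc"), data_cfg.get("names"))   # number of classes and their names
print(data_cfg.get("train"), data_cfg.get("val"))  # train/val image paths used by model.train()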

Training the model

Using the new Python API, the ultralytics library does all of the work for us inside the Gradient Notebook environment. We will build a YOLOv8n model from the provided configuration and pretrained weights, and then fine-tune it with the model.train() method on the dataset we just loaded into the environment.

from ultralytics import YOLO

# Load the model
model = YOLO("yolov8n.yaml")  # build a new model from scratch
model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

# Use the model
results = model.train(data="datasets/ballhandler-basketball-11/data.yaml", epochs=10)  # train the model

Testing the Model

results = model.val()  # evaluate model performance on the validation set

We can use the model.val() method to evaluate the new model on the validation set. This will output a nicely formatted table in the output window showing how well our model performed. Given that we have only trained for ten epochs here, a relatively low mAP 50-95 is to be expected.
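
If you would rather read the headline numbers programmatically than from the printed table, the metrics object returned by model.val() exposes them directly. The attribute names below are taken from the ultralytics documentation and may differ between library versions:

metrics = model.val()      # re-run validation and keep the returned metrics object
print(metrics.box.map)     # mAP 50-95
print(metrics.box.map50)   # mAP at IoU 0.50
print(metrics.box.map75)   # mAP at IoU 0.75
print(metrics.box.maps)    # per-class mAP 50-95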

From there, it's simple to submit any photo. The model will output bounding box predictions, overlay those boxes on the image, and save the annotated image to the "runs/detect/predict" folder.

from ultralytics import YOLO
from PIL import Image
import cv2

# from PIL
im1 = Image.open("assets/samp.jpeg")
results = model.predict(source=im1, save=True)  # save the plotted image
print(results)
display(Image.open('runs/detect/predict/image0.jpg'))

We get the predictions for bounding boxes and their labels as follows:

[Ultralytics YOLO <class 'ultralytics.yolo.engine.results.Boxes'> masks
type: <class 'torch.Tensor'>
shape: torch.Size([6, 6])
dtype: torch.float32
 + tensor([[3.42000e+02, 2.00000e+01, 6.17000e+02, 8.38000e+02, 5.46525e-01, 1.00000e+00],
        [1.18900e+03, 5.44000e+02, 1.32000e+03, 8.72000e+02, 5.41202e-01, 1.00000e+00],
        [6.84000e+02, 2.70000e+01, 1.04400e+03, 8.55000e+02, 5.14879e-01, 0.00000e+00],
        [3.59000e+02, 2.20000e+01, 6.16000e+02, 8.35000e+02, 4.31905e-01, 0.00000e+00],
        [7.16000e+02, 2.90000e+01, 1.04400e+03, 8.58000e+02, 2.85891e-01, 1.00000e+00],
        [3.88000e+02, 1.90000e+01, 6.06000e+02, 6.58000e+02, 2.53705e-01, 0.00000e+00]], device='cuda:0')]
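
Each row of the tensor above is one detection: the first four values are the box corners (x1, y1, x2, y2) in pixels, the fifth is the confidence score, and the last is the class index. The same values can be read more conveniently from the Boxes object; the attribute names below match the ultralytics documentation but may vary across versions:

boxes = results[0].boxes  # Boxes object for the first (and only) image
for xyxy, conf, cls in zip(boxes.xyxy, boxes.conf, boxes.cls):
    x1, y1, x2, y2 = xyxy.tolist()
    print(f"class {int(cls)} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}) with confidence {float(conf):.2f}")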

Then apply them to the image, as shown in the following example:

Original image source

As we can see, our lightly trained model shows that it can distinguish the players on the court from the players and spectators on the sidelines, with the exception of one corner. More training is almost certainly needed, but it is easy to see that the model picks up an understanding of the task very quickly.

If we are happy with the model training, we can then export the model to the desired format. In this case, we will export an ONNX version.

success = model.export(format="onnx")  # export the model to ONNX format
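
The exported .onnx file can then be used outside of the ultralytics stack, for example with onnxruntime. This is a minimal sketch: the file name and the 640x640 input size are assumptions (export typically writes the ONNX file next to the trained weights), and the raw outputs would still need the usual post-processing.

import numpy as np
import onnxruntime as ort  # pip install onnxruntime

session = ort.InferenceSession("yolov8n.onnx")              # path to the exported model (assumed)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)   # NCHW float input (assumed shape)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])                            # raw prediction tensors to post-process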

Summary

In this tutorial, we explored the new features of Ultralytics’ powerful new model, YOLOv8, took a deep dive into the architectural changes compared with YOLOv5, and tested the new model’s Python API capabilities on our Ballhandler dataset. We showed that YOLOv8 represents a significant step toward simplifying the process of fine-tuning YOLO object detection models, and demonstrated the model’s ability to distinguish the ball handler in NBA games using photos from the game.