Online Tutorial | Important Innovation in the YOLO Series! Tsinghua Team Releases YOLOE, Which Directly Detects and Segments Objects in Open Scenes in Real Time

a year ago

YOLO (You Only Look Once) has been one of the most influential real-time object detection models in computer vision since its first release in 2015. This end-to-end object detection technology, based on a one-stage detection architecture, has gone through more than ten versions in 10 years. Because it combines high accuracy with high frame rates, it is widely used in fields such as autonomous driving, medical image analysis, and robotic vision.

However, although the traditional YOLO series models use convolutional neural networks to achieve high-performance real-time detection, they rely on predefined target categories and lack flexibility in practical open scenarios.

To address this problem, a Tsinghua University team built on YOLO to propose YOLOE, an open-scene object detection and segmentation model that supports three modes: text prompts, visual prompts, and prompt-free detection. This multimodal capability lets it follow language instructions, take visual references, and even discover new objects on its own, truly achieving "seeing everything in real time."

Currently, the tutorial section of HyperAI's official website has launched a one-click deployment tutorial "YOLOE: See Everything in Real Time". Interested friends, come and experience it!

Tutorial Link:

https://go.hyper.ai/U2PXt

For the complete YOLO series tutorial roundup, see: Online Tutorials | The YOLO Series Has Seen 11 Versions in 10 Years, and the Latest Model Has Reached SOTA in Multiple Object Detection Tasks

Demo Run

1. Log in to hyper.ai. On the Tutorials page, select "YOLOE: See Everything in Real Time" and click "Run this Tutorial Online".

2. After the page jumps, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select "NVIDIA RTX 4090" as the compute resource and "PyTorch" as the image. The OpenBayes platform has launched a new billing method: you can choose "pay as you go" or a "daily/weekly/monthly" plan according to your needs. Then click "Continue". New users can register with the invitation link below to get 4 hours of RTX 4090 time plus 5 hours of CPU time for free!

HyperAI exclusive invitation link (copy and open in browser):

https://go.openbayes.com/9S6Dr

4. Wait for resources to be allocated. The first clone takes about 2 minutes. When the status changes to "Running", click the jump arrow next to "API Address" to open the Demo page. Because the model is large, the WebUI takes about 3 minutes to load; if you open the page before it is ready, "Bad Gateway" is displayed. Note that you must complete real-name authentication before you can access the API address.
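The waiting described in step 4 can also be scripted. The following is a minimal sketch, not part of the tutorial itself: it assumes the API address answers plain HTTP GET requests, that "Bad Gateway" corresponds to an HTTP 502 status, and that a 200 status means the WebUI is ready. The function names `http_status` and `wait_until_ready` are hypothetical helpers introduced for illustration.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 Bad Gateway arrive here.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(url, probe=http_status, max_wait=300.0, interval=10.0,
                     sleep=time.sleep):
    """Poll url until probe reports 200; give up after max_wait seconds."""
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

To use it, pass the API address copied from your own container page, for example `wait_until_ready("https://your-api-address")`, where the URL is a placeholder. The `probe` and `sleep` parameters exist only so the loop can be exercised without a network connection.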

Effect display

The first is text prompt detection. YOLOE supports detection and segmentation for any text category. The text input in the figure below is "tiger, bus, person". The detection result, shown in the figure on the right, clearly identifies the tiger, the sightseeing bus, and the tourists in the picture. Even tourists whose heads are occluded or who stand in the dark are clearly identified.
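Outside the WebUI, a text prompt like the one above can be turned into a class list and passed to a model in code. The sketch below is an illustration under stated assumptions, not the tutorial's own code: it assumes the Ultralytics package and its YOLOE integration are installed, and the weight file name `yoloe-11s-seg.pt` is only an example. The helpers `parse_text_prompt` and `detect_with_text_prompt` are hypothetical names introduced here.

```python
def parse_text_prompt(prompt: str) -> list[str]:
    """Split a comma-separated prompt into unique, trimmed class names."""
    names = []
    for part in prompt.split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def detect_with_text_prompt(image_path: str, prompt: str,
                            weights: str = "yoloe-11s-seg.pt"):
    """Run text-prompted detection (assumes `pip install ultralytics`)."""
    # Imported lazily so the prompt parser works without the package.
    from ultralytics import YOLOE

    names = parse_text_prompt(prompt)
    model = YOLOE(weights)
    # Bind the class names to their text embeddings before predicting.
    model.set_classes(names, model.get_text_pe(names))
    return model.predict(image_path)


print(parse_text_prompt("tiger, bus, person"))  # ['tiger', 'bus', 'person']
```

Only the parsing step runs when the block is executed; `detect_with_text_prompt` downloads weights and needs an image path, so call it explicitly with your own file.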

The second is visual prompt detection. After the detection target is specified with boxes, points, hand-drawn shapes, or reference images, similar objects can be accurately identified, as shown in the following figure:

Finally, there is fully automatic prompt-free detection. It identifies the objects in a scene without any prompt, as shown in the following figure:

The above is the tutorial recommended by HyperAI this time. Come and try it out for yourself!

Tutorial Link:

https://go.hyper.ai/U2PXt

Information

Artificial Intelligence

Deep Learning

Tsinghua University

Object Detection

Related News

Can Emojis Control Speech Generation? Irodori-TTS Is a Japanese TTS Based on the RF-DiT Architecture; Eczema and Tinea Skin Disease Datasets: Supporting Medical Image Classification and Transfer learning.

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Online Tutorial | HKU Team Open Sources DeepTutor, a Personal Learning Assistant That Enables Interactive Learning Covering Understanding, Reasoning, and Generation Through Multi-Agent Collaboration

Free CPU Online Tutorial | Hermes Agent: Learn Long-Term Memory? The Memory Enhancement Plugin TencentDB Agent Memory Can Store Facts, Preferences, Task States, etc., separately.

MiniCPM5-1B, Trained Using RL+OPD, Achieves state-of-the-art (SOTA) Performance on Multiple Complex Tasks; the CHI-Bench Dataset for Evaluating Medical Agents, Designed for Automation of Complex Healthcare Processes, Has Been released.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.
