Lyft Releases Largest L5 Autonomous Driving Prediction Dataset and Launches Motion Prediction Competition

6 years ago

Lyft recently released a Level 5 autonomous driving prediction dataset containing more than 1,000 hours of driving records. The company also announced an autonomous driving motion prediction challenge with a prize pool of US$30,000.

Lyft has released a new dataset.

Last July, Lyft released an L5 autonomous driving perception dataset containing more than 55,000 human-labeled 3D annotated frames. At the time, Lyft described it as the largest public dataset of its kind.

Just one year later, Lyft has released an L5 autonomous driving prediction dataset.

Application download address: https://www.catalyzex.com/paper/arxiv:2006.14480/dataset

170,000 scenes and more than 25,000 kilometers of road data

The dataset Lyft released this time focuses on motion prediction. According to Lyft, a long-standing research problem in autonomous driving is building models that are robust and reliable enough to predict the movement of traffic.

The data was collected over a four-month period by a fleet of 23 autonomous vehicles driving a fixed route in Palo Alto, California. It contains logs of the cars, pedestrians, and other obstacles the vehicles encountered.

The dataset specifically includes:

1,000 hours: more than 1,000 hours of autonomous vehicle movement records;
170,000 scenes: each scene lasts about 25 seconds and includes traffic lights, aerial maps, sidewalks, etc.;
16,000 miles: 16,000 miles (about 25,750 kilometers) of data from public roads;
15,242 labeled elements: a high-definition semantic map containing 15,242 labeled elements, plus a high-definition bird's-eye view of the area.
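The figures in this list can be cross-checked with a little arithmetic. The sketch below assumes every scene is exactly 25 seconds long (the article only says "about 25 seconds") and uses the standard definition of 1.609344 kilometers per mile; the function names are illustrative only.

```python
# Cross-check the headline figures quoted in the article.
SCENES = 170_000        # number of scenes (from the article)
SCENE_SECONDS = 25      # approximate scene length in seconds (from the article)
MILES = 16_000          # distance covered on public roads (from the article)
KM_PER_MILE = 1.609344  # definition of the international mile


def total_hours(scenes: int, seconds_per_scene: float) -> float:
    """Total recorded time in hours, assuming all scenes have equal length."""
    return scenes * seconds_per_scene / 3600


def miles_to_km(miles: float) -> float:
    """Convert statute miles to kilometers."""
    return miles * KM_PER_MILE


if __name__ == "__main__":
    print(f"{total_hours(SCENES, SCENE_SECONDS):.1f} hours")
    print(f"{miles_to_km(MILES):,.0f} km")
```

Under these assumptions the scenes add up to roughly 1,180 hours, which is consistent with "more than 1,000 hours", and 16,000 miles converts to roughly 25,750 kilometers.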

**Example of a bird’s-eye view semantic map in the dataset**

This motion data is collected by a sensor array mounted on the roof of Lyft vehicles, which captures lidar, camera, and radar data as the vehicles travel tens of thousands of miles.

**In the dataset, each scene encodes the state of the vehicle’s surroundings at a given point in time. Red represents the self-driving car; yellow represents other vehicles.**
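To make the color convention in the caption concrete, here is a minimal sketch of how one time step could be drawn as a small bird's-eye-view grid centered on the self-driving car. It is not Lyft's rasterizer: the function `render_scene`, the grid size, and the one-meter cell resolution are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float]

RED: Color = (255, 0, 0)       # self-driving (ego) vehicle
YELLOW: Color = (255, 255, 0)  # other vehicles
BLACK: Color = (0, 0, 0)       # empty background


def render_scene(
    ego_xy: Point,
    agents_xy: Sequence[Point],
    size: int = 11,
    meters_per_cell: float = 1.0,
) -> List[List[Color]]:
    """Rasterize one time step into a size x size RGB grid centered on the ego vehicle."""
    grid = [[BLACK for _ in range(size)] for _ in range(size)]
    center = size // 2

    def to_cell(xy: Point) -> Tuple[int, int]:
        # Offset from the ego vehicle in grid cells; row 0 is the top, so y is flipped.
        col = center + round((xy[0] - ego_xy[0]) / meters_per_cell)
        row = center - round((xy[1] - ego_xy[1]) / meters_per_cell)
        return row, col

    for xy in agents_xy:
        row, col = to_cell(xy)
        if 0 <= row < size and 0 <= col < size:  # skip agents outside the view
            grid[row][col] = YELLOW
    grid[center][center] = RED  # draw the ego vehicle last so it is never overwritten
    return grid


if __name__ == "__main__":
    symbols = {RED: "R", YELLOW: "Y", BLACK: "."}
    scene = render_scene((10.0, 5.0), [(12.0, 5.0), (10.0, 8.0), (100.0, 100.0)])
    for row in scene:
        print("".join(symbols[cell] for cell in row))
```

A real rasterizer would also draw vehicle extents, headings, and the semantic map layers; this sketch only marks one cell per vehicle.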

Lyft said the dataset comes with a software toolkit. According to Lyft, this is the largest, most complete, and most detailed dataset released to date for developing autonomous driving machine learning tasks such as motion prediction, planning, and simulation.

Currently, only some subsets of the dataset are available for download, including:

Sample Dataset (53 MB)
Training dataset (divided into three parts, totaling 69.4 GB)
Bird's Eye View (2 GB)
Semantic map (2 MB)

Download address:

https://self-driving.lyft.com/level5/prediction/

Launch a challenge with a prize pool of 30,000 US dollars

At the same time, Lyft plans to launch a challenge on Google's Kaggle platform, beginning in August, with a total of $30,000 in prizes.

**Last year, Lyft launched a self-driving 3D object detection competition with a total prize pool of $25,000.**

The highlights of this challenge:

Competition requirements: contestants predict the movement of vehicles;
Preparation: Lyft advises researchers and engineers to download the training dataset and the Python-based software toolkit now and start experimenting with the data; the test and validation sets will be released as part of the competition;
Ultimate goal: empower the research community and accelerate innovation through datasets and competitions.
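For readers new to motion prediction, the task can be illustrated with a very simple baseline: assume each vehicle keeps moving at its last observed velocity, then score the forecast by its average distance from the true path. This is only a sketch under stated assumptions. The article does not describe the competition's scoring metric, and the function names, the 2D point format, and the average displacement error used here are illustrative choices, not part of Lyft's toolkit.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def constant_velocity_forecast(history: List[Point], steps: int) -> List[Point]:
    """Extrapolate the last observed per-frame displacement for `steps` future frames."""
    if len(history) < 2:
        raise ValueError("need at least two observed positions")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per frame
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, steps + 1)]


def average_displacement_error(pred: List[Point], truth: List[Point]) -> float:
    """Mean Euclidean distance between predicted and true positions."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # vehicle driving straight
    future = [(3.0, 0.0), (4.0, 1.0), (5.0, 2.0)]    # it then drifts to the left
    forecast = constant_velocity_forecast(observed, steps=3)
    print(forecast)
    print(average_displacement_error(forecast, future))
```

A baseline like this ignores the map, traffic lights, and interactions between road users, which is exactly the information a learned model trained on the dataset would be expected to exploit.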

Lyft senior director of engineering Sacha Arnoud and director of AV (autonomous vehicle) research Peter Ondruska wrote in a blog post: “Data is the driving force behind trying the latest machine learning techniques. Access to large-scale, high-quality autonomous driving data is limited, but this should not prevent us from experimenting with this research.”

“We believe that autonomous vehicles will become a more convenient, safer and sustainable part of the transportation system,” Arnoud and Ondruska said. “By sharing data with the research community, we hope to identify important and unsolved challenges in autonomous driving.”


Blog address:

https://medium.com/lyftlevel5/fueling-self-driving-research-with-level-5s-open-prediction-dataset-f0175e2b0cf8

Paper address:

https://arxiv.org/pdf/2006.14480.pdf

GitHub address:

https://github.com/lyft/l5kit/

-- over--

