Digital Event Horizon
The Launch of L2D: A Groundbreaking Multimodal Self-Driving Dataset Revolutionizing Robotics AI
The Yaak team, in collaboration with LeRobot, has launched L2D, a world-class multimodal self-driving dataset. The launch marks a significant milestone in the field of robotics AI and gives researchers an unparalleled resource for developing end-to-end learning models for autonomous driving.
The L2D dataset is the culmination of a three-year effort by the Yaak team to collect and curate high-quality multimodal driving data. At more than one petabyte (1 PB), it is among the largest self-driving datasets available to date.
The L2D dataset is designed to support the development of end-to-end learning models for real-world robotics, such as those used in autonomous vehicles. Unlike existing self-driving datasets, which focus on intermediate perception and planning tasks, L2D takes a more comprehensive approach, targeting internet pre-trained vision-language models (VLMs) and vision-language-action models (VLAMs).
The dataset was collected by 60 electric vehicles (EVs) with identical sensor suites, operated by driving schools in 30 German cities over the course of three years. Each vehicle logs data from multiple sensors, including cameras, GPS, an IMU, and the CAN bus, yielding a rich and diverse set of multimodal data.
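To make the multimodal nature of each sample concrete, here is a minimal sketch of a synchronized per-timestep sensor record. The field names and types are purely illustrative assumptions; the article does not document L2D's actual schema.

```python
from dataclasses import dataclass

@dataclass
class SensorFrame:
    """One synchronized multimodal sample (illustrative fields only;
    the real L2D record layout is not specified in this article)."""
    timestamp_ns: int
    camera_images: dict[str, bytes]        # camera name -> encoded frame
    gps: tuple[float, float]               # latitude, longitude (degrees)
    imu_accel: tuple[float, float, float]  # accelerometer reading, m/s^2
    can_speed_kmh: float                   # vehicle speed from the CAN bus

# Example record: a vehicle driving at 30 km/h in central Berlin.
frame = SensorFrame(
    timestamp_ns=1_700_000_000_000_000_000,
    camera_images={"front": b"\x00"},
    gps=(52.52, 13.405),
    imu_accel=(0.1, 0.0, 9.81),
    can_speed_kmh=30.0,
)
print(frame.can_speed_kmh)  # 30.0
```

Grouping every modality under one timestamp is one common way to keep camera, GPS, IMU, and CAN streams aligned for end-to-end training.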
The L2D dataset is divided into two policy groups: expert policies, executed by driving instructors, and student policies, executed by learner drivers. Both groups include natural language instructions for the driving task, making L2D a natural testbed for language-conditioned self-driving models.
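A sketch of what the expert/student split might look like in code. The `Episode` structure, policy labels, and instructions below are hypothetical examples, not the dataset's actual API.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """One driving episode with its policy group and task instruction
    (illustrative structure; not the real L2D schema)."""
    episode_id: int
    policy: str       # "expert" (driving instructor) or "student" (learner)
    instruction: str  # natural-language driving task

episodes = [
    Episode(0, "expert", "Drive through the roundabout and take the second exit."),
    Episode(1, "student", "Turn left at the next intersection."),
    Episode(2, "expert", "Merge onto the highway and keep to the right lane."),
]

# Select only instructor-driven episodes, e.g. to train on expert behavior.
expert_episodes = [e for e in episodes if e.policy == "expert"]
print(len(expert_episodes))  # 2
```

Keeping the policy label on each episode makes it easy to train on expert demonstrations while holding out student episodes for evaluating recovery behavior.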
One of the most significant features of the L2D dataset is its multimodal search capability. The Yaak team has developed an LLM-powered multimodal natural language search that lets users query the full drive data (more than 1 PB) and retrieve episodes matching a specific instruction or route task.
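The article does not describe how this search is implemented; a common pattern for natural-language episode retrieval is embedding-based ranking, where a query and each episode description are embedded and compared by cosine similarity. The toy bag-of-words "embedding" below stands in for a real pretrained encoder purely for illustration.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a production system would use a
    # pretrained LLM or multimodal encoder over video, logs, and text.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical episode descriptions (not actual L2D data).
episodes = {
    "ep-001": "unprotected left turn at traffic light in rain",
    "ep-002": "highway merge at dusk",
    "ep-003": "left turn across oncoming traffic no signal",
}

query = "left turn with oncoming traffic"
ranked = sorted(
    episodes,
    key=lambda k: cosine(embed(query), embed(episodes[k])),
    reverse=True,
)
print(ranked[0])  # ep-003
```

The same ranking loop scales to petabyte-sized corpora once the linear scan is replaced with an approximate nearest-neighbor index over precomputed embeddings.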
The dataset also covers a range of route restrictions, features, and maneuvers designed to reflect real-world driving scenarios. These include route tags from OpenStreetMap (OSM) that impose restrictions on the policy, as well as physical structures along the route, such as inclines, tunnels, and pedestrian crossings.
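To illustrate how OSM-style route tags could be used to select episodes with particular road features, here is a minimal sketch. The tag keys follow OSM conventions (`tunnel`, `incline`, `crossing`), but the episode metadata layout is an assumption, not L2D's actual format.

```python
# Hypothetical per-episode route tags keyed by OSM tag names;
# the real L2D tag schema is not documented in this article.
episode_tags = {
    "ep-101": {"highway": "residential", "crossing": "marked"},
    "ep-102": {"tunnel": "yes", "maxspeed": "80"},
    "ep-103": {"incline": "10%", "highway": "tertiary"},
}

def episodes_with(tag: str) -> list[str]:
    """Return ids of episodes whose route carries the given OSM tag key."""
    return [eid for eid, tags in episode_tags.items() if tag in tags]

print(episodes_with("tunnel"))  # ['ep-102']
```

Filtering on route tags like this is how a training pipeline could oversample rare scenarios, such as tunnels or steep inclines, when building a curriculum.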
To ensure the highest-quality episodes in the training set, the Yaak team plans a phased release for L2D. Each new release will add further information about the episodes to keep the episode history clean, giving researchers a more complete view of the dataset and supporting more accurate self-driving models.
The launch of L2D has sent shockwaves throughout the robotics AI community, with many experts hailing it as a major breakthrough in the field. The Yaak team's dedication to creating a high-quality, multimodal self-driving dataset is a testament to their expertise and commitment to advancing the state-of-the-art in robotics AI.
The L2D dataset will be released in several stages, from R0 (100 episodes) through R4 (over 1 million episodes). The full dataset is expected to reach up to 10 PB, making it an ideal platform for researchers and developers seeking to push the boundaries of self-driving models.
In conclusion, the launch of L2D represents a significant milestone in the development of robotics AI. Its comprehensive multimodal approach, rich dataset, and robust search capabilities make it an unparalleled resource for researchers and developers seeking to tackle the complex challenges of autonomous driving.
Related Information:
https://www.digitaleventhorizon.com/articles/The-Launch-of-L2D-A-Groundbreaking-Multimodal-Self-Driving-Dataset-Revolutionizing-Robotics-AI-deh.shtml
https://huggingface.co/blog/lerobot-goes-to-driving-school
Published: Tue Mar 11 04:43:40 2025 by llama3.2 3B Q4_K_M