Introduction
Quality datasets drive autonomous driving forward.
KITTI launched the 3D object detection era. nuScenes brought multi-modal perception to the forefront. Waymo Open Dataset delivered real-world data at scale. Cityscapes defined urban scene understanding. Each dataset marked a turning point in how we approach autonomous driving.
But the field has outgrown these foundations. Most datasets focus on passenger cars cruising highways, leaving significant blind spots. Commercial vehicles operate differently. New sensors such as 4D radar lack dedicated data. Adverse weather still breaks current perception systems. And end-to-end driving demands vision, language, and action data working together.
Between 2024 and 2025, researchers worldwide released specialized datasets targeting these exact problems. They expanded sensor coverage, tackled new scenarios, refined annotations, and redefined tasks.
In this article, we'll explore 15 newly released AI datasets and see how they complement and extend the existing autonomous vehicle data ecosystem.

RoboSense: Multi-Sensor Dataset for Low-Speed Autonomous Driving
Keywords: Low-speed autonomous driving, Near-field perception, Multi-modal fusion
Shanghai Jiao Tong University and SenseAuto Research built RoboSense in 2024 for an overlooked domain: low-speed autonomous vehicles like delivery robots and street sweepers. These vehicles face unique challenges—pedestrians walk inches away, obstacles appear suddenly, and near-field perception becomes critical.

The team equipped electric sweepers with comprehensive sensor arrays: 4 standard cameras, 4 fisheye cameras, 4 LiDARs, and more, capturing full 360-degree coverage. The resulting dataset contains 133K+ synchronized frames with 1.4M 3D cuboids (3D bounding boxes) and 216K trajectories across 6 scenario types.
The team also proposes a Closest-Collision Distance Proportion (CCDP) metric and builds a comprehensive benchmark covering 6 tasks, including perception, object tracking, and prediction, establishing the first dedicated evaluation standard for L4 unmanned functional vehicles.
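The paper defines CCDP precisely; as a rough intuition for why near-field evaluation needs its own distance notion, the sketch below (an assumption for illustration, not the paper's implementation) computes the distance from the ego origin to the closest point of a rotated bird's-eye-view box rather than to its center.

```python
import numpy as np

def closest_point_distance(ego_xy, center, size, yaw):
    """Distance from the ego origin to the nearest point of a rotated BEV box.

    Near-field objects can be several meters long, so the center distance
    overstates how far away the closest surface actually is.
    center: (x, y), size: (length, width), yaw: heading in radians.
    Illustrative only; not the CCDP definition from the paper.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    rel = np.asarray(ego_xy, dtype=float) - np.asarray(center, dtype=float)
    # Rotate the ego position into the box frame, where the box is axis-aligned.
    local = np.array([c * rel[0] + s * rel[1], -s * rel[0] + c * rel[1]])
    half = np.asarray(size, dtype=float) / 2.0
    # Clamping to the box extents yields the nearest point on (or inside) the box.
    nearest = np.clip(local, -half, half)
    return float(np.linalg.norm(local - nearest))

# A pedestrian-sized box whose center sits 1.7 m away: the closest surface
# is only about 1.2 m away, a large relative difference in the near field.
print(closest_point_distance((0.0, 0.0), (1.5, 0.8), (0.8, 0.6), 0.0))
```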
Paper: RoboSense: Large-scale Dataset and Benchmark for Multi-sensor Low-speed Autonomous Driving
MAN TruckScenes: Multi-Modal Dataset for Autonomous Trucks
Keywords: Autonomous trucks, 4D radar, Long-range perception
Released jointly by Technical University of Munich and MAN Truck & Bus SE in 2024, MAN TruckScenes is the first large-scale multi-modal dataset specifically designed for autonomous trucks.
Existing autonomous driving datasets target passenger vehicles and cannot address unique truck challenges such as trailer occlusion, dynamic vehicle combinations, and logistics terminal environments. Heavy vehicles require different sensor mounting solutions and must maintain functionality under various harsh conditions.
Addressing this challenge, MAN TruckScenes contains 747 scenes captured by trucks equipped with 4 cameras, 6 LiDARs, 6 radar sensors, and additional sensors for complete 360-degree coverage. It is the first dataset to offer 360-degree 4D radar data and, at release, the largest radar dataset with annotated 3D bounding boxes.
Coverage spans three seasons and multiple weather conditions, with data annotations extending beyond 230 meters—crucial for heavy vehicles that need more stopping distance. The 27 object categories include truck-specific challenges rarely seen in passenger car datasets.
Paper: MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions
Para-Lane: Cross-Lane Novel View Synthesis Dataset
Keywords: Multi-lane scenarios, End-to-end autonomous driving
Released in 2025, Para-Lane is the first real-world multi-lane dataset specifically designed to evaluate novel view synthesis (NVS) capabilities across lane scenarios.
This dataset provides 25 linked sequences containing 16,000 front views, 64,000 surround views, and 16,000 LiDAR frames from real-world multi-lane driving.

The team developed a two-stage pose optimization approach. First, they build consistent maps using LiDAR data. Then they register camera frames to these maps, achieving precise alignment across different sensor types.
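As a loose illustration of the second stage, the sketch below refines a frame's pose against a prebuilt LiDAR map with point-to-plane ICP using Open3D. The function name, threshold values, and the choice of plain ICP are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
import open3d as o3d

def register_to_map(scan: o3d.geometry.PointCloud,
                    lidar_map: o3d.geometry.PointCloud,
                    init_pose: np.ndarray,
                    max_corr_dist: float = 0.5) -> np.ndarray:
    """Refine a single frame's pose against a prebuilt LiDAR map.

    Stage two of a two-stage pipeline: the map is assumed to be globally
    consistent already; this step only aligns one scan to it.
    """
    # Point-to-plane ICP needs normals on the target (the map).
    lidar_map.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        scan, lidar_map, max_corr_dist, init_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # 4x4 refined frame-to-map pose
```

The first stage, building a globally consistent map from the LiDAR scans, would typically add loop closure and pose-graph optimization, which this sketch omits.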
This matters because current NVS methods optimize for interpolation between similar viewpoints, not the lateral shifts needed for lane changes. Para-Lane fills a critical gap in parallel path planning validation for autonomous driving simulation.
The public release of this dataset is expected to accelerate research and commercialization of end-to-end autonomous driving systems.
Paper: Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis
UniOcc: Unified Occupancy Forecasting and Prediction Benchmark
Keywords: Occupancy prediction, Voxel flow, Cross-domain generalization
Occupancy grids represent space differently than bounding boxes: they show which regions are occupied and which are free, information that is crucial for navigation. UniOcc, released in 2025, combines occupancy perception with forecasting in a single benchmark.
The dataset merges real-world data from nuScenes and Waymo with simulated environments from CARLA and OpenCOOD, totaling 14.2 hours across 2,152 sequences. Each voxel includes forward and backward voxel-level flow annotations, enabling motion prediction at the occupancy level.
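As a minimal sketch of what voxel-level flow enables (an illustration, not UniOcc's format or code), the snippet below forecasts the next occupancy grid by moving each occupied voxel along its forward flow vector.

```python
import numpy as np

def forecast_occupancy(occ: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp a binary occupancy grid one step forward using per-voxel flow.

    occ:  (X, Y, Z) boolean grid, True where space is occupied.
    flow: (X, Y, Z, 3) forward flow in voxel units for each cell.
    """
    warped = np.zeros_like(occ)
    idx = np.argwhere(occ)                          # coordinates of occupied voxels
    target = np.rint(idx + flow[occ]).astype(int)   # displaced coordinates
    # Keep only voxels that land inside the grid after warping.
    inside = np.all((target >= 0) & (target < np.array(occ.shape)), axis=1)
    tgt = target[inside]
    warped[tgt[:, 0], tgt[:, 1], tgt[:, 2]] = True
    return warped

occ = np.zeros((10, 10, 4), dtype=bool)
occ[2, 2, 0] = True
flow = np.zeros(occ.shape + (3,), dtype=np.float32)
flow[2, 2, 0] = (1.0, 0.0, 0.0)                     # voxel moves +1 cell along x
print(np.argwhere(forecast_occupancy(occ, flow)))   # -> [[3 2 0]]
```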
UniOcc introduces evaluation metrics that work without ground truth and supports cross-dataset training. For the first time, researchers can study cooperative occupancy prediction where multiple agents share perception. Tests show that flow information dramatically improves prediction accuracy, while multi-domain training enhances generalization.
Paper: UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Adver-City: Adverse Weather Collaborative Perception Dataset
Keywords: Adverse weather, Collaborative perception, Strong light simulation
Adverse weather poses significant challenges to autonomous vehicles, with accident risk in rainy conditions 70% higher than in normal weather, yet most collaborative perception datasets are collected under clear skies. Adver-City confronts this disconnect with deliberately adverse conditions for collaborative perception testing.

Based on accident report analysis, the open-source dataset recreates the most dangerous road configurations. Built in CARLA with OpenCDA, it includes 110 scenes generating 24K frames with 890K annotations. Six weather conditions challenge algorithms: clear, soft rain, heavy rain, fog, foggy rain, and strong light—the first dataset to simulate blinding glare.
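For a sense of how such conditions are scripted, the snippet below configures CARLA weather presets in the spirit of Adver-City's foggy rain and strong light conditions; the parameter values are illustrative assumptions, not the dataset's actual presets.

```python
import carla

# Illustrative presets in the spirit of two of Adver-City's six conditions;
# the values used by the dataset may differ.
FOGGY_RAIN = carla.WeatherParameters(
    cloudiness=90.0,
    precipitation=60.0,
    precipitation_deposits=50.0,   # puddles on the road
    fog_density=40.0,
    fog_distance=10.0,
    wetness=70.0,
    sun_altitude_angle=35.0,
)

GLARE = carla.WeatherParameters(   # low sun to approximate blinding glare
    cloudiness=5.0,
    sun_altitude_angle=5.0,
    sun_azimuth_angle=180.0,
)

client = carla.Client("localhost", 2000)
world = client.get_world()
world.set_weather(FOGGY_RAIN)
```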
Benchmark testing shows the performance of top models like CoBEVT significantly degrades in adverse weather, validating the dataset's challenging nature.
AIDOVECL: The First AI-Generated Vehicle Outpainting Dataset
Keywords: Generative AI, Outpainting technology, Self-annotation
Traditional vehicle datasets like Stanford Cars, MIO-TCD, and COCO have clear limitations: insufficient eye-level views, overly broad vehicle classifications, and unlabeled background vehicles. These limitations cripple urban planning and environmental monitoring applications that need precise vehicle understanding.
University of Illinois Urbana-Champaign took an unconventional approach in 2024: let AI generate the data. AIDOVECL starts with 15,000 manually selected seed images. The system detects and crops vehicles, recolors and rescales them, places them on larger canvases, and then uses diffusion models to outpaint the surroundings.
The "outpainting as self-annotation" paradigm means every generated image comes with precise bounding boxes for 9 vehicle categories. Quality control uses BRISQUE and CLIP-IQA metrics to ensure visual fidelity. The dataset fills the eye-level vehicle detection gap through synthetic generation.
Paper: AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization
DriveLMM-o1: Step-by-Step Reasoning Dataset for Autonomous Driving
Keywords: Step-by-step reasoning, Multi-modal understanding, Driving scene analysis
DriveLMM-o1 is the first comprehensive dataset and benchmark specifically designed for step-by-step reasoning in autonomous driving scenarios. The dataset addresses the critical lack of structured reasoning processes in autonomous driving visual question answering.

The dataset includes over 18k training samples and 4k test samples, covering three core tasks: perception, prediction, and planning. Built on nuScenes, DriveLMM-o1 integrates multi-view images and LiDAR point clouds and provides complete step-by-step reasoning annotations for each question.
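A hypothetical record, with field names invented for illustration rather than taken from DriveLMM-o1's actual schema, shows what step-by-step reasoning supervision for driving VQA looks like:

```python
# Hypothetical structure of a reasoning-annotated VQA sample; every field name
# here is illustrative and does not reproduce DriveLMM-o1's real schema.
sample = {
    "scene_id": "nuscenes-scene-0001",
    "inputs": {"multi_view_images": 6, "lidar_sweep": "sweep.bin"},
    "question": "Is it safe to change into the left lane now?",
    "reasoning_steps": [
        "Step 1 (perception): a black SUV occupies the left lane, about 15 m behind.",
        "Step 2 (prediction): its closing speed suggests it reaches the ego "
        "vehicle in roughly 3 seconds.",
        "Step 3 (planning): a lane change now would cut into its safety gap.",
    ],
    "final_answer": "No. Wait until the SUV has passed, then re-check the gap.",
}
```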
The evaluation goes beyond accuracy to measure reasoning quality, risk assessment, and traffic rule compliance. InternVL2.5 models fine-tuned on this data show 7.49% better answer accuracy and significantly improved reasoning scores, demonstrating the value of explicit reasoning supervision.
TLD: Vehicle Taillight Signal Dataset
Keywords: Taillight detection, Driving intent recognition, Vehicle signals
Turn signals save lives: they communicate intent, such as turns and lane changes, in advance, which is crucial for collision prevention and safe driving.
Released in 2024, TLD is the first large-scale vehicle taillight signal detection dataset, filling this important gap for autonomous driving systems.
The dataset contains 152,690 annotated frames and 307,509 instances from 17.78 hours of driving video. Sources include 21 high-quality YouTube videos and the LOKI dataset, covering day/night, weather variations, and urban/highway/rural settings.
The benchmark uses YOLOv10 with DeepSORT tracking, adding temporal post-processing to handle blinking patterns. By understanding taillight signals, autonomous vehicles can better predict other drivers' intentions and prevent accidents.
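One simple form such temporal post-processing could take (an assumption, not the benchmark's implementation) is a sliding-window vote that keeps a turn indicator in the "blinking" state through its dark phases:

```python
from collections import deque

def smooth_blinker(frame_states, window=15, min_on=3):
    """Collapse per-frame on/off detections into a stable 'blinking' signal.

    frame_states: iterable of booleans, True when the indicator is lit in a
    frame. A blinker that is physically ON still reads False during its dark
    phase, so we call it blinking if it was lit at least `min_on` times in
    the last `window` frames. Thresholds are illustrative assumptions.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for lit in frame_states:
        recent.append(lit)
        smoothed.append(sum(recent) >= min_on)
    return smoothed

# Roughly 2 Hz blinker sampled at ~10 fps: lit for 2 frames, dark for 3.
raw = [True, True, False, False, False] * 4
print(smooth_blinker(raw)[10:])   # stays True through the dark phases
```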
Paper: TLD: A Vehicle Tail Light signal Dataset and Benchmark
CoVLA: Comprehensive Vision-Language-Action Self-Driving Dataset
Keywords: Vision-Language-Action, End-to-end driving, Trajectory prediction
Existing autonomous driving datasets primarily focus on perception and high-level driving commands, lacking comprehensive annotations that combine visual understanding, language descriptions, and precise action planning.

Released by Turing Inc. in 2024, CoVLA bridges this gap as the first large-scale comprehensive Vision-Language-Action dataset specifically designed for autonomous driving.
CoVLA builds on over 1,000 hours of Tokyo driving data, selecting 10,000 30-second scenes covering over 80 hours of diverse driving environments. The dataset integrates multi-sensor information from forward cameras, CAN bus, GNSS, and IMU, providing detailed natural language descriptions and 3-second future trajectory annotations for each frame.
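As a rough sketch of how a trajectory action label of this kind can be derived (an assumption about the general recipe, not CoVLA's tooling), the snippet below expresses the next 3 seconds of ego poses in the current frame's coordinates, assuming 10 Hz sampling:

```python
import numpy as np

def future_trajectory(poses_xy_yaw, idx, horizon_s=3.0, hz=10):
    """Future ego trajectory expressed in the current frame's coordinates.

    poses_xy_yaw: (N, 3) array of world-frame (x, y, yaw) ego poses.
    Returns up to horizon_s * hz future (x, y) waypoints relative to frame idx,
    the kind of action label a vision-language-action model learns to predict.
    The 10 Hz frame rate is an assumption for this sketch.
    """
    n_future = int(horizon_s * hz)
    x, y, yaw = poses_xy_yaw[idx]
    future = poses_xy_yaw[idx + 1: idx + 1 + n_future, :2] - np.array([x, y])
    c, s = np.cos(-yaw), np.sin(-yaw)            # rotate world frame into ego frame
    rot = np.array([[c, -s], [s, c]])
    return future @ rot.T

poses = np.stack([np.linspace(0, 30, 60),        # driving straight along x
                  np.zeros(60),
                  np.zeros(60)], axis=1)
print(future_trajectory(poses, idx=0)[:3])       # first three waypoints ahead
```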
Paper: CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
OpenAD: Open-World Autonomous Driving 3D Object Detection Benchmark
Keywords: Open-world perception, Edge case detection, Domain generalization
Self-driving cars must handle the unexpected, such as construction zones, fallen cargo, and unusual vehicles. OpenAD creates the first benchmark for these open-world scenarios.
Built on an edge case discovery and annotation pipeline, OpenAD integrates multi-modal large language models (MLLM) for automatic scene identification. It selects 2,000 scenes from five major datasets—Argoverse 2, KITTI, nuScenes, ONCE, and Waymo—annotating 6,597 edge case objects and 13,164 common objects across 206 categories.
The proposed framework converts 2D open-world object detection capabilities into 3D, combining general model flexibility with specialized model precision. OpenAD provides the tools to evaluate how well systems handle the unpredictable reality of driving.
Paper: OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
V2X-Radar: 4D Radar Cooperative Perception Dataset
Keywords: 4D radar, V2X cooperative perception, Multi-modal perception
Released by Tsinghua University and other institutions in 2024, V2X-Radar is the world's first large-scale vehicle-to-infrastructure cooperative perception dataset containing 4D radar. The dataset fills the gap in 4D radar applications for cooperative computer vision research, providing crucial data support for all-weather autonomous driving.

Traditional collaborative perception datasets rely on cameras and LiDAR, with limited performance in adverse weather, while 4D radar maintains stable perception in extreme weather conditions. The V2X-Radar dataset includes 20K LiDAR frames, 40K camera images, and 20K 4D radar frames, covering 350K annotated boxes. Collection scenarios include various weather conditions and complex intersections.
The dataset provides valuable resources for related research, advancing multi-modal fusion and all-weather autonomous driving.
Paper: V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception
V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection
Keywords: V2X cooperative perception, 4D radar fusion, Denoising diffusion
While V2X-Radar provides real-world data, V2X-R from Xiamen University offers controlled simulation environments for LiDAR-camera-4D radar fusion. Released in 2024, the dataset includes 12,079 scenes, 37,727 point cloud frames, and 150,908 images, with special emphasis on adverse weather simulation including fog and snow.
The research team developed Multi-modal Denoising Diffusion (MDD) modules that use weather-robust 4D radar features to clean noisy LiDAR data. This approach leverages each sensor's strengths—LiDAR's precision in clear conditions and radar's reliability in adverse weather.
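The toy module below is only meant to convey the conditioning idea, predicting a noise residual for weather-corrupted LiDAR features while looking at weather-robust radar features; it is not the paper's MDD architecture and omits the diffusion schedule entirely.

```python
import torch
import torch.nn as nn

class RadarConditionedDenoiser(nn.Module):
    """Toy sketch of radar-conditioned denoising of LiDAR BEV features.

    Not the paper's MDD module: it only illustrates predicting a noise
    residual for fog- or snow-corrupted LiDAR BEV features while
    conditioning on weather-robust 4D radar BEV features.
    """
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, noisy_lidar_bev, radar_bev):
        fused = torch.cat([noisy_lidar_bev, radar_bev], dim=1)
        predicted_noise = self.net(fused)
        return noisy_lidar_bev - predicted_noise   # one denoising step

lidar = torch.randn(1, 64, 128, 128)   # weather-corrupted LiDAR BEV features
radar = torch.randn(1, 64, 128, 128)   # weather-robust 4D radar BEV features
clean = RadarConditionedDenoiser()(lidar, radar)
print(clean.shape)                      # torch.Size([1, 64, 128, 128])
```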
The combination of simulation control and multi-modal fusion creates a powerful platform for developing weather-resistant computer vision algorithms.
Paper: V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion
WayveScenes101: Autonomous Driving Novel View Synthesis Dataset
Keywords: Novel view synthesis, Autonomous driving, Off-axis evaluation
Existing novel view synthesis datasets primarily focus on static or object-centric scenes, failing to capture the challenges of driving scenarios, like moving vehicles, changing lighting, lens flares, and limited camera positions.
Recognizing this gap, Wayve released WayveScenes101 in 2024, a large-scale dataset specifically designed for novel view synthesis tasks in self-driving scenarios.

WayveScenes101 provides 101 20-second driving sequences totaling 101,000 images from diverse US and UK locations. COLMAP poses and detailed metadata (weather, time, traffic) enable targeted analysis of synthesis quality under specific conditions.
The dataset specifically evaluates off-axis view generation—critical for understanding scenes from different perspectives during lane changes or emergency maneuvers.
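Off-axis evaluation typically means rendering a camera that was withheld from training and scoring it against the real image; the PSNR sketch below illustrates that recipe under the assumption of images normalized to [0, 1], without claiming to match the benchmark's exact protocol or metric set.

```python
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray) -> float:
    """PSNR between a rendered novel view and the real held-out image.

    Both images are float arrays in [0, 1] with identical shapes. Rendering a
    camera withheld from training probes true extrapolation rather than
    interpolation between nearby training views.
    """
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(1.0 / mse))

gt = np.random.rand(256, 384, 3)
noisy_render = np.clip(gt + 0.05 * np.random.randn(*gt.shape), 0.0, 1.0)
print(round(psnr(noisy_render, gt), 2))   # roughly 26 dB for this noise level
```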
Paper: WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving
DurLAR: High-Fidelity 128-Channel LiDAR Multi-Modal Dataset
Keywords: 128-channel LiDAR, Panoramic images, Monocular depth estimation
Most autonomous driving LiDARs use 16-64 channels. Durham University pushed to 128 channels in 2024, capturing unprecedented detail in their DurLAR dataset.
The high-resolution sensor produces 2048×128 panoramic images in both ambient (near-infrared) and reflectance modes. Multi-beam flash technology eliminates rolling shutter artifacts. The team collected 100,000 frames on repeated routes under varying conditions, creating rich temporal diversity.
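For readers unfamiliar with panoramic LiDAR imagery, the sketch below projects a point cloud into a 2048×128 range image by binning azimuth and elevation; the vertical field of view and bin layout are assumptions, not the sensor's specification.

```python
import numpy as np

def to_panoramic_range_image(points, width=2048, height=128,
                             fov_up_deg=22.5, fov_down_deg=-22.5):
    """Project LiDAR points (N, 3) into a height x width panoramic range image.

    Columns bin azimuth, rows bin elevation; each pixel stores range in meters.
    The vertical field of view here is an assumption, not the sensor spec.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                        # [-pi, pi]
    elevation = np.arcsin(z / np.maximum(rng, 1e-6))
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    cols = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    rows = ((fov_up - elevation) / (fov_up - fov_down) * height).astype(int)
    rows = np.clip(rows, 0, height - 1)

    image = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(image, (rows, cols), rng)           # keep nearest return per pixel
    return image

pts = np.random.randn(100_000, 3) * np.array([20.0, 20.0, 2.0])
print(to_panoramic_range_image(pts).shape)            # (128, 2048)
```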

For depth estimation tasks, they developed joint supervised/self-supervised loss functions that exploit the dense LiDAR data. The resolution boost enables new research directions in fine-grained scene understanding.
SEVD: Synthetic Event-based Vision Dataset
Keywords: Event cameras, Synthetic data, Multi-view perception
Released by Arizona State University in 2024, SEVD is the first synthetic event camera vision dataset providing both ego-vehicle and fixed (infrastructure) perception views.
Event cameras offer high dynamic range, high temporal resolution, and low power consumption, but synthetic event datasets remain extremely scarce. SEVD fills this important gap.

Built in CARLA, SEVD provides both ego-vehicle perception (6 views for 360° coverage) and fixed infrastructure viewpoints (4 strategic positions). It covers diverse environmental conditions including three lighting environments (day, night, twilight) and five weather conditions across urban, suburban, rural, and highway scenarios.
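CARLA ships a dynamic vision sensor out of the box, so a single ego-view event camera can be attached roughly as follows; the resolution, mounting position, and vehicle choice are illustrative assumptions rather than SEVD's actual rig.

```python
import carla

client = carla.Client("localhost", 2000)
world = client.get_world()
bp_lib = world.get_blueprint_library()

# Attach one of the six ego-view event cameras; attribute values are
# illustrative, not SEVD's configuration.
dvs_bp = bp_lib.find("sensor.camera.dvs")
dvs_bp.set_attribute("image_size_x", "1280")
dvs_bp.set_attribute("image_size_y", "720")

vehicle_bp = bp_lib.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

front = carla.Transform(carla.Location(x=1.5, z=2.0))   # roof-mounted, facing forward
dvs = world.spawn_actor(dvs_bp, front, attach_to=vehicle)
dvs.listen(lambda data: print("event packet at frame", data.frame))
```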
The dataset is massive, containing 58 hours of multi-modal data and 9 million bounding box annotations, providing RGB, depth, optical flow, and semantic segmentation information alongside event streams. The synthetic approach allows perfect ground truth and controlled experimentation with this emerging sensor technology.
Paper: SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
Bonus: Autonomous Driving Datasets Survey 2024
To conclude this blog post, we recommend "A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook" for comprehensive dataset analysis.
Published in 2024, this survey is the most comprehensive investigation of autonomous driving datasets to date, reviewing 265 datasets across all core tasks, including perception, prediction, planning and control, and end-to-end driving. It also introduces, for the first time, a dataset impact score that combines citations, data dimensions, and environmental diversity.

The authors identify the top 50 influential datasets and analyze them across sensor modalities, perception domains, geographic distribution, and environmental conditions. They particularly examine annotation quality, revealing issues even in well-known datasets through detailed case studies.
Summary
These datasets reveal autonomous driving's evolution beyond basic perception tasks like 3D object detection and semantic segmentation. The frequent "first" labels signal a burst of innovation as researchers identify and address specific gaps in the data ecosystem. The new datasets not only expand sensor modalities but also open entirely new tracks in task definition.
Dataset creation itself has also evolved. Beyond traditional data collection, teams now use AI generation, sophisticated simulation, and multi-domain fusion. These methods provide diverse data sources and evaluation approaches previously impossible.
However, the field's hunger for data has reached new heights as models grow larger and tasks more complex. Public datasets, while valuable, can't meet every need. Quality data annotation remains challenging and time-consuming.
If your team is seeking data annotation service providers, consider talking with BasicAI, an industry-leading provider that has prepared high-quality datasets for Fortune 500 companies and leading AI teams worldwide.