3D LiDAR technology is witnessing a paradigm shift from basic perception to sophisticated understanding. High-precision point cloud analysis is now within reach thanks to better sensors, innovative neural network architectures, and dramatically improved computing power at the edge.
But there's a problem. While image datasets are everywhere, quality point cloud data remains hard to find. This gap is particularly noticeable when you venture beyond autonomous vehicles into fields like robotics, industrial quality control, precision agriculture, and urban planning. In these areas, finding the right data often becomes the first major hurdle.
We've put together this resource list of 30 diverse point cloud datasets to help bridge that gap. Each includes practical details about collection methods, annotation specifics, and real-world applications. Whether you're a researcher testing new algorithms or a developer building practical applications, these datasets provide a solid foundation to build upon.
Autonomous Driving, ADAS and Transportation LiDAR Point Cloud Datasets

1. Toronto-3D: A Large-scale Mobile LiDAR Dataset
This high-resolution point cloud dataset supports urban modeling and self-driving research. Covering approximately 1 km of downtown Toronto roads, it contains roughly 78.3 million points with 10 attribute channels including XYZ coordinates, RGB color, intensity, GPS time, and scan angle. Toronto-3D features annotations for 8 object classes (roads, buildings, vehicles, etc.) plus unclassified points (label 0), making it ideal for deep learning and point cloud processing algorithm development for applications in autonomous driving, smart city initiatives, and 3D mapping.
Original Paper: Link
Published by: Mobile Sensing and Geodata Science Lab, University of Waterloo (2020)
License: CC BY-NC 4.0
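As a quick-start reference, here is a minimal loading sketch for a Toronto-3D tile. The tile name and the per-point field names (x, y, z, scalar_Label) are assumptions based on common CloudCompare-style PLY exports, so verify them against the PLY header of your download:

```python
# Minimal sketch: load a Toronto-3D tile and drop unclassified points.
# The tile name and field names are assumptions - inspect the PLY header
# (ply.elements[0].properties) to confirm them for your copy.
import numpy as np
from plyfile import PlyData

ply = PlyData.read("L001.ply")            # hypothetical tile name
v = ply["vertex"]
xyz = np.stack([v["x"], v["y"], v["z"]], axis=1)
labels = np.asarray(v["scalar_Label"])    # 0 = unclassified, 1-8 = classes

mask = labels > 0                         # keep annotated points only
xyz, labels = xyz[mask], labels[mask]
print(xyz.shape, np.unique(labels))
```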
2. nuScenes-lidarseg Dataset
A comprehensive multimodal dataset for autonomous driving research featuring rich LiDAR data captured at 20 frames per second. The dataset encompasses complex urban scenarios with vehicles, pedestrians, and road infrastructure. It provides 3D bounding boxes and point cloud segmentation labels with support for panoptic segmentation, distinguishing between instance-level and semantic-level categories. This resource has become fundamental for autonomous driving perception tasks including object detection, semantic segmentation, and panoptic segmentation research.
Original Paper: Link
Published by: Motional (2020)
License: Non-commercial use
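If you work with the raw files rather than the nuscenes-devkit, a sweep and its segmentation labels pair up as sketched below. The layout (five float32 values per point in the sweep, one uint8 class index per point in the lidarseg file) follows the official format; the file paths are placeholders:

```python
# Minimal sketch: pair a nuScenes LIDAR_TOP sweep with its lidarseg labels.
# Paths are placeholders - in practice the nuscenes-devkit resolves them
# from sample tokens.
import numpy as np

points = np.fromfile("LIDAR_TOP.pcd.bin", dtype=np.float32).reshape(-1, 5)
labels = np.fromfile("lidarseg.bin", dtype=np.uint8)

assert len(labels) == len(points)             # one semantic class index per return
xyz, intensity = points[:, :3], points[:, 3]  # columns: x, y, z, intensity, ring
```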
3. H3D: A Full-surround 3D Multi-object Detection and Tracking Dataset
Built upon data collected in the San Francisco Bay Area, H3D features 160 crowded, highly interactive traffic scenes. The dataset provides full 360-degree point cloud data from a Velodyne HDL-64E LiDAR sensor and includes over 1.07 million manually annotated 3D bounding boxes across 8 common traffic participant classes. Supporting data includes CAN vehicle data, GPS+IMU information, 3D annotation details, and panoramic point clouds, establishing a benchmark for state-of-the-art 3D object detection and tracking algorithms.
Original Paper: Link
Published by: Honda Research Institute (2019)
License: Non-commercial use
4. A2D2: Audi Autonomous Driving Dataset
Released by Audi, this multimodal dataset includes 41,280 frames with semantic segmentation and tagging, with 12,499 frames providing additional 3D cuboid (3D bounding box) annotations. It also contains approximately 390,000 frames of unannotated sensor data collected across three cities, including multiple drive cycles. The 3D cuboids cover 14 driving-relevant categories such as cars, pedestrians, and buses, focusing on LiDAR points within the front camera field of view.
Original Paper: Link
Published by: Audi (2020)
License: CC BY-ND 4.0
5. SOTIF PCOD
This dataset focuses on Safety Of The Intended Functionality (SOTIF) for autonomous systems, containing 547 frames of 3D LiDAR point cloud data simulating multi-lane highway scenarios. The data spans 21 different weather conditions (clear, cloudy, wet ground, and rainfall) across noon, sunset, and nighttime periods. With 3D object detection annotations stored in .obj and .txt formats, it's well-suited for 3D object detection, sensor performance testing, and evaluating deep learning model reliability across diverse environmental conditions.
Published by: Milin Patel (2024)
License: CC BY 4.0
6. Oxford RobotCar Dataset: Open Place Recognition
A preprocessed subset of the Oxford RobotCar dataset specifically optimized for place recognition tasks. The original dataset recorded over 1,000 km of urban driving data along a fixed route through Oxford city center, capturing data weekly. It includes information from six vehicle-mounted cameras, LiDAR, and inertial navigation systems across various weather conditions including heavy rain, night, direct sunlight, and snow. This version preserves the multimodal sensor data while incorporating specific preprocessing to better support visual place recognition algorithm development and testing.
Original Paper: Link
Published by: Oxford Robotics Institute, University of Oxford (2020)
License: CC BY-NC-SA 4.0
7. Cirrus dataset: A Curated Dataset with Unique Long-range LiDAR Point Clouds
This high-quality dataset includes 6,285 paired RGB images and LiDAR point cloud frames collected with Luminar Hydra LiDAR, offering an effective detection range of 250 meters and supporting scan frequencies up to 10 Hz. Covering both highway and low-speed urban road scenarios, Cirrus provides 3D annotation files for 8 object classes (vehicles, large vehicles, pedestrians, bicycles, animals, wheeled pedestrians, trailers, and motorcycles), supporting object detection, classification, and semantic segmentation tasks for autonomous driving.
Original Paper: Link
Published by: Volvo Cars (2020)
License: CC BY-SA 4.0
Precision Agriculture & Agritech LiDAR Point Cloud Datasets

8. GREENBOT: Mobile Robot Dataset in Mediterranean Greenhouse Environments
Designed specifically for mobile robots operating in Mediterranean greenhouses, GREENBOT contains 9 sequences recorded by a mobile platform equipped with multiple sensors. Data includes 3D LiDAR point clouds, stereo camera images and IMU readings collected under various weather and lighting conditions throughout different plant growth stages. GREENBOT is particularly valuable for developing and testing SLAM (Simultaneous Localization and Mapping) algorithms in agricultural automation, addressing the unique visual and localization challenges of greenhouse interiors, especially for robotic spraying and crop monitoring in precision agriculture.
Original Paper: Link
Published by: University of Almeria (UAL) (2024)
License: CC BY-NC-SA 3.0
9. HOPS: Hierarchical Orchard Panoptic Segmentation Dataset
The HOPS dataset features 3D point cloud data collected from real apple orchards using various sensors including ground-based laser scanners and RGB-D cameras mounted on drones. Created specifically for hierarchical panoptic segmentation tasks in orchards, it records apple orchard data across different growth stages over two years. The dataset provides high-quality annotations including semantic segmentation, tree instance segmentation, and fruit and trunk instance segmentation, supporting automated monitoring and intervention in orchard environments, particularly for fruit counting and growth analysis.
Original Paper: Link
Published by: Photogrammetry & Robotics Lab, University of Bonn (2025)
10. AgriField3D: A Curated 3D Point Cloud Dataset of Field-Grown Plants
This specialized 3D point cloud dataset focuses on maize plants, containing over 1,000 high-quality 3D point clouds of field-grown maize varieties captured via ground-based LiDAR. The dataset includes semantically and instance-segmented plant models with consistent color coding for leaves and stems based on their position within the plant. Multiple resolution versions are provided to accommodate different computational requirements, along with detailed plant morphology and quality metadata, supporting advanced agricultural research in maize phenotyping and plant structure analysis.
Original Paper: Link
Published by: Iowa State University (2025)
License: CC BY-NC 4.0
11. Avocado Tree Point Clouds with Class Labels
Containing point cloud data from 24 avocado trees captured using handheld LiDAR scanning, this dataset has been calibrated with high-precision GPS and cropped to retain only the central three trees. Each point cloud file records 3D coordinates, material class (leaf or trunk), tree category (ground, center tree, north tree, south tree, or unclassified), and height from ground, all manually annotated. The binary format data is compatible with open-source tools like ACFR Snark and supports applications in orchard management, point cloud segmentation and classification, 3D reconstruction, agricultural robot navigation, and ecological research.
Published by: University of Sydney (2021)
License: CC BY 4.0
Robotics & Navigation LiDAR Point Cloud Datasets

12. VBR: A Vision Benchmark in Rome
The VBR dataset supports the development and evaluation of SLAM algorithms for robot navigation and autonomous driving, comprising 6 sub-datasets collected using multimodal sensors including 3D LiDAR and RGB cameras. It provides high-resolution point clouds, images, and inertial measurement data covering multiple characteristic areas of Rome with a combined trajectory exceeding 40 km. The precisely annotated data includes pose, trajectory, and environmental feature information for map building, autonomous robot localization, and path planning. The diverse and large-scale data sequences address challenges in environmental diversity, motion patterns, and sensor frequencies.
Original Paper: Link
Published by: Sapienza University of Rome (2024)
License: CC BY-SA 4.0
13. GND: Global Navigation Dataset
This multimodal dataset is designed for large-scale outdoor navigation, covering approximately 2.7 square kilometers across 10 university campuses. Collected by a manually operated Jackal robot, it integrates 3D LiDAR point clouds (from Velodyne VLP-16 and Ouster OS1-32), RGB images, and 360° panoramic images, along with multi-category traversability map annotations. GND supports map-based global navigation, mapless navigation, and global place recognition applications for various robot types including wheeled and legged platforms.
Original Paper: Link
Published by: University of Maryland, George Mason University (2024)
License: CC0 1.0
14. Grasp-Anything-6D
This large-scale language-driven dataset focuses on 6-degree-of-freedom (6DoF) grasp detection for robotic manipulation. Supporting natural language instruction-based precise 6DoF grasp pose prediction, the dataset structure includes point cloud scene data (8192-point .npy files per scene), 3D object segmentations, 6DoF grasp poses, and grasp instruction prompts. Grasp-Anything-6D is applicable to robotic vision and grasp detection research, particularly for tasks requiring precise 6DoF pose control, enabling complex object manipulation through natural language instructions.
Original Paper: Link
Published by: Airvlab (2024)
License: MIT
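Loading a scene is straightforward if the release matches the description above; note that the file name and the XYZ-first column order below are assumptions to check against the dataset documentation:

```python
# Minimal sketch: load one Grasp-Anything-6D scene point cloud (.npy).
# The scene file name and XYZ-first column layout are assumptions.
import numpy as np

scene = np.load("scene_0001.npy")     # hypothetical scene file
print(scene.shape)                    # expect (8192, C) per the description
xyz = scene[:, :3]                    # assumed: first three columns are XYZ
centered = xyz - xyz.mean(axis=0)     # common preprocessing before a grasp network
```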
15. L-CAS 3D Point Cloud People Dataset
Comprising 28,002 frames of Velodyne VLP-16 3D LiDAR scan data, this dataset includes both stationary and mobile robot scenarios. It features 5,492 annotated frames with category information (pedestrian or group), centroid coordinates, bounding boxes, and visibility status. Individual person labels contain between 3 and 3,925 3D points at distances ranging from 0.5 to 27 meters. The dataset's strength lies in its complex scenarios including crowds, children, people carrying luggage, pushing carts, and seated individuals, supporting research in person detection, tracking, and classification for robot vision applications.
Original Paper: Link
Published by: Lincoln Centre for Autonomous Systems (L-CAS), University of Lincoln (2017)
License: CC BY-NC-SA 4.0
16. TreeScope: An Agricultural Robotics Dataset
TreeScope is the first LiDAR semantic segmentation dataset collected by robotic systems in agricultural environments, focusing on tree counting and mapping in forestry and orchards. It provides various sensor data including 3D LiDAR point clouds. Ground truth annotations include manual semantic labels for tree trunks and field-measured tree diameter data. Additionally, it provides accumulated point clouds of individual trees and baseline diameter estimation results, supporting agricultural robotics applications such as tree detection, semantic segmentation, and diameter estimation.
Original Paper: Link
Published by: KumarRobotics (2023)
License: CC BY-NC-SA 4.0
17. VLA-3D: 3D Semantic Scene Understanding and Navigation
This large-scale dataset for 3D semantic scene understanding and navigation supports advanced tasks like Visual Language Navigation (VLN). Combining six 3D scan data sources, it contains 7,635 3D scenes with 11,619 areas and over 9.6 million synthesized natural language descriptions. Predominantly featuring real-world scenes, each contains between 4 and 2,264 objects with attributes including 3D point clouds, semantic labels, bounding boxes, and primary colors, making it suitable for indoor robot navigation systems, object recognition, and complex spatial reasoning.
Original Paper: Link
Published by: Robotics Institute, Carnegie Mellon University (2024)
License: MIT
Smart City & Urban Infrastructure LiDAR Point Cloud Datasets

18. Parking Lot Locations and Utilization Samples in the Hannover Linden-Nord Area
Collected using a mobile LiDAR system, this urban road parking dataset contains public roadside parking location information and occupancy data for northeast Hannover Linden-Nord. It includes three main components: vehicle detection data (vehicle bounding boxes identified from segmented 3D point clouds), parking area data (manually digitized using aerial imagery and detected vehicles), and parking occupancy data (sampling center points every 5 meters and determining occupancy by intersection with vehicle bounding boxes). The dataset is particularly valuable for smart city parking management system development, urban planning, and dynamic space management applications.
Published by: Institut für Kartographie und Geoinformatik (2024)
License: CC BY-NC 3.0
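The occupancy scheme described above reduces to a point-in-box test. Here is an illustrative sketch with made-up geometry (shapely is just one convenient tool; it is not necessarily what the authors used):

```python
# Illustrative sketch of the sampling scheme: test whether 5 m-spaced
# sample points along a parking strip fall inside detected vehicle boxes.
# All coordinates are made up for demonstration.
from shapely.geometry import Point, box

vehicle_boxes = [box(0.0, 0.0, 4.5, 2.0), box(10.0, 0.0, 14.8, 2.0)]
samples = [Point(2.5 + x, 1.0) for x in range(0, 20, 5)]  # one sample every 5 m

occupied = [any(b.covers(p) for b in vehicle_boxes) for p in samples]
print(occupied)  # [True, False, True, False]
```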
19. PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark
As the first point cloud semantic completion benchmark dataset for vehicle-infrastructure cooperative scenarios, PointSSC focuses on long-range perception and minimal occlusion environments. Containing point cloud data from both vehicle-mounted and roadside LiDAR sensors, it generates complete scenes and semantic labels through time-series point cloud stitching and automatic annotation. The dataset offers two evaluation approaches (time-based and scene-based splitting) across multiple urban road scenarios to test generalization capabilities in different contexts. It's especially suitable for scene understanding, traffic flow analysis, and safety monitoring in smart city applications involving vehicle-infrastructure cooperation.
Original Paper: Link
Published by: Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University (2023)
License: MIT
20. WHU-Railway3D: Railway Point Cloud Semantic Segmentation
This specialized point cloud semantic segmentation dataset covers approximately 30 km of railway lines with 4.6 billion points across urban, rural, and plateau railway environments. Collected with a vehicle-mounted LiDAR system, the data records 3D coordinates, reflection intensity, and other rich attributes for each point. The dataset features fine-grained annotations for 11 categories: rails, track bed, masts, support equipment, overhead contact lines, fences, utility poles, vegetation, buildings, ground, and other objects, making it particularly suitable for railway safety monitoring, facility management, and track inspection applications.
Original Paper: Link
Published by: Wuhan University (2024)
21. Semantic3D: Large-Scale Point Cloud Classification Benchmark
Containing over 4 billion annotated points across diverse urban scenes including churches, streets, railroad tracks, squares, and villages, Semantic3D was collected using static high-precision 3D laser scanners that preserve fine scene details. It includes ground truth annotations and supports applications in smart city planning and robot navigation, providing rich 3D representation learning resources for urban infrastructure analysis and planning. Compared to other point cloud datasets like Oakland, NYU, and Sydney Urban Objects, Semantic3D offers advantages in point cloud density and scene diversity coverage.
Original Paper: Link
Published by: ETH Zurich (2017)
License: CC BY-NC-SA 3.0
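The released scans are plain ASCII with one point per line (x y z intensity r g b) and class indices in a parallel .labels file, so a reader sketch looks like this (file names are placeholders; for full billion-point scans, read in chunks):

```python
# Minimal sketch: read a Semantic3D scan with its ground-truth labels.
# pandas parses the space-separated ASCII much faster than np.loadtxt;
# pass chunksize=... to process billion-point scans in pieces.
import pandas as pd

cols = ["x", "y", "z", "intensity", "r", "g", "b"]
pts = pd.read_csv("scan.txt", sep=r"\s+", header=None, names=cols)  # placeholder name
labels = pd.read_csv("scan.labels", header=None)[0].to_numpy()

mask = labels > 0                             # 0 = unlabeled points
xyz = pts[["x", "y", "z"]].to_numpy()[mask]
```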
22. ArchScanLib
This point cloud database documents aging brick arch bridges and viaducts in the UK, featuring detailed 3D geometric data for 16 historical structures. Collected using FARO Focus 3D laser scanners, it includes multi-angle point cloud scans of each bridge. While ArchScanLib doesn't provide semantic annotations, its high-precision 3D point cloud data offers significant value for infrastructure monitoring and historic building preservation. The dataset is particularly applicable to digital preservation of historical buildings in smart cities, structural monitoring, building pathology analysis, aging infrastructure assessment, and LiDAR-based bridge inspection.
Original Paper: Link
Published by: University of Cambridge (2025)
License: CC BY 4.0
23. OpenTrench3D: Point Cloud Dataset of Underground Utilities
OpenTrench3D is the first publicly available point cloud dataset of excavated trenches for underground utilities, designed specifically for semantic segmentation research on subterranean infrastructure. Comprising 310 fully annotated point cloud samples totaling approximately 528 million points, it covers 5 water supply projects and 2 district heating projects. Point cloud annotations cover 5 categories (main utility, other utilities, trench, inactive utilities, and miscellaneous), with each point carrying spatial coordinates, RGB color, and a category label. It is particularly valuable for underground infrastructure management, intelligent construction planning, and underground utility monitoring and maintenance in smart cities.
Original Paper: Link
Published by: Aalborg University (2024)
License: CC BY-NC 4.0
LiDAR Point Cloud Datasets for Other Applications

24. Industrial Quality Control LiDAR Point Cloud Dataset
MVTec 3D Anomaly Detection Dataset (MVTec 3D-AD): This comprehensive dataset focuses on unsupervised 3D anomaly detection and localization, containing 4,147 high-resolution 3D point cloud scans across 10 real-world object categories. Training and validation sets include only anomaly-free samples, while test sets contain various anomaly types such as scratches, dents, holes, contamination, or deformation, with precise ground truth annotations. Created to simulate industrial inspection scenarios, the dataset addresses the challenge of detecting unknown defect types in practical applications and is suitable for training and evaluating deep learning methods.
Original Paper: Link
Published by: MVTec Software GmbH (2021)
License: CC BY-NC-SA 4.0
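The 3D scans ship as 3-channel TIFF images whose channels hold per-pixel (x, y, z) coordinates, so converting one sample to a point set looks roughly like this (the path is a placeholder):

```python
# Minimal sketch: flatten one MVTec 3D-AD scan into an (N, 3) point set.
# Pixels with z == 0 carry no valid measurement and are dropped.
import numpy as np
import tifffile

xyz_img = tifffile.imread("000.tiff")  # hypothetical sample path, shape (H, W, 3)
pts = xyz_img.reshape(-1, 3)
pts = pts[pts[:, 2] > 0]               # discard invalid background pixels
print(pts.shape)
```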
25. Smart Warehousing 3D Point Cloud Dataset
Lidar-warehouse-dataset: Designed specifically for industrial warehouse environments, this 3D object detection dataset includes 3,287 consecutive scan frames from a Velodyne Puck (VLP-16) LiDAR sensor. It provides 3D bounding box / cuboid annotations for five object classes: three different types of vehicle platforms (3,053 total samples), metal boxes (2,847 samples), and forklifts (481 samples). The data organization includes raw point cloud data, annotation files, and visualization resources, supporting applications in autonomous navigation, object detection and classification, robotic automation, and industrial inspection monitoring within warehouse environments.
Original Paper: Link
Published by: Karlsruhe Institute of Technology (KIT) and ANavS GmbH (2024)
License: CC BY-SA 4.0
26. Forestry LiDAR Point Cloud Dataset
FUNDIVEUROPE Forest Point Cloud Dataset: This semantically annotated dataset contains Terrestrial Laser Scanning (TLS) point cloud data focused on leaf-wood segmentation in forest environments, featuring mature forest point clouds from different European regions. It includes meticulously annotated point clouds from nine 10 m × 10 m blocks, with XYZ coordinates, reflectance values, and semantic labels (leaf and woody parts), downsampled to 1 cm resolution through voxelization. The dataset employs deep learning architectures (based on PointNet++ and PointNeXt) for semantic segmentation, with particular attention to complete processing from trunk to branch tips, providing valuable data for studying plant structure, productivity, competition, and spatial optimization in forest ecology and remote sensing research.
Published by: Zenodo (2024)
License: CC BY-NC 4.0
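The 1 cm voxelization mentioned above is easy to reproduce on your own clouds. Open3D is one common tool for it (the dataset authors' exact pipeline may differ); each voxel keeps the centroid of the points falling inside it:

```python
# Minimal sketch: 1 cm voxel downsampling with Open3D, using random
# stand-in points; each voxel is replaced by the centroid of its points.
import numpy as np
import open3d as o3d

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(np.random.rand(100_000, 3))  # stand-in data
down = pcd.voxel_down_sample(voxel_size=0.01)                        # 1 cm voxels
print(len(down.points))
```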
27. 3D Scene Reconstruction Point Cloud Dataset
Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS): A widely used public dataset for 3D scene understanding research, S3DIS contains scanned point cloud data from 6 large indoor areas encompassing 271 rooms. Each point carries semantic annotations (walls, floors, ceilings, etc.), though structural element point clouds often contain noise and irregular shapes. The complementary S3DIS-STRUCTURAL-RECONSTRUCTION project addresses this issue by computing minimum and maximum values along the three coordinate axes and creating new regular point grids through linear interpolation, generating clean, smooth structural element point clouds. This approach transforms noisy real point clouds into ideal synthetic ones, providing more reliable structural data for subsequent point cloud segmentation and scene reconstruction tasks.
Original Paper: Link
Published by: Stanford University (2016)
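To make the regularization idea concrete, here is an illustrative sketch (not the project's actual code): take a noisy planar element, compute its per-axis extent, and resample a regular grid across it:

```python
# Illustrative sketch: replace a noisy planar element (e.g., a wall patch)
# with a regular grid spanning its min/max extent, flattening depth noise.
import numpy as np

def regular_grid_from_extent(points, n=50):
    lo, hi = points.min(axis=0), points.max(axis=0)
    xs = np.linspace(lo[0], hi[0], n)           # linear interpolation along x
    ys = np.linspace(lo[1], hi[1], n)           # linear interpolation along y
    xx, yy = np.meshgrid(xs, ys)
    zz = np.full_like(xx, points[:, 2].mean())  # collapse noise to mean depth
    return np.stack([xx.ravel(), yy.ravel(), zz.ravel()], axis=1)

noisy_wall = np.random.normal([0.0, 0.0, 2.0], [1.0, 0.8, 0.02], (5000, 3))
clean = regular_grid_from_extent(noisy_wall)
print(clean.shape)  # (2500, 3): a regular, noise-free grid
```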
28. Natural Disaster Monitoring LiDAR Point Cloud Dataset
High-precision DEM acquired by LiDAR: This dataset records high-resolution LiDAR measurements of surface ruptures from the 2019 Le Teil earthquake. It includes a Digital Elevation Model (DEM) covering a 4 km × 0.5 km area centered on the earthquake rupture zone, collected using a helicopter-mounted airborne LiDAR system. The high-precision LiDAR point cloud data overcomes limitations of ground surveys by providing broader, more uniform coverage, and is particularly valuable in densely vegetated areas, where it can see through vegetation to observe the terrain. The dataset holds significant value for studying earthquake surface ruptures, assessing seismic hazards, and understanding regional geological structures.
Published by: IRD, CNRS-INSU (2020)
License: CC BY 4.0
29. Smart Mining 3D Point Cloud Dataset
SubSurfaceGeoRobo: A Comprehensive Underground Dataset: Specifically developed for underground environment SLAM and geological monitoring, this comprehensive dataset addresses unique challenges of subterranean environments including extremely narrow passages, high humidity, standing water, reflective surfaces, uneven lighting, dusty conditions, complex geometries, and lack of texture. All sensors underwent joint calibration to ensure data accuracy and usability. SubSurfaceGeoRobo provides researchers with a free platform to advance SLAM, navigation, and SLAM-based geological monitoring techniques in underground environments, particularly applicable to mine monitoring, tunnel engineering, and safety assessment of underground structures.
Published by: PANGAEA (2025)
License: CC BY 4.0
30. Security LiDAR Point Cloud Dataset
SUSTech1K: Large-Scale LiDAR-Based Gait Recognition Dataset: The first large-scale LiDAR-based gait recognition dataset, SUSTech1K contains 25,239 point cloud sequences from 1,050 subjects. Data collection utilized synchronized LiDAR sensors and RGB cameras, capturing various real-world scenario variations including visibility, viewpoint, occlusion, clothing, and carried items. The dataset provides time-synchronized multimodal data, enabling researchers to explore 3D geometric information in point clouds for gait recognition, with significant potential applications in security and surveillance domains.
Original Paper: Link
Published by: The University of Hong Kong, Southern University of Science and Technology, The Hong Kong Polytechnic University (2023)
Can't Find What You Need? Build Your Customized Datasets with BasicAI
Despite the valuable resources listed above, anyone working with 3D LiDAR knows the hard truth: high-quality point cloud data remains in short supply. Public datasets, while helpful, often don't quite match the specific requirements of specialized applications. Many breakthrough projects end up delayed or shelved because they can't get the right training data.
BasicAI has been working in this space for years, helping both Fortune 500 companies and top AI teams overcome these data challenges. Our 3D LiDAR point cloud annotation services consistently deliver over 99% accuracy while keeping projects on schedule and costs manageable.
For organizations with strict data security requirements, we also offer on-premises deployment of our smart data annotation platform. This gives you access to professional-grade 3D LiDAR annotation tools while keeping sensitive data within your infrastructure.
If data limitations are holding back your point cloud project, feel free to reach out. We're happy to discuss how customized solutions might help move your work forward.
