Occupancy Detection Datasets

Updated 10 December 2025
  • Occupancy detection datasets are annotated sensor collections that provide detailed occupancy labels for humans, vehicles, and objects across various environments.
  • They integrate diverse modalities such as LiDAR, RGB/thermal images, radar, and environmental sensors, with tailored annotation schemas for both object- and scene-centric tasks.
  • Standardized metrics like IoU, MAE, and panoptic quality enable rigorous benchmarking and support practical applications in autonomous systems, robotics, and security.

Occupancy detection datasets provide annotated sensor data for the direct evaluation and training of models that estimate the presence, absence, and semantic state of humans, objects, or vehicles in diverse environments. These datasets underpin research and applications across autonomous driving, robotics, smart buildings, parking management, and security systems. Data modalities include LiDAR, RGB/thermal/multispectral images, radar, environmental sensors, and electrical usage patterns, with annotation schemas tailored to the operational and semantic requirements of each use case.

1. Taxonomy and Domain Coverage

Occupancy detection datasets are distinguished by their environmental context, sensor modalities, spatial resolution, and annotation granularity. Key domains include:

  • Autonomous Vehicles: Scene-centric and object-centric occupancy datasets incorporate multimodal sensor streams (LiDAR, multi-view cameras, occasionally radar). Object-centric datasets (e.g., the vehicle volumes in "Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection" (Zheng et al., 6 Dec 2024)) focus on fine-scale per-object voxelization rather than full-scene maps.
  • Mobile Robotics: Datasets like MobileOcc (Kim et al., 21 Nov 2025) target pedestrian-rich environments, modeling deformable human occupancy and velocity at high frame rates.
  • Built Environments: Thermal, environmental sensor, and smart meter datasets address occupancy in buildings for energy management (ECO, NIOM (Luo et al., 2022), low-res thermal (Cokbas et al., 2020), appliance-driven (Lee et al., 2022)).
  • Parking Systems: Legacy patch classification datasets (PKLot) and detection-oriented datasets (SNU-SPS (Duong et al., 2022)) enable robust benchmarking of parking slot occupancy models.
  • Anomaly and OOD Detection: Synthetic out-of-distribution (OOD) datasets (VAA-KITTI, VAA-KITTI-360 (Zhang et al., 26 Jun 2025)) inject noncanonical object classes into standard driving datasets.
  • Indoor Scenes: Datasets like Occ-ScanNet (Yu et al., 16 Jul 2024) extend scale and diversity for room-by-room voxel occupancy analysis.
  • Vehicular Interior Sensing: Ultra-wideband radar datasets (UWBCarGraz (Möderl et al., 2023)) support model-based and deep-learning-based cabin occupancy and activity detection.

2. Data Acquisition and Annotation Methodologies

Each dataset employs a pipeline adapted to its sensor modalities and occupancy definition:

  • Object-centric aggregation: In 3D detection, LiDAR points are collected per annotated object track, transformed into local coordinates, aggregated over time, and voxelized to binary occupancy grids (e.g., 0.2 m voxel size for vehicles, with occlusion handled by LiDAR ray back-projection (Zheng et al., 6 Dec 2024)).
  • Scene-centric annotation: Surround-view datasets superimpose multiple LiDAR sweeps and fuse per-point semantics before voxelization. The Augmenting And Purifying (AAP) pipeline in OpenOccupancy (Wang et al., 2023) and Occ3D (Tian et al., 2023) extends initial sparse occupancy by self-training, pseudo-labeling, and extensive human annotation.
  • Synthetic anomaly injection: VAA-KITTI and VAA-KITTI-360 employ a three-phase synthetic anomaly pipeline (2D image patch generation, pseudo-depth alignment, and occlusion-preserving 3D projection) to create realistic out-of-distribution occupancy anomalies (Zhang et al., 26 Jun 2025).
  • Multi-modal fusion: MobileOcc incorporates human mesh optimization by fusing 2D keypoints, instance segmentation, and LiDAR points, refining SMPL meshes per pedestrian through joint optimization (Kim et al., 21 Nov 2025).
  • Environmental and pervasive sensing: Smart meter, environmental sensor, and home appliance datasets provide time-series tabular data, with occupancy labels derived from combination rules over motion, door sensors, or heuristic appliance activity (Lee et al., 2022, Luo et al., 2022).
  • Thermal and low-resolution approaches: Doorway occupancy (TIDOS (Cokbas et al., 2020)) uses low-resolution thermal sensors and blob-based tracking.
  • Multi-agent cooperative annotation: Platforms such as OpenCOOD and UniOcc (Wang et al., 31 Mar 2025) blend synthetic and real-world driving scenes, integrating occupancy and per-voxel flow for cooperative perception.
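The object-centric aggregation step listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline of Zheng et al.: the function names, the `(points, center, yaw)` frame format, and the yaw-only box pose are assumptions made for the example, and occlusion handling by ray back-projection is omitted.

```python
import numpy as np

def world_to_box(points, center, yaw):
    """Map Nx3 world points into the local frame of a box (hypothetical helper)."""
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation by -yaw about the z axis aligns world axes with box axes.
    rot = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    return (points - center) @ rot.T

def voxelize_object(frames, box_size, voxel_size=0.2):
    """Aggregate one object track into a binary occupancy grid.

    frames: iterable of (points Nx3, center (3,), yaw) per timestamp (assumed format).
    box_size: (length, width, height) of the object box in metres.
    """
    box_size = np.asarray(box_size, dtype=float)
    # Small epsilon guards against floating-point overshoot in the division.
    dims = np.ceil(box_size / voxel_size - 1e-9).astype(int)
    grid = np.zeros(tuple(int(d) for d in dims), dtype=bool)
    for points, center, yaw in frames:
        pts = np.asarray(points, dtype=float).reshape(-1, 3)
        local = world_to_box(pts, np.asarray(center, dtype=float), yaw)
        # Shift so the box corner is the grid origin, then quantise.
        idx = np.floor((local + box_size / 2.0) / voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < dims), axis=1)
        idx = idx[inside]
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

Because every frame is first moved into the box frame, points observed at different times and poses accumulate in the same local grid, which is what lets sparse single-sweep returns build up a denser object shape.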

3. Data Structure, Resolution, and Semantic Classes

Occupancy datasets specify spatial grid parameters, class sets, and annotation formats according to domain:

| Dataset/Benchmark | Grid Dimension | Voxel Size | Classes |
| --- | --- | --- | --- |
| Occ3D-Waymo | 3200×3200×128 | 0.05 m | 15+GO |
| OpenOccupancy-nuScenes | 40×512×512 | 0.2 m | 17 |
| Object-centric (Waymo) | Per-object (Rx×Ry×Rz) | 0.2 m | Vehicle |
| MobileOcc (robotics) | 60×60×36 | 0.2/0.02 m | 9 + free/unknown |
| Occ-ScanNet (indoor) | 60×60×36 | 0.1 m | 12 |
| VAA-KITTI/-360 | 256×256×32 | 0.2 m | 19 + anomaly |
| PKLot (parking) | Per-slot patch | – | 2 |
| SNU-SPS (parking) | Per-image boxes | – | 4 |
| UWBCarGraz (vehicle) | N/A (CIR matrices) | – | 2 (occupancy), 3 (activity) |

Semantic granularity ranges from binary (occupied/free) to multi-class (vehicle types, furniture, human categories, anomaly types).
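As a worked example of how the grid parameters relate, the sketch below converts a grid dimension and voxel size into a physical extent and a voxel count, and maps a metric coordinate to a voxel index. The helper names and the choice of grid origin are assumptions for illustration; the axis order is simply the order in which a benchmark lists its dimensions, which differs between datasets.

```python
import math

def grid_extent_m(dims, voxel_size):
    """Physical size in metres along each axis, in the listed axis order."""
    return tuple(round(d * voxel_size, 6) for d in dims)

def voxel_count(dims):
    """Total number of cells in a dense grid."""
    return math.prod(dims)

def world_to_index(xyz, origin, voxel_size, dims):
    """Voxel index of a metric point, or None if it falls outside the grid.

    origin is the metric position of the grid's minimum corner (assumed convention).
    """
    idx = tuple(math.floor((p - o) / voxel_size) for p, o in zip(xyz, origin))
    if all(0 <= i < d for i, d in zip(idx, dims)):
        return idx
    return None

# Occ3D-Waymo row: 3200 x 3200 x 128 cells at 0.05 m.
print(grid_extent_m((3200, 3200, 128), 0.05), voxel_count((3200, 3200, 128)))
```

For the Occ3D-Waymo row this gives a 160 m × 160 m × 6.4 m volume with about 1.3 billion cells, which illustrates why fine voxel sizes at full-scene scale are costly to store and process densely.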

4. Benchmarking Protocols and Evaluation Metrics

Standardized protocols use geometric and semantic Intersection-over-Union (IoU), panoptic quality (PQ), recall, and class-wise precision, alongside mean absolute error (MAE).

Benchmarks recommend reporting results over both box/track-level and frame/scene-level proposals, with downstream evaluation of 3D detection accuracy in certain datasets (Zheng et al., 6 Dec 2024, Tian et al., 2023).
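The two IoU variants named in this section can be computed directly from label grids. The sketch below is a generic illustration under stated assumptions: label `0` is taken to mean free space, every voxel is evaluated (individual benchmarks may restrict scoring to visible or observed voxels), and the function names are hypothetical rather than taken from any benchmark toolkit.

```python
import numpy as np

def geometric_iou(pred, gt, free_label=0):
    """Class-agnostic IoU between occupied regions of two label grids."""
    pred_occ = pred != free_label
    gt_occ = gt != free_label
    union = np.logical_or(pred_occ, gt_occ).sum()
    if union == 0:
        return float("nan")  # nothing occupied in either grid
    return float(np.logical_and(pred_occ, gt_occ).sum() / union)

def semantic_miou(pred, gt, class_ids):
    """Mean of per-class IoU over classes that appear in pred or gt."""
    ious = []
    for c in class_ids:
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union > 0:
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")
```

A prediction can score a perfect geometric IoU while its semantic mIoU is much lower, since the first only asks whether a voxel is occupied and the second also asks by which class.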

5. Dataset Accessibility, Licensing, and Limitations

Accessibility varies considerably:

  • Public download: Most large-scale benchmarks (Occ3D (Tian et al., 2023), OpenOccupancy (Wang et al., 2023), Object-centric (Zheng et al., 6 Dec 2024), MobileOcc (Kim et al., 21 Nov 2025), Occ-ScanNet (Yu et al., 16 Jul 2024), VAA-KITTI (Zhang et al., 26 Jun 2025), UWBCarGraz (Möderl et al., 2023), TIDOS (Cokbas et al., 2020)) are released under open or research-only licenses, mostly via GitHub or institutional or publisher repositories.
  • Restricted access: Some time-series and environmental sensor datasets (appliance-driven (Lee et al., 2022), smart meter (Luo et al., 2022)) require contacting the corresponding author or negotiating a sharing agreement.
  • Licensing terms: Most vision, driving, and robotics datasets are released under variations of CC BY-NC or similar non-commercial research licenses (MobileOcc: CC BY-NC-SA, UniOcc: CC BY-NC, Waymo-based annotations: Waymo’s own terms).
  • Dataset-specific limitations: Resolution and completeness trade-offs (e.g., per-object vs. full-scene), class imbalance, domain-restricted labels, manual or semi-automatic annotation bottlenecks, large storage and compute requirements (e.g., OpenOccupancy with more than 1.4×10^10 labeled voxels), missing validation splits in certain benchmarks (SNU-SPS), limited class diversity, and the absence of fine-grained OOD events in legacy datasets.

6. Comparative Analysis and Research Directions

Recent advances emphasize:

  • Object-centric occupancy: Direct per-object completion yields finer geometry and actionable features for detection heads, enabling higher voxel resolution without scaling memory costs to full-scene size (Zheng et al., 6 Dec 2024).
  • Human-aware semantic occupancy: Deformable mesh-based annotation for pedestrians significantly exceeds rigid bounding-box coverage, supporting velocity prediction and per-instance panoptic labels (Kim et al., 21 Nov 2025).
  • Dense, visibility-aware annotation: Pipelines that fuse image, LiDAR, and semantic masks, combined with human purification, produce annotation volumes with ~2× the occupied voxels of earlier LiDAR-only methods (Wang et al., 2023, Tian et al., 2023).
  • OOD and anomaly detection: Construction of physically plausible synthetic anomalies to test out-of-distribution generalization (Zhang et al., 26 Jun 2025).
  • Temporal and cooperative occupancy forecasting: Multi-agent datasets (OpenCOOD, UniOcc (Wang et al., 31 Mar 2025)) provide flow-based ground truth for future occupancy and collaboration between CAVs.
  • Indoor scene expansion: Large-scale voxel occupancy, notably Occ-ScanNet (Yu et al., 16 Jul 2024), fills the gap left by small, legacy indoor datasets.
  • Real-world deployment constraints: Object-detection-based parking datasets (SNU-SPS) and UWB radar datasets (UWBCarGraz) focus on practical scalability, low computation, and robust results under diverse environmental and activity conditions.
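To make the idea of flow-based ground truth for future occupancy concrete, the sketch below forward-warps a binary occupancy grid by a per-voxel displacement field. It is an illustrative assumption about how such labels can be used, not the data format or tooling of OpenCOOD or UniOcc: the flow is assumed to be given in voxel units per time step, and displacements are rounded to the nearest cell.

```python
import numpy as np

def warp_occupancy(occ, flow):
    """Predict next-step occupancy by moving each occupied voxel along its flow.

    occ:  bool array of shape (X, Y, Z).
    flow: float array of shape (X, Y, Z, 3), displacement in voxels per step (assumed).
    """
    future = np.zeros(occ.shape, dtype=bool)
    src = np.argwhere(occ)  # (N, 3) indices of occupied voxels
    if src.shape[0] == 0:
        return future
    # flow[occ] selects the same voxels in the same row-major order as argwhere.
    dst = src + np.rint(flow[occ]).astype(int)
    inside = np.all((dst >= 0) & (dst < np.array(occ.shape)), axis=1)
    dst = dst[inside]  # voxels that leave the grid are dropped
    future[dst[:, 0], dst[:, 1], dst[:, 2]] = True
    return future
```

Static structure has zero flow and stays in place, while voxels belonging to moving agents shift, so comparing the warped grid with the next annotated frame gives a simple forecasting baseline.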

A plausible implication is that future occupancy detection research will converge on unified, multi-modal, and context-adaptive datasets, integrating per-object detail, panoptic instance labels, temporal and flow information, and OOD challenge sets.

7. Applications and Impact

Occupancy detection datasets directly enable:

  • Autonomous driving: Full-scene and per-object occupancy/semantic prediction underpin robust shape completion, improved detection for distant/incomplete targets, and scenario planning (Zheng et al., 6 Dec 2024, Tian et al., 2023, Wang et al., 2023, Wang et al., 31 Mar 2025).
  • Mobile robotics: Dense, near-field human modeling and velocity prediction support safe navigation in pedestrian-dense spaces (Kim et al., 21 Nov 2025).
  • Building management: Occupancy estimation datasets (appliance use (Lee et al., 2022), smart meters (Luo et al., 2022), thermal sensors (Cokbas et al., 2020)) reduce energy consumption and improve HVAC targeting.
  • Parking automation: Datasets with object-level detection, real-world metadata, and multi-class semantic slot labels (SNU-SPS) enable end-to-end assignment, occupancy analytics, and public-sector integration (Duong et al., 2022).
  • Security and anomaly monitoring: OOD datasets seed developments in anomaly-resilient scene understanding, with relevance for safety-critical applications (Zhang et al., 26 Jun 2025).
  • Sensor algorithm benchmarking: Open benchmarks with multi-modal streams (UWBCarGraz) enable comparative analysis of radar-based, camera-based, and hybrid occupancy algorithms under controlled SNR and activity levels (Möderl et al., 2023).

Occupancy detection datasets have thus become foundational for rigorous algorithmic development, benchmarking, and deployment across autonomous systems, indoor analytics, and intelligent environments.
