Light Field Object Tracking Dataset

Updated 22 December 2025
  • Light field object tracking datasets are curated collections of dense spatial-angular images with precise ground truth for robust 6DoF and multi-object tracking.
  • They leverage custom and commercial plenoptic cameras to capture challenging visual conditions such as low light, occlusion, and reflective surfaces.
  • They enable research in robotics, autonomous navigation, and computer vision by offering reproducible benchmarks with detailed calibration and annotation protocols.

A light field object tracking dataset comprises a curated collection of digital plenoptic (light field) images and/or videos, together with ground-truth object annotations, for the development and benchmarking of object tracking algorithms that exploit the angular and spatial richness of light field sensors. Such datasets support fine-grained spatial, angular, and sometimes temporal tracking in scenarios where traditional RGB data is insufficient, notably under complex visual effects (e.g., reflectivity, occlusion, or low-light conditions). Recent datasets have established protocols for 6DoF pose estimation as well as single-object and multi-object tracking benchmarks, incorporating both real and synthetic content, high-precision ground truth, and challenging object/material compositions (Goncharov et al., 15 Dec 2025, Wang et al., 29 Jul 2025).

1. Sensor Configurations and Light Field Acquisition

Recent datasets employ custom or commercial plenoptic cameras, capturing dense spatial-angular samples for each scene. Key technical configurations include:

  • Custom 4D Plenoptic System (LiFT-6DoF dataset (Goncharov et al., 15 Dec 2025)):
    • Imaging sensor: global-shutter CMOS, 12 MP (4000×3000 px).
    • Microlens array: 17×17 mm², 0.2 mm pitch, 0.85 mm focal length, placed at sensor focal plane.
    • Objective lens: f = 50 mm, f/2.8.
    • Angular sampling: Uniform 9×9 grid (U = V = 9).
    • Each subview: 444×444 px (≈0.2 MP).
    • Extrinsic calibration: Each subview indexed by (s, t) has an explicit 4×4 rigid transform $T_{s,t}$, with translation increments of 5 mm and identity rotation (see the sketch at the end of this section).
  • Raytrix R8 System (large-scale low-light dataset (Wang et al., 29 Jul 2025)):
    • Commercial plenoptic R8 camera, with a 5×5 micro-lens array.
    • Per-view resolution: 1080×1920 pixels.
    • Frame rate: 25 Hz.
    • Angular sampling: U = V = 5.

Intrinsic parameters (camera matrices, distortion corrections) are included with each dataset, enabling precise ray-based reconstruction or forward modeling.
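
Because the LiFT-6DoF extrinsics follow a simple pattern (identity rotation, 5 mm translation increments over a 9×9 grid), they can be regenerated directly from the calibration description. The following is a minimal sketch of that construction, assuming the grid is centered on the central view and that the offsets lie along the sensor's x/y axes; the function name and axis conventions are illustrative, not part of the released calibration files.

```python
import numpy as np

def subview_extrinsic(s: int, t: int, pitch_mm: float = 5.0, grid: int = 9) -> np.ndarray:
    """Hypothetical 4x4 rigid transform T_{s,t} for subview (s, t).

    Assumes identity rotation and translations on a regular grid with
    5 mm increments, centered on the middle (central) view.  The sign
    and ordering of the axes are illustrative assumptions.
    """
    c = (grid - 1) / 2.0                  # index of the central view (4 for a 9x9 grid)
    T = np.eye(4)
    T[0, 3] = (s - c) * pitch_mm * 1e-3   # x-offset in metres
    T[1, 3] = (t - c) * pitch_mm * 1e-3   # y-offset in metres
    return T

# Example: extrinsic of the top-left subview relative to the central view.
print(subview_extrinsic(0, 0))
```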

2. Object Inventory, Scene Design, and Lighting Regimes

Datasets are constructed to probe tracking robustness under diverse object geometry, material response, and environment:

  • LiFT-6DoF dataset (Goncharov et al., 15 Dec 2025):
    • Natural objects: Box (diffuse), Tea Box (textured), Shiny Box (anodized with specularities), Jug (glossy ceramic, with tilt variation).
    • Synthetic objects: Blender-rendered Toy Car (matte and specular variants).
    • Material properties: Diffuse (albedo <10%), Specular (reflectivity >60%, Phong roughness ≈0.1), Textured (PBR-mapped).
    • Lighting: Natural—4 overhead LED panels (500 lux) + 2 spotlights (1,000 lux); Synthetic—HDRI environments.
    • Neutral gray backdrops; distractor clutter in synthetic set.
  • Low-light tracking dataset (Wang et al., 29 Jul 2025):
    • Object classes: Toy cars, glass marbles, industrial nuts, live fish, generic deforming shapes.
    • Environment: Real indoor scenes captured under low-illumination; motion blur and reduced signal-to-noise are defining characteristics.
    • Tabletop and aquatic setups permit frequent occlusions, high deformation, and distractor similarity.

3. Annotation Methodologies and Ground-Truth Quality

Annotation fidelity is central to dataset utility for pose estimation and tracking metrics:

  • LiFT-6DoF (Goncharov et al., 15 Dec 2025):
    • Pose annotation: 4×4 SE(3) homogeneous matrices $T_t = \begin{bmatrix} R_t & t_t \\ 0 & 1 \end{bmatrix}$, with $R_t \in SO(3)$ and $t_t \in \mathbb{R}^3$. Rotations are also provided as Hamiltonian quaternions $(w, x, y, z)$.
    • Coordinate system: Camera-centric, origin at central view’s optical center, z along optical axis.
    • Annotation device: Robotized linear/rotary stage, encoder resolution 0.01°/0.01 mm; resulting accuracy: translation ±0.05 mm, rotation ±0.02°, depth (via RealSense D435i) ±1 mm.
    • Synthetic data inherits ground truth directly from Blender scene state.
    • Disparity maps provided (16-bit PNG) for central view.
  • Low-light dataset (Wang et al., 29 Jul 2025):
    • 2D bounding boxes: (x, y, w, h), per central image and frame.
    • Unique track IDs (MOT16 protocol); continuity maintained by expert annotation (30 annotators) through manual inspection.
    • Epipolar-plane structure images (ESI), $S(x, y) = \sqrt{(\partial_u L)^2 + (\partial_v L)^2}$, are distributed for geometric cue exploitation (see the sketch after this list).
    • No instance-level segmentation or 3D ground truth supplied.
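
As a concrete illustration of the ESI definition above, the sketch below computes $S(x, y)$ from a 4D light field array with finite differences along the two angular axes. The (U, V, H, W) array layout, the use of central differences, and the averaging over the angular grid are assumptions, not the authors' reference implementation.

```python
import numpy as np

def epipolar_structure_image(lf: np.ndarray) -> np.ndarray:
    """Sketch of S(x, y) = sqrt((dL/du)^2 + (dL/dv)^2).

    `lf` is assumed to be a 4D grayscale light field with shape
    (U, V, H, W); gradients along the two angular axes are taken with
    central differences and averaged over the angular grid.
    """
    dLdu = np.gradient(lf, axis=0)        # derivative along the u (angular) axis
    dLdv = np.gradient(lf, axis=1)        # derivative along the v (angular) axis
    esi = np.sqrt(dLdu**2 + dLdv**2)      # per-(u, v, x, y) structure response
    return esi.mean(axis=(0, 1))          # collapse angular axes to a single (H, W) map

# Example with a random 5x5 light field of 64x64-pixel views.
print(epipolar_structure_image(np.random.rand(5, 5, 64, 64)).shape)  # -> (64, 64)
```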

4. Dataset Composition, Structure, and Splits

Datasets are systematically organized to enable reproducible benchmark construction, with clear train/test partitions.

LiFT-6DoF dataset (Goncharov et al., 15 Dec 2025)

| Content | Real (6 seqs) | Synthetic (3 seqs) |
|---|---|---|
| Materials | Diffuse, glossy, textured | Matte/shiny car |
| Directory tree | real/<object> | synthetic/<object> |
| Train split | Box, TeaBox | Car, ShinyCar |
| Test split | Jug, ShinyBox, Jug_Tilt, TeaBox_Tilt | (not listed) |

Each sequence contains 15–20 frames; each frame provides 9×9 PNG subviews, a central-view depth PNG, and a pose annotation (.txt) (see the loader sketch below).

Total: ≈108 real + 36 synthetic = ≈144 frames.
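
Given the per-frame layout above (9×9 PNG subviews, a central-view depth PNG, and a pose .txt), a loader can be sketched as follows. The file names and directory nesting used here are hypothetical; only the kinds of files come from the dataset description, so the real naming convention in the released archive may differ.

```python
from pathlib import Path
import numpy as np
import imageio.v3 as iio

def load_lift6dof_frame(frame_dir: str, grid: int = 9):
    """Hypothetical loader for one LiFT-6DoF frame directory.

    Assumes file names of the form 'view_{s}_{t}.png', 'depth.png' and
    'pose.txt' (a 4x4 row-major matrix); the real naming convention may
    differ -- consult the released documentation.
    """
    frame = Path(frame_dir)
    views = np.stack([
        np.stack([iio.imread(frame / f"view_{s}_{t}.png") for t in range(grid)])
        for s in range(grid)
    ])                                              # shape (9, 9, H, W, 3)
    depth = iio.imread(frame / "depth.png")         # 16-bit central-view disparity/depth
    pose = np.loadtxt(frame / "pose.txt").reshape(4, 4)  # ground-truth SE(3) pose
    return views, depth, pose
```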

Low-light tracking dataset (Wang et al., 29 Jul 2025)

| Benchmark Task | Sequences | Train/Test Split | Frame Counts |
|---|---|---|---|
| MOT | 52 (26+26) | 26 train / 26 test | ≈3,900 |
| SOT | 173 | 102 train / 71 test | ≈3,500 train, 15,380 test |

Both real and synthetic data are organized as sequences containing raw views, derived features/maps, and annotation files.
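
Because track identities follow the MOT16 protocol, the per-sequence annotation files can plausibly be read with a standard MOT-style parser such as the sketch below. The comma-separated column layout (frame, id, x, y, w, h, ...) is an assumption based on the common MOT16 ground-truth format rather than the released file specification.

```python
import csv
from collections import defaultdict

def load_mot_annotations(gt_path: str):
    """Parse a MOT16-style ground-truth file into per-track box lists.

    Assumes comma-separated rows of the form
    frame, track_id, x, y, w, h, ...   (extra columns are ignored).
    Returns {track_id: [(frame, (x, y, w, h)), ...]}.
    """
    tracks = defaultdict(list)
    with open(gt_path, newline="") as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            tracks[track_id].append((frame, (x, y, w, h)))
    return dict(tracks)
```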

5. Benchmarking Protocols and Evaluation Metrics

Standardized metrics facilitate comparison across methods and datasets:

  • 6DoF pose tracking (LiFT-6DoF (Goncharov et al., 15 Dec 2025)):
    • ADD (Average Distance of Model Points):
      $\operatorname{ADD} = \frac{1}{|M|} \sum_{x \in M} \| (Rx + t) - (\hat{R}x + \hat{t}) \|_2$
    • ADD-S: mean point-to-closest-point distance, used for symmetric objects.
    • AUC: area under the ADD/ADD-S curve, with a 0.1 m threshold.
    • Absolute rotation error: $\Delta R_{\mathrm{abs}} = \arccos\left( \frac{\operatorname{trace}(R^\top \hat{R}) - 1}{2} \right)$, in degrees.
    • Absolute translation error: $\Delta t_{\mathrm{abs}} = \| t - \hat{t} \|_2$, in metres.
    • Relative errors: computed from the frame-to-frame change $\Delta T = T_{t-1}^{-1} T_t$.
    • Success rates: percentage of frames with $\Delta t_{\mathrm{abs}} < 0.02$ m and $\Delta R_{\mathrm{abs}} < 5^\circ$.

  • Single/Multi-object tracking (Low-light dataset (Wang et al., 29 Jul 2025)):

    • SOT: Precision (@20 px), Success (IoU curve), Normalized Precision, with center-location-error and overlap curves as in OTB.
    • MOT: MOTA, IDF1, ID switches, FP, FN, per CLEAR-MOT protocol.

Reported methods demonstrate the efficacy of light field cues. For instance, ATINet achieves SOT Success = 0.64, Precision = 0.79, and Normalized Precision = 0.81, while AMTrack reaches MOTA = 87.5% and IDF1 = 85.4% (Wang et al., 29 Jul 2025). In 6DoF pose tracking, dataset baselines compare light-field features and vision-foundation-model-based splats against state-of-the-art reference-model approaches such as FoundationPose, achieving comparable translation and rotation accuracy (Goncharov et al., 15 Dec 2025).
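
The 6DoF pose metrics defined above reduce to a few lines of linear algebra. The sketch below gives a minimal, unoptimized implementation of ADD and the absolute rotation and translation errors, assuming model points as an (N, 3) array and poses as 4×4 matrices; it illustrates the formulas and is not the benchmark's official evaluation code.

```python
import numpy as np

def transform(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 4x4 SE(3) transform to an (N, 3) point array."""
    return points @ T[:3, :3].T + T[:3, 3]

def add_metric(points: np.ndarray, T_gt: np.ndarray, T_est: np.ndarray) -> float:
    """ADD: mean distance between correspondingly transformed model points."""
    return float(np.linalg.norm(transform(points, T_gt) - transform(points, T_est), axis=1).mean())

def rotation_error_deg(T_gt: np.ndarray, T_est: np.ndarray) -> float:
    """Absolute rotation error: arccos((trace(R^T R_hat) - 1) / 2), in degrees."""
    cos = (np.trace(T_gt[:3, :3].T @ T_est[:3, :3]) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_error_m(T_gt: np.ndarray, T_est: np.ndarray) -> float:
    """Absolute translation error ||t - t_hat||_2, in metres."""
    return float(np.linalg.norm(T_gt[:3, 3] - T_est[:3, 3]))

def is_success(T_gt, T_est, t_thresh=0.02, r_thresh=5.0) -> bool:
    """A frame counts as a success when both thresholds are met (2 cm, 5 degrees)."""
    return translation_error_m(T_gt, T_est) < t_thresh and rotation_error_deg(T_gt, T_est) < r_thresh
```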

6. Unique Features and Comparative Analysis

Light field tracking datasets distinguish themselves from prior benchmarks by:

  • Providing dense angular sampling (e.g., 9×9 or 5×5) for spatial-angular method evaluation, unavailable in conventional RGB datasets.
  • Enabling robust tracking under challenging conditions such as low light, occlusion, specular or reflective materials, and object deformation.
  • Including precise, robotically measured 6DoF ground truth (sub-millimeter, sub-degree error), as well as large-scale video sequences for temporal modeling.
  • Supplying supporting calibration, disparity data, and explicit intrinsic/extrinsic parameters.

Earlier datasets (e.g., HCI-LF, Stanford Lytro Illum, UrbanLF) are limited to static or focal stack images, lacking both dynamic frames and realistic low-light scenarios.

7. Applications and Availability

Light field object tracking datasets facilitate research in:

  • 3D pose tracking (6DoF) from multi-view inputs, without pretrained object models.
  • Robust tracking in robotic manipulation, autonomous driving, and dynamic scene understanding.
  • Benchmarking novel representations (e.g., view-dependent Gaussian splats, ESI features), self-supervised temporal modeling (e.g., ATINet), and spatial-angular interaction networks.

Datasets, code, and detailed documentation are publicly released (e.g., https://github.com/nagonch/LiFT-6DoF (Goncharov et al., 15 Dec 2025)), supporting immediate experimental reproducibility with exact camera intrinsics, extrinsics, and file structures defined for direct integration into modern ray-based or epipolar-plane analysis pipelines.
