
EuRoC Dataset: Visual-Inertial Benchmark

Updated 10 November 2025
  • The EuRoC dataset is a comprehensive benchmark offering synchronized visual and inertial data along with precise 6-DoF poses for trajectory and 3D mesh reconstruction evaluation.
  • It enables rigorous assessment of visual-inertial odometry by using high-frequency data from monocular/stereo cameras and IMUs, complemented by detailed ground truth and mesh geometry.
  • The dataset is essential for developing robust dense mapping methods in challenging scenarios, including low-texture, variable illumination, and rapid motion environments.

The EuRoC dataset is a widely used benchmark for evaluating visual-inertial odometry, mapping, and dense reconstruction in robotics and computer vision. It consists of indoor flight sequences captured by a micro aerial vehicle (MAV) equipped with a synchronized global-shutter stereo camera and an inertial measurement unit (IMU); many evaluations use only one of the two cameras (monocular). The primary application is assessing system performance under challenging conditions against precise ground truth, including detailed mesh geometry and time-stamped 6-DoF pose trajectories. Recent research demonstrates its centrality for rigorous validation of dense mapping approaches, particularly in environments with low texture, varying illumination, rapid motion, and geometric complexity.

1. Composition and Modalities

The EuRoC dataset, as employed in state-of-the-art dense mapping research, features a sensor suite comprising a monocular global-shutter camera operating at 20 Hz and an IMU at 200 Hz, both rigidly synchronized and co-calibrated. Six sequences are used:

  • V1_01_easy, V1_02_medium, V1_03_difficult: Vicon Room 1
  • V2_01_easy, V2_02_medium, V2_03_difficult: Vicon Room 2

Each sequence encompasses a 30–60 second MAV flight and includes high-fidelity 6-DoF pose ground truth from external motion capture, plus high-resolution point-cloud scans of the operating environment for geometric assessment. This configuration enables evaluation of both trajectory estimation and dense 3D map reconstruction with metrically consistent reference data.
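
A minimal loading sketch is shown below, assuming the standard ASL folder layout that EuRoC sequences ship with (`mav0/cam0/data.csv` for 20 Hz image timestamps and filenames, `mav0/imu0/data.csv` for 200 Hz gyroscope and accelerometer samples). The function name and column conventions are illustrative; verify them against the local copy of the data.

```python
import csv
from pathlib import Path

import numpy as np


def load_euroc_sequence(seq_dir):
    """Read cam0 timestamps/filenames and raw IMU samples from one EuRoC sequence."""
    seq = Path(seq_dir)

    # cam0/data.csv: "#timestamp [ns], filename" at 20 Hz.
    cam = []
    with open(seq / "mav0" / "cam0" / "data.csv") as f:
        for row in csv.reader(f):
            if row and not row[0].startswith("#"):
                cam.append((int(row[0]), seq / "mav0" / "cam0" / "data" / row[1].strip()))

    # imu0/data.csv: timestamp [ns], gyro x/y/z [rad/s], accel x/y/z [m/s^2] at 200 Hz.
    # Note: timestamps load as float64 here; parse them as integers from the CSV
    # if exact nanosecond precision is required.
    imu = np.loadtxt(seq / "mav0" / "imu0" / "data.csv", delimiter=",", comments="#")
    return cam, imu


# Example: index of the first IMU sample at or after each image timestamp.
# cam, imu = load_euroc_sequence("V1_01_easy")
# first_imu_idx = np.searchsorted(imu[:, 0], [t for t, _ in cam])
```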

2. Evaluation Protocols and Experimental Design

Protocols for deploying the EuRoC dataset emphasize reproducibility and comparability. A typical evaluation workflow for visual-inertial dense mapping includes:

  • Front-end VIO Processing: Monocular images (20 Hz) and IMU data (200 Hz) are processed by a feature-based VIO system, specifically ORB-SLAM3 without loop closure, to yield a metric-scale camera trajectory and noisy, sparse 3D landmarks.
  • Keyframe Selection: Sliding windows of eight keyframes are selected as multi-view stereo (MVS) network input. Co-visibility and pose-distance criteria ($p_\mathrm{th} = 0.20$ m, $t_\mathrm{th} = 0.25$ m) guarantee adequate overlap and baseline for MVS, as formalized in Eqn. 1 of the referenced protocol.
  • Sparse-depth Generation: VIO landmarks are projected into the reference keyframe (after filtering for $< 5$ m depth and $< 2$ px reprojection error), forming sparse-depth maps with typical valid depth counts in the 80–300 range; see the projection sketch after this list.
  • Test Set Usage: No retraining or fine-tuning occurs on EuRoC; all models are trained solely on external datasets (e.g., ScanNet), ensuring cross-domain generalization is measured.
  • Ground-truth Synthesis: Dense ground-truth depth maps are constructed via projection of the ground-truth point cloud into each camera frustum, discarding points with rendered depth $> 3$ m to mitigate edge effects.
  • Fusion: Predicted depth maps are incorporated into an incremental voxel-hashed truncated signed distance field (TSDF) volume with subsequent Marching Cubes mesh extraction; a fusion sketch appears after the summary paragraph below.
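
The sparse-depth generation step can be illustrated with a short sketch. The function below is not the referenced implementation; it simply projects VIO landmarks into a reference keyframe with a pinhole model (undistorted images assumed) and applies the depth (< 5 m) and reprojection-error (< 2 px) filters described above. The function name and the `keypoints_px` argument are assumptions for illustration.

```python
import numpy as np


def sparse_depth_from_landmarks(landmarks_w, keypoints_px, T_cw, K, hw,
                                max_depth=5.0, max_reproj_px=2.0):
    """Project VIO landmarks into a reference keyframe and keep reliable ones.

    landmarks_w  : (N, 3) landmark positions in the world frame
    keypoints_px : (N, 2) observed pixel locations of the same landmarks
    T_cw         : (4, 4) world-to-camera pose of the reference keyframe
    K            : (3, 3) pinhole intrinsics (undistorted image assumed)
    hw           : (H, W) output depth-map size
    """
    H, W = hw
    depth = np.zeros((H, W), dtype=np.float32)

    # World -> camera frame.
    pts_h = np.hstack([landmarks_w, np.ones((len(landmarks_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]
    z = pts_c[:, 2]

    # Pinhole projection (guard against division by zero; bad points are filtered below).
    uv = (K @ pts_c.T).T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-6)

    # Keep points in front of the camera, < 5 m deep, < 2 px reprojection error,
    # and inside the image bounds.
    err = np.linalg.norm(uv - keypoints_px, axis=1)
    ok = (z > 0.0) & (z < max_depth) & (err < max_reproj_px)
    ok &= (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)

    u = np.round(uv[ok, 0]).astype(int).clip(0, W - 1)
    v = np.round(uv[ok, 1]).astype(int).clip(0, H - 1)
    depth[v, u] = z[ok]
    return depth
```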

This rigorous protocol ensures that evaluation isolates algorithmic generalization and robustness under realistic, previously unseen environmental conditions.
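
For the fusion step, the following is a hedged sketch using Open3D's `ScalableTSDFVolume` as a stand-in for voxel-hashed TSDF integration with Marching Cubes extraction; Open3D is an assumed dependency, not necessarily the backend used in the referenced work, and the voxel size and truncation distance are illustrative choices. The intrinsics shown are the nominal EuRoC cam0 values; verify them against the per-sequence sensor.yaml and undistort images before use.

```python
import numpy as np
import open3d as o3d  # assumed stand-in; any voxel-hashed TSDF backend works similarly

# Illustrative fusion parameters (not taken from the referenced protocol).
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.04,
    sdf_trunc=0.20,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

# Nominal EuRoC cam0 pinhole parameters.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=752, height=480, fx=458.654, fy=457.296, cx=367.215, cy=248.375
)


def fuse_keyframe(gray_u8, depth_m, T_cw):
    """Integrate one predicted depth map (meters) with its image and world-to-camera pose."""
    color = o3d.geometry.Image(np.ascontiguousarray(np.dstack([gray_u8] * 3)))  # gray -> 3 channels
    depth = o3d.geometry.Image(depth_m.astype(np.float32))
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1.0, depth_trunc=3.0, convert_rgb_to_intensity=False
    )
    volume.integrate(rgbd, intrinsic, T_cw)


# After all keyframes are fused, extract the mesh via Marching Cubes:
# mesh = volume.extract_triangle_mesh()
# mesh.compute_vertex_normals()
```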

3. Performance Metrics and Quantitative Comparisons

Evaluation metrics conform to established standards for depth and surface reconstruction (a compact computation sketch follows this list):

  • 2D Depth Metrics (following Eigen et al. 2014):
    • Absolute difference: $\mathrm{AbsDiff} = \frac{1}{|\Omega|} \sum_{u \in \Omega} |\hat{D}(u) - D_\mathrm{gt}(u)|$
    • Root-mean-square error (RMSE)
    • Accuracy $\delta\langle 1.25\rangle$: percentage of pixels $u$ for which $\max\left(\hat{D}(u)/D_\mathrm{gt}(u),\, D_\mathrm{gt}(u)/\hat{D}(u)\right) < 1.25$
  • 3D Mesh Quality (from Božić et al. 2021):
    • Accuracy (cm): mean mesh-to-point-cloud Euclidean distance
    • Completeness (cm): mean point-cloud-to-mesh distance
    • Chamfer $L_1$: arithmetic mean of accuracy and completeness
    • Precision [%]: percentage of mesh vertices within 5 cm of ground truth
    • Recall [%]: percentage of ground-truth points within 5 cm of the mesh
    • F-score [%]: $F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ with a 5 cm tolerance
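
A compact computation sketch of these metrics follows, assuming depth maps as NumPy arrays and point sets sampled from the predicted mesh and the ground-truth cloud. The function names are hypothetical, and SciPy's KD-tree is an assumed helper for the nearest-neighbour distances.

```python
import numpy as np
from scipy.spatial import cKDTree  # assumed helper for nearest-neighbour queries


def depth_metrics(pred, gt, thr=1.25):
    """Eigen-style 2D depth metrics over pixels with valid ground truth."""
    mask = gt > 0
    d, g = pred[mask], gt[mask]
    abs_diff = np.mean(np.abs(d - g))
    rmse = np.sqrt(np.mean((d - g) ** 2))
    ratio = np.maximum(d / g, g / d)
    delta = 100.0 * np.mean(ratio < thr)                 # percentage of pixels
    return abs_diff, rmse, delta


def mesh_metrics(mesh_pts, gt_pts, tau=0.05):
    """Accuracy/completeness [m] and precision/recall/F-score [%] at tau = 5 cm."""
    d_mesh_to_gt, _ = cKDTree(gt_pts).query(mesh_pts)    # accuracy distances
    d_gt_to_mesh, _ = cKDTree(mesh_pts).query(gt_pts)    # completeness distances
    accuracy = d_mesh_to_gt.mean()
    completeness = d_gt_to_mesh.mean()
    precision = 100.0 * np.mean(d_mesh_to_gt < tau)
    recall = 100.0 * np.mean(d_gt_to_mesh < tau)
    f_score = 2 * precision * recall / (precision + recall + 1e-12)
    return accuracy, completeness, precision, recall, f_score
```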

A detailed breakdown of results achieved by recent dense mapping systems on all six EuRoC sequences is as follows:

| Metric | V1_01 | V1_02 | V1_03 | V2_01 | V2_02 | V2_03 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AbsDiff [m] | 0.251 | 0.214 | 0.262 | 0.246 | 0.316 | 0.189 | 0.246 |
| RMSE [m] | 0.411 | 0.334 | 0.374 | 0.385 | 0.492 | 0.287 | 0.380 |
| $\delta\langle 1.25\rangle$ [%] | 91.11 | 93.17 | 92.61 | 90.79 | 88.46 | 94.68 | 91.80 |
| Fused AbsDiff [m] | 0.221 | 0.157 | 0.188 | 0.162 | 0.192 | 0.114 | 0.172 |
| Fused RMSE [m] | 0.377 | 0.280 | 0.296 | 0.318 | 0.383 | 0.216 | 0.321 |
| Fused $\delta\langle 1.25\rangle$ [%] | 92.53 | 94.83 | 95.80 | 94.06 | 93.74 | 97.33 | 94.44 |
| Accuracy [cm] | 10.50 | 7.70 | 12.62 | 11.97 | 12.34 | 7.04 | 10.36 |
| Completeness [cm] | 10.50 | 9.73 | 7.76 | 8.90 | 10.40 | 6.77 | 9.01 |
| Recall [%] | 49.18 | 51.33 | 46.49 | 59.43 | 52.12 | 51.87 | 51.74 |
| F-score [%] | 48.38 | 47.19 | 40.96 | 55.09 | 48.21 | 51.19 | 48.50 |

Comparative data indicate that the nearest pure-vision competitor (TANDEM) attains an average F-score of 34.72%; the 13.78-point absolute gain therefore corresponds to a 39.7% relative improvement (13.78 / 34.72 ≈ 0.397). This suggests that visual-inertial priors and sparse-depth completion significantly outperform vision-only pipelines in the EuRoC regime.

4. Challenges and Qualitative Observations

Two principal challenge axes are prominent in EuRoC:

  • The “difficult” sequences (V1_03, V2_03) feature rapid camera motion, which commonly impairs pure-vision MVS approaches. Visual-inertial systems leveraging IMU data and sparse-depth priors demonstrate resilience, maintaining geometric integrity where purely visual systems degrade.
  • Environments with low-texture walls or low illumination, particularly in Vicon Room 2 (V2), often yield insufficient sparse-landmark density, constraining depth propagation and completion accuracy. This low-texture condition is noted as the most frequent failure mode.
  • Additional challenges stem from highly reflective, occluded surfaces and large depth discontinuities, which can induce errors in MVS depth estimation regardless of the front-end architecture.

Across all reconstructions, the best-performing visual-inertial systems preserve fine geometric structure (e.g., ladders, chair legs) where other methods introduce noise and incomplete regions.

5. Implementation and Calibration Details

The dataset’s camera–IMU calibration parameters, provided with the rig, are inherited directly by front-end VIO systems (e.g., ORB-SLAM3 preintegration). Keyframe selection for MVS is algorithmically determined by a composite penalty on translation and rotation (cf. Eqn. 1 in the referenced protocol), optimizing visibility and parallax across the eight-keyframe window.
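
The exact form of Eqn. 1 is not reproduced in this summary, so the following keyframe-selection sketch is hypothetical: it scores candidates by a simple mix of relative translation and rotation and greedily fills an eight-keyframe window, using the 0.20 m figure quoted earlier only as an illustrative spacing parameter.

```python
import numpy as np


def pose_distance(T_a, T_b, w_rot=0.25):
    """Hypothetical pose distance: translation [m] plus weighted rotation [rad].

    The referenced Eqn. 1 combines translation and rotation differently; this
    is only a stand-in to show the selection pattern.
    """
    T_rel = np.linalg.inv(T_a) @ T_b
    trans = np.linalg.norm(T_rel[:3, 3])
    cos_a = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return trans + w_rot * np.arccos(cos_a)


def select_window(poses_wc, ref_idx, window=8, min_dist=0.20):
    """Greedily pick a window of past keyframes spaced by at least min_dist."""
    chosen = [ref_idx]
    for i in range(ref_idx - 1, -1, -1):
        if len(chosen) == window:
            break
        if pose_distance(poses_wc[chosen[-1]], poses_wc[i]) > min_dist:
            chosen.append(i)
    return chosen
```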

Fused depth maps from MVS are integrated using a voxel-hashed TSDF as per Nießner et al. (2013), followed by mesh extraction via Marching Cubes. For quantitative mesh evaluation, the reference point cloud is pruned for visibility with respect to the camera frusta, and a subset of 800k random samples is evaluated to compute precision, recall, and F-score. Alignment to ground truth uses SE(3) transforms for metric-scale VIO methods and Sim(3) transforms for monocular-only pipelines, standardizing scale and pose for comparison.
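
Both alignments can be computed with the closed-form Umeyama method; the sketch below is a generic implementation (not code from the referenced protocol), where `with_scale=True` yields the Sim(3) fit used for scale-ambiguous monocular pipelines and `with_scale=False` the SE(3) fit for metric-scale VIO.

```python
import numpy as np


def umeyama(src, dst, with_scale=False):
    """Closed-form least-squares alignment of src onto dst (both N x 3).

    with_scale=False gives an SE(3) fit (metric-scale VIO); with_scale=True
    gives a Sim(3) fit (scale-ambiguous monocular pipelines).
    Returns (s, R, t) such that dst[i] ≈ s * R @ src[i] + t.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # keep R a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(axis=0).sum() if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t
```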

6. Relevance in Contemporary Dense Mapping Research

The EuRoC dataset underpins the evaluation of generalization, robustness, and real-time performance in dense mapping. It is notable that leading visual-inertial systems achieve fused mean absolute depth errors below 25 cm, reconstruct metrically accurate dense 3D meshes, and remain robust to challenging motion and texture conditions with no domain-specific retraining. The availability of synchronized, high-rate, multi-modal data, combined with accurate geometric and trajectory ground truth across varied environmental conditions, ensures that the EuRoC dataset remains an indispensable asset for benchmarking progress in visual-inertial odometry and dense mapping research.
