Event-aided Direct Sparse Odometry (EDS)
- EDS is a visual odometry technique that fuses high-temporal-resolution event data with standard image frames to enable robust state estimation.
- It employs a direct probabilistic formulation and sliding-window photometric bundle adjustment for precise motion estimation and semi-dense 3D mapping.
- EDS demonstrates low RMS translational and rotational errors in both indoor and high-dynamic scenarios, outperforming traditional frame-based methods.
Event-aided Direct Sparse Odometry (EDS) is a class of visual odometry (VO) algorithms that directly fuse asynchronous event streams from event cameras with standard image frames (and optionally depth) to estimate 6-DoF camera motion and reconstruct semi-dense 3D maps. By leveraging the high temporal resolution, high dynamic range, and blur resilience of event cameras, EDS provides robust, low-latency odometry even under rapid motion, sparse frames, or challenging illumination. The technique uses a direct probabilistic formulation in which per-pixel brightness increments are predicted from sparse 3D structure and optimized jointly with image-based photometric constraints, enabling accurate and efficient state estimation in scenarios where conventional frame-based VO and SLAM struggle (Hidalgo-Carrió et al., 2022; Zhu et al., 2023).
1. Event Generation and Signal Model
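EDS exploits the event generation principle of event cameras: each pixel asynchronously triggers an event $e_k = (\mathbf{u}_k, t_k, p_k)$ whenever the local logarithmic intensity $L$ changes by a predefined contrast threshold $C$:
$$\Delta L(\mathbf{u}_k, t_k) = L(\mathbf{u}_k, t_k) - L(\mathbf{u}_k, t_k - \Delta t_k) = p_k C, \qquad p_k \in \{+1, -1\}.$$
A first-order Taylor expansion under the optical flow assumption relates observed brightness increments to pixel velocities induced by rigid body motion:
$$\Delta L(\mathbf{u}) \approx -\nabla L(\mathbf{u}) \cdot \mathbf{v}(\mathbf{u})\,\Delta t,$$
with image-plane velocity $\mathbf{v}(\mathbf{u})$ parameterized by camera body velocity $\dot{\mathbf{T}} = (\mathbf{V}^\top, \boldsymbol{\omega}^\top)^\top$ via:
$$\mathbf{v}(\mathbf{u}) = \mathbf{J}(\mathbf{u}, d_{\mathbf{u}})\,\dot{\mathbf{T}},$$
where $\mathbf{J}$ encodes projection geometry and depth $d_{\mathbf{u}}$. The observed polarity-weighted event increments are accumulated with Gaussian temporal weighting to minimize motion blur:
$$\Delta L(\mathbf{u}) = \sum_{k} w(t_k)\, p_k\, C\, \delta(\mathbf{u} - \mathbf{u}_k), \qquad w(t_k) = \exp\!\left(-\frac{(t_k - t_{\mathrm{ref}})^2}{2\sigma_t^2}\right).$$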
A probabilistic generative model describes the likelihood of each event given the predicted increment. The likelihood is expressed through the standard normal CDF $\Phi$, with a noise scale $\sigma$ that captures sensor/event noise (Hidalgo-Carrió et al., 2022).
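The accumulation and the linearized prediction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the event tuple layout `(x, y, t, polarity)`, and the Gaussian parameters `t_ref` and `sigma_t` are hypothetical choices made for the example.

```python
import math

def accumulate_events(events, width, height, contrast, t_ref, sigma_t):
    """Build a brightness-increment image from a list of events.

    Each event is an (x, y, t, polarity) tuple with polarity in {+1, -1}.
    Every event adds polarity * contrast at its pixel, scaled by a Gaussian
    weight centred on t_ref (an assumed weighting scheme).
    """
    image = [[0.0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        weight = math.exp(-0.5 * ((t - t_ref) / sigma_t) ** 2)
        image[y][x] += weight * polarity * contrast
    return image

def predicted_increment(gradient, flow, dt):
    """Linearized event generation model: dL ~= -(grad L . v) * dt."""
    gx, gy = gradient
    vx, vy = flow
    return -(gx * vx + gy * vy) * dt

# Three events, all at the reference time, so every Gaussian weight equals 1.
events = [(1, 0, 0.0, +1), (1, 0, 0.0, +1), (0, 1, 0.0, -1)]
increments = accumulate_events(events, width=2, height=2, contrast=0.2,
                               t_ref=0.0, sigma_t=0.01)
```

Events far from `t_ref` receive weights close to zero, so they contribute little to the accumulated image.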
2. Direct Probabilistic Motion Formulation
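For each active pixel $\mathbf{u}$ (with high gradient and sufficient events), EDS defines a brightness increment residual:
$$r_{\mathbf{u}} = \Delta L(\mathbf{u}) - \Delta\hat{L}(\mathbf{u}),$$
where $\Delta L(\mathbf{u})$ is the observed (accumulated) event increment and $\Delta\hat{L}(\mathbf{u})$ is the model-predicted brightness increment under a given motion hypothesis. The weighted least squares objective
$$E = \sum_{\mathbf{u}} w_{\mathbf{u}}\, r_{\mathbf{u}}^{2}$$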
is minimized over SE(3) increments via Gauss-Newton or Levenberg–Marquardt, with robust (Huber) per-pixel weights to attenuate outliers. Parameters are iteratively updated in a small-angle, Lie-algebra parametrization for numerical stability (Hidalgo-Carrió et al., 2022).
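To make the Huber-weighted Gauss-Newton iteration concrete, the sketch below solves a deliberately simplified version of the problem: it estimates a single 2-D image-plane velocity instead of an SE(3) increment, using the linearized increment model as the prediction. The function names, the threshold `delta`, and the toy data are assumptions for illustration only.

```python
def huber_weight(residual, delta):
    """IRLS weight of the Huber loss: 1 inside the threshold, delta/|r| outside."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= delta else delta / magnitude

def estimate_flow(gradients, increments, dt, delta=0.1, iterations=10):
    """Gauss-Newton with Huber weights on r = observed - (-(g . v) * dt)."""
    vx, vy = 0.0, 0.0
    for _ in range(iterations):
        h11 = h12 = h22 = b1 = b2 = 0.0
        for (gx, gy), observed in zip(gradients, increments):
            predicted = -(gx * vx + gy * vy) * dt
            r = observed - predicted
            w = huber_weight(r, delta)
            jx, jy = gx * dt, gy * dt  # Jacobian of r w.r.t. (vx, vy)
            h11 += w * jx * jx
            h12 += w * jx * jy
            h22 += w * jy * jy
            b1 -= w * jx * r
            b2 -= w * jy * r
        det = h11 * h22 - h12 * h12
        if abs(det) < 1e-18:
            break  # degenerate gradients: the system is not solvable
        # Solve the 2x2 normal equations H * dv = b in closed form.
        vx += (h22 * b1 - h12 * b2) / det
        vy += (h11 * b2 - h12 * b1) / det
    return vx, vy

# Noise-free toy data generated from a known velocity (1.0, -2.0).
gradients = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (-1.0, 3.0)]
dt = 0.01
increments = [-(gx * 1.0 + gy * -2.0) * dt for gx, gy in gradients]
vx, vy = estimate_flow(gradients, increments, dt)
```

Because this toy residual is linear in the velocity, the iteration recovers the true value after the first step; the full SE(3) problem is nonlinear and needs repeated relinearization.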
3. Sparse 3D Structure Selection and Parameterization
EDS implements a semi-dense mapping strategy: for each keyframe, the frame is divided into tiles and the pixels with the highest Sobel gradient magnitude in each tile are retained (typically the top 10–15% per tile), yielding a sparse set of high-gradient points. The 3D structure is parameterized via inverse depth $\rho = 1/d$, initialized by reprojecting depths from already-mapped keyframes and interpolated using inverse-depth kd-tree search. This approach provides a computationally efficient yet geometrically informative set of points for direct photometric and event-based optimization (Hidalgo-Carrió et al., 2022).
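The tile-based selection can be illustrated with a small pure-Python sketch. The tile size, the `keep_fraction` default, and the function names are assumptions chosen for the example; a practical implementation would use an optimized image library rather than nested lists.

```python
import math

def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2-D list image (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            mag[y][x] = math.hypot(gx, gy)
    return mag

def select_points(img, tile=4, keep_fraction=0.15):
    """Keep the strongest-gradient pixels of each tile as (x, y) tuples."""
    mag = sobel_magnitude(img)
    h, w = len(img), len(img[0])
    selected = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            ys = range(ty, min(ty + tile, h))
            xs = range(tx, min(tx + tile, w))
            # Candidates are pixels with non-zero gradient inside this tile.
            cells = [(mag[y][x], x, y) for y in ys for x in xs if mag[y][x] > 0.0]
            cells.sort(reverse=True)
            n_keep = max(1, round(keep_fraction * len(ys) * len(xs)))
            selected.extend((x, y) for _, x, y in cells[:n_keep])
    return selected

# 8x8 image with a vertical step edge between columns 3 and 4.
image = [[0.0 if x < 4 else 1.0 for x in range(8)] for _ in range(8)]
points = select_points(image, tile=4, keep_fraction=0.15)
```

Selecting per tile rather than globally keeps the points spread over the image, so a single strongly textured region cannot absorb the whole budget.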
4. Global Optimization: Photometric Bundle Adjustment
For map and trajectory refinement, EDS employs a sliding-window photometric bundle adjustment (BA) over all active keyframes. The optimization jointly refines all poses $\mathbf{T}_i$ and all inverse depths $\rho$ by minimizing a robust semi-dense photometric error. The error terms are built from the 3D point $\mathbf{X}_{i,\mathbf{u}}$ at pixel $\mathbf{u}$ in keyframe $i$, the projection $\pi$, the intensity $I_j$ in keyframe $j$, and the reprojected event-induced increment $\Delta\hat{L}$. Huber costs mitigate the influence of degenerate correspondences or outliers. This BA typically operates on windows of approximately 7 keyframes and is implemented using automatic differentiation in Ceres (Hidalgo-Carrió et al., 2022).
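As a rough illustration of a single photometric term, the sketch below back-projects a host pixel with its inverse depth, moves it into a target keyframe, and applies a Huber cost to the intensity difference. It assumes a pinhole camera, a relative pose given as a rotation matrix `R` and translation `t`, and nearest-neighbour intensity lookup; all names are hypothetical, and the event-induced increment term is not modeled here.

```python
def backproject(u, v, inv_depth, fx, fy, cx, cy):
    """Pixel plus inverse depth -> 3-D point in the host camera frame."""
    z = 1.0 / inv_depth
    return ((u - cx) / fx * z, (v - cy) / fy * z, z)

def transform(R, t, X):
    """Apply the rigid motion X' = R * X + t."""
    return tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))

def project(X, fx, fy, cx, cy):
    """Pinhole projection of a 3-D point to pixel coordinates."""
    x, y, z = X
    return (fx * x / z + cx, fy * y / z + cy)

def huber(r, delta):
    """Huber cost: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def photometric_cost(I_host, I_target, u, v, inv_depth, R, t, K, delta):
    """Huber cost of one host pixel reprojected into the target image."""
    fx, fy, cx, cy = K
    X = transform(R, t, backproject(u, v, inv_depth, fx, fy, cx, cy))
    if X[2] <= 0.0:
        return None  # point falls behind the target camera
    u2, v2 = project(X, fx, fy, cx, cy)
    col, row = round(u2), round(v2)  # nearest neighbour (simplification)
    if not (0 <= row < len(I_target) and 0 <= col < len(I_target[0])):
        return None  # projection falls outside the target image
    return huber(I_target[row][col] - I_host[v][u], delta)

# Toy setup: pure sideways translation moves pixel (2, 1) to pixel (7, 1).
I_host = [[0.0] * 10 for _ in range(3)]
I_target = [[0.0] * 10 for _ in range(3)]
I_host[1][2] = 0.5
I_target[1][7] = 0.6
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cost = photometric_cost(I_host, I_target, 2, 1, 0.5, identity,
                        (0.1, 0.0, 0.0), (100.0, 100.0, 0.0, 0.0), delta=0.2)
```

A bundle adjustment would sum such terms over all selected points and keyframe pairs in the window and optimize the poses and inverse depths jointly.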
5. Algorithmic Workflow
The end-to-end EDS pipeline consists of the following loop:
- Initialization: Coarse DSO-like bootstrapping on initial frames.
- Frontend tracking (events): For each incoming frame, collect event packets; accumulate the brightness increment image $\Delta L$; perform incremental motion tracking via direct optimization.
- Keyframe management: Trigger new keyframes when tracked coverage falls or a rotation threshold is exceeded; select new sparse points and initialize depths.
- Backend mapping: Perform sliding-window semi-dense photometric BA; update the map.
- Event batching: Events are processed in overlapping packets (e.g., 20k events, 50% overlap) to optimize signal-to-noise ratio while minimizing blur (Hidalgo-Carrió et al., 2022).
Performance is maintained at approximately 60 Hz in the frontend and approximately 20 Hz for frames; sparse frames are sufficient because the event stream bridges the intervals between frames (the “blind time”).
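The event batching step in the loop above amounts to a sliding window over the event stream. A minimal sketch follows, with a hypothetical function name and with the defaults taken from the example figures in the text (20k events, 50% overlap):

```python
def overlapping_packets(events, packet_size=20000, overlap=0.5):
    """Yield fixed-size event packets where consecutive packets share events.

    The stride between packet starts is packet_size * (1 - overlap); a trailing
    remainder shorter than packet_size is not emitted.
    """
    step = max(1, int(packet_size * (1.0 - overlap)))
    for start in range(0, len(events) - packet_size + 1, step):
        yield events[start:start + packet_size]

# Small example: packets of 4 events with 50% overlap over 10 events.
packets = list(overlapping_packets(list(range(10)), packet_size=4, overlap=0.5))
```

Batching by event count rather than by fixed time intervals means each packet covers a shorter time span when the camera moves quickly and more events arrive.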
6. Extension to RGB-D Data and Adaptive Event Surfaces
A variant EDS pipeline for robotics fuses RGB-D frames with events. An adaptive time surface (ATS) addresses the “whiteout”/“blackout” problem of conventional time surfaces (TS) by using pixel-wise, motion-adaptive decay rates.
Pixel selection from the ATS then prioritizes spatially well-distributed, high-contrast, high-gradient regions. The full EDS objective jointly aligns RGB-D patch photometric errors and event-ATS patch errors. The final energy integrates both modalities with regularization (Zhu et al., 2023).
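The idea of a time surface with pixel-wise decay can be sketched as follows. The exponential decay form is the common time-surface definition and is used here as an assumption; the rule that adapts each decay constant to local motion is not given in the text, so the sketch simply takes the per-pixel constants `taus` as input. All names are hypothetical.

```python
import math

def time_surface(last_timestamps, taus, t_now):
    """Exponentially decayed map of the most recent event time per pixel.

    last_timestamps[y][x] is the time of the latest event at that pixel, or
    None if no event has fired; taus[y][x] is that pixel's decay constant.
    """
    surface = []
    for ts_row, tau_row in zip(last_timestamps, taus):
        row = []
        for t_last, tau in zip(ts_row, tau_row):
            if t_last is None:
                row.append(0.0)  # pixel never fired
            else:
                row.append(math.exp(-(t_now - t_last) / tau))
        surface.append(row)
    return surface

last = [[0.0, None], [0.9, 1.0]]
taus = [[0.1, 0.1], [0.05, 0.2]]
surface = time_surface(last, taus, t_now=1.0)
```

With one global decay constant, fast motion saturates the surface and slow motion lets it fade away; giving each pixel its own constant is what lets an adaptive scheme avoid both extremes.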
7. Benchmark Performance and Application Scenarios
On indoor DAVIS-based benchmarks, monocular EDS achieves 1–2 cm RMS translational error and 1–2° RMS rotational error, outperforming monocular event-only VO (EVO, USLAM) and matching or slightly exceeding DSO and ORB-SLAM under normal frame rates. When frame rates are reduced from 20 Hz to 5 Hz, frame-only methods degrade or lose track, whereas EDS remains robust due to continuous event tracking. In robotics applications, EDS with ATS demonstrates absolute trajectory error (ATE) below 2 cm and competitive relative pose error (RPE) across high-dynamics tasks (e.g., bounding/backflipping quadruped robots, angular rates up to 510 °/s) on datasets where classical methods diverge (Hidalgo-Carrió et al., 2022; Zhu et al., 2023).
A summary of results is as follows:
| Dataset/Scenario | EDS error | Best baseline |
|---|---|---|
| Indoor DAVIS (bin/boxes/etc.) | 1–2 cm RMS translation, 1–2° RMS rotation | DSO/ORB-SLAM (similar) |
| MVSEC flying (robot) | 1.2–1.75 cm ATE | DEVO, 2.4–7.1 cm ATE |
| Mini-Cheetah (bounding) | 0.42 cm ATE | Baselines diverged |
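For readers unfamiliar with the metric, ATE figures like those in the table are usually reported as the root-mean-square of per-pose position errors. The sketch below assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame (the alignment step used in standard evaluations is omitted); the function name and the sample positions are made up for illustration.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMS of Euclidean position errors between two matched trajectories."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
               for p_est, p_gt in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

# Two poses with position errors of 3 cm and 4 cm (units: metres).
estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0)]
error = ate_rmse(estimated, ground_truth)
```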
EDS thus enables low-power, high-dynamic-range, and robust odometry for AR/VR, nano-UAVs, legged robots, and other settings where traditional frame-based VO is challenged by lighting, speed, or frame rate constraints (Hidalgo-Carrió et al., 2022; Zhu et al., 2023).