EDmamba: A Simple yet Effective Event Denoising Method with State Space Model

Published 8 May 2025 in cs.CV | (2505.05391v2)

Abstract: Event cameras excel in high-speed vision due to their high temporal resolution, high dynamic range, and low power consumption. However, as dynamic vision sensors, their output is inherently noisy, making efficient denoising essential to preserve their ultra-low latency and real-time processing capabilities. Existing event denoising methods struggle with a critical dilemma: computationally intensive approaches compromise the sensor's high-speed advantage, while lightweight methods often lack robustness across varying noise levels. To address this, we propose a novel event denoising framework based on State Space Models (SSMs). Our approach represents events as 4D event clouds and includes a Coarse Feature Extraction (CFE) module that extracts embedding features from both geometric and polarity-aware subspaces. The model is further composed of two essential components: A Spatial Mamba (S-SSM) that models local geometric structures and a Temporal Mamba (T-SSM) that captures global temporal dynamics, efficiently propagating spatiotemporal features across events. Experiments demonstrate that our method achieves state-of-the-art accuracy and efficiency, with 88.89K parameters, 0.0685s per 100K events inference time, and a 0.982 accuracy score, outperforming Transformer-based methods by 2.08% in denoising accuracy and 36X faster.


Authors (5)

Summary

Event Denoising with State Space Models: A Methodological Overview

Event cameras depart from traditional frame-based imaging: they detect intensity changes asynchronously at microsecond timescales, which lets them capture high-speed motion. Their high dynamic range and low power consumption suit them to dynamic vision tasks. However, their output is noisy, particularly in low-light or high-speed scenes. The paper "EDmamba: A Simple yet Effective Event Denoising Method with State Space Model" addresses event denoising with a framework built on State Space Models (SSMs), aiming to filter noise without sacrificing the low latency that makes these sensors useful.

Methodological Advances

EDmamba introduces a robust approach to event denoising through three primary innovations:

4D Event Cloud Representation: Each event is treated as a point carrying spatial coordinates, a timestamp, and a polarity, so the representation covers the spatial, temporal, and polarity information in the stream. This turns an irregular stream of raw events into a structured point set on which features can be extracted consistently.
Coarse Feature Extraction Module: The separation of geometric and polarity-aware features into distinct embeddings allows the model to preserve crucial inductive biases necessary for effective denoising. This processing stage ensures that spatial correlations and signal polarity are effectively captured and utilized, establishing a foundation for higher-level feature abstraction.
Spatial-Temporal State Space Models (S-SSM and T-SSM): The model integrates both Spatial and Temporal Mamba architectures to address local geometric relationships and global temporal dynamics respectively. These modules leverage the efficiency of SSMs, offering linear-time sequence modeling that enables real-time event processing, even across ultra-high-rate streams.
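To make the first and third items more concrete, here is a minimal sketch, not the authors' implementation. It assumes events arrive as (x, y, t, p) tuples with polarity in {-1, +1}, normalizes them into a 4D cloud, splits that cloud into a geometric part and a polarity part, and runs a fixed scalar linear state space recurrence in a single pass. The function names, the normalization, and the constant coefficients a, b, c are all illustrative assumptions; Mamba-style layers use learned, input-dependent parameters and vector-valued states.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, float, float, int]  # (x, y, t, p), p in {-1, +1} (assumed layout)


def to_event_cloud(events: Sequence[Event], width: int, height: int) -> List[Event]:
    """Normalise raw events so x, y, t each lie in [0, 1]; polarity is kept as-is."""
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps coincide
    return [
        (x / (width - 1), y / (height - 1), (t - t0) / span, p)
        for x, y, t, p in events
    ]


def split_subspaces(cloud: Sequence[Event]):
    """Separate geometric coordinates (x, y, t) from polarity (hypothetical helper)."""
    geometric = [(x, y, t) for x, y, t, _ in cloud]
    polarity = [p for _, _, _, p in cloud]
    return geometric, polarity


def ssm_scan(inputs: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear SSM: h_k = a*h_{k-1} + b*u_k, y_k = c*h_k.

    One loop over the sequence, so the cost grows linearly with its length.
    """
    h = 0.0
    outputs = []
    for u in inputs:
        h = a * h + b * u
        outputs.append(c * h)
    return outputs


events = [(0, 0, 10.0, 1), (3, 1, 20.0, -1), (1, 2, 30.0, 1)]
cloud = to_event_cloud(events, width=4, height=3)
geometric, polarity = split_subspaces(cloud)
smoothed = ssm_scan([1.0, 1.0, 1.0], a=0.5, b=1.0, c=1.0)
print(cloud[1], polarity, smoothed)
```

The single loop in `ssm_scan` is the point of the sketch: each event updates a running state once, which is where the linear-time behaviour described above comes from, in contrast to attention, whose cost grows quadratically with sequence length.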

Numerical Results and Performance Metrics

The authors evaluate EDmamba on labeled datasets such as DND21 and DVSCLEAN. With 88.89K parameters and an inference time of 0.0685 s per 100K events, the model reaches an accuracy score of 0.982; compared with Transformer-based methods, this is reported as 2.08% higher denoising accuracy and 36 times faster inference. Additional experiments on unlabeled datasets indicate that the model remains robust across diverse scenarios without relying on pre-established labels or auxiliary sensor data.
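The reported timing can be turned into a throughput figure with simple arithmetic. The sketch below uses only the numbers quoted in the abstract. It assumes that "36 times faster" refers to the ratio of inference times, and that the accuracy score is the fraction of events labeled correctly as signal or noise; neither definition is confirmed by the text here, and the helper names are hypothetical.

```python
from typing import Sequence


def events_per_second(seconds_per_batch: float, batch_size: int = 100_000) -> float:
    """Throughput implied by a per-batch inference time."""
    return batch_size / seconds_per_batch


def denoising_accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of events whose signal/noise label is predicted correctly (assumed definition)."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


edmamba_time = 0.0685                               # seconds per 100K events (reported)
throughput = events_per_second(edmamba_time)        # roughly 1.46 million events per second
per_event_us = edmamba_time / 100_000 * 1e6         # roughly 0.685 microseconds per event
baseline_time = edmamba_time * 36                   # about 2.466 s, if 36x is a time ratio

toy_accuracy = denoising_accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # toy labels, not paper data

print(f"{throughput:.0f} events/s, {per_event_us:.3f} us/event, "
      f"implied baseline {baseline_time:.3f} s per 100K events")
```

Under these assumptions the model processes about 1.46 million events per second, while the implied Transformer baseline would need roughly 2.5 s for the same 100K events.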

Detailed ablation studies validate the geometric and polarity feature extraction as well as the spatial and temporal modeling, confirming that each component contributes to denoising performance across varying contexts. EDmamba's results on the Mean Event Structural Ratio (MESR) metric further support its efficacy in real-world environments, indicating solid generalization under both controlled and ambient conditions.

Implications and Future Directions

EDmamba is a step forward in event-based vision processing, offering a practical answer to the noise problems of dynamic vision sensors. By combining efficient state space sequence modeling with dedicated feature extraction, the framework addresses the trade-off between computational cost and robustness seen in earlier methods, and its small footprint makes deployment on resource-constrained platforms more feasible.

Significant practical implications include improvements in obstacle avoidance, visual SLAM, and high-speed tracking applications where real-time noise suppression is paramount. The theoretical underpinnings also spur further inquiry into the application of state space models across various AI domains, particularly where sequence modeling plays a critical role.

Future research could investigate lightweight adaptations of the EDmamba architecture suitable for mobile and embedded systems, ensuring seamless integration into existing vision pipelines without substantial computational overhead. Additionally, exploration into synergistic effects of combining SSMs with other deep learning paradigms could unlock new pathways for innovation within event-based vision scenarios.

In summary, EDmamba is an efficient and effective approach to event denoising that combines state space modeling with a coherent feature representation, making it a useful tool for advancing high-speed vision applications.
