EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion (2312.16933v1)
Abstract: Event cameras and RGB cameras exhibit complementary characteristics in imaging: the former offer high dynamic range (HDR) and high temporal resolution, while the latter provide rich texture and color information. This makes the integration of event cameras into middle- and high-level RGB-based vision tasks highly promising. However, challenges arise in multi-modal fusion, data annotation, and model architecture design. In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module under the supervision of an existing RGB-based model. The learned fusion module integrates event streams with image features in the form of a plug-in, making the RGB-based model robust to HDR and fast-motion scenes while enabling high temporal resolution inference. Our method requires only unlabeled event-image pairs (pixel-wise alignment is not needed) and does not alter the structure or weights of the RGB-based model. We demonstrate the superiority of EvPlug in several vision tasks, such as object detection, semantic segmentation, and 3D hand pose estimation.