Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow (2403.07432v1)

Published 12 Mar 2024 in cs.CV

Abstract: Single RGB or LiDAR is the mainstream sensor for the challenging scene flow task, which relies heavily on visual features to match motion features. Compared with a single modality, existing methods adopt a fusion strategy to directly fuse cross-modal complementary knowledge in motion space. However, such direct fusion may suffer from the modality gap caused by the intrinsically heterogeneous visual nature of RGB and LiDAR, thus deteriorating motion features. We discover that the event modality is homogeneous with RGB and LiDAR in both visual and motion spaces. In this work, we bring in the event as a bridge between RGB and LiDAR, and propose a novel hierarchical visual-motion fusion framework for scene flow, which explores a homogeneous space to fuse cross-modal complementary knowledge for physical interpretation. In visual fusion, we discover that the event is complementary to RGB (relative vs. absolute) in luminance space for high dynamic imaging, and complementary to LiDAR (local boundary vs. global shape) in scene-structure space for structure integrity. In motion fusion, we find that RGB, event, and LiDAR are complementary to each other (spatially dense, temporally dense vs. spatiotemporally sparse) in correlation space, which motivates us to fuse their motion correlations for motion continuity. The proposed hierarchical fusion explicitly fuses multimodal knowledge to progressively improve scene flow from visual space to motion space. Extensive experiments verify the superiority of the proposed method.
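
The sketch below is a minimal, illustrative rendering of the hierarchical fusion idea described in the abstract, not the authors' implementation: event features are first fused with RGB and LiDAR features in visual space, and the fused features are then combined in a motion (correlation) stage to predict per-point scene flow. The module names, feature dimensions, and the choice of cross-attention as the fusion operator are assumptions made for illustration only.

```python
# Hypothetical sketch of hierarchical visual-motion fusion (assumptions, not the paper's code).
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse a bridging modality (event) into a target modality via cross-attention."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, bridge: torch.Tensor) -> torch.Tensor:
        # target, bridge: (batch, tokens, dim) feature sequences
        fused, _ = self.attn(query=target, key=bridge, value=bridge)
        return self.norm(target + fused)


class HierarchicalVisualMotionFusion(nn.Module):
    """Visual-space fusion first (event<->RGB, event<->LiDAR), then a motion-space
    stage that mixes the per-modality features, following the abstract's coarse description."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.rgb_event = CrossModalFusion(dim)    # luminance complementarity (relative vs. absolute)
        self.lidar_event = CrossModalFusion(dim)  # structure complementarity (boundary vs. shape)
        self.motion_fusion = nn.Sequential(       # correlation-space fusion (stand-in MLP)
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.flow_head = nn.Linear(dim, 3)        # per-point 3D scene flow

    def forward(self, rgb, event, lidar):
        # Visual fusion: the event stream bridges the RGB and LiDAR features.
        rgb_v = self.rgb_event(rgb, event)
        lidar_v = self.lidar_event(lidar, event)
        # Motion fusion: concatenate the (visually fused) modality features and regress flow.
        motion = self.motion_fusion(torch.cat([rgb_v, event, lidar_v], dim=-1))
        return self.flow_head(motion)


if __name__ == "__main__":
    b, n, d = 2, 1024, 128
    model = HierarchicalVisualMotionFusion(dim=d)
    flow = model(torch.randn(b, n, d), torch.randn(b, n, d), torch.randn(b, n, d))
    print(flow.shape)  # torch.Size([2, 1024, 3])
```

The cross-attention blocks take the event features as key/value so that each target modality queries the bridging modality, mirroring the paper's "event as a bridge" framing; the actual network may use entirely different operators and correlation volumes.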

Authors (4)
  1. Hanyu Zhou (19 papers)
  2. Yi Chang (150 papers)
  3. Zhiwei Shi (21 papers)
  4. Luxin Yan (33 papers)
Citations (3)
