HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation (2403.16788v1)
Abstract: Event-based semantic segmentation has gained popularity because it can handle high-speed motion and extreme lighting conditions that conventional RGB cameras cannot. Since event data is hard to annotate, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, reconstruction inevitably introduces noise, and learning from noisy pseudo labels, especially when they are generated from a single source, may reinforce the errors. This drawback is known as confirmation bias in pseudo-labeling. In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels. In particular, we first employ a plain unsupervised domain adaptation framework as our baseline, which generates one set of pseudo labels through self-training. Then, we incorporate offline event-to-image reconstruction into the framework and obtain a second set of pseudo labels by predicting segmentation maps on the reconstructed images. A noisy label learning strategy is designed to mix the two sets of pseudo labels and improve their quality. Moreover, we propose a soft prototypical alignment module to further improve the consistency of target domain features. Extensive experiments show that our proposed method outperforms existing state-of-the-art methods by a large margin on the DSEC-Semantic dataset (+5.88% accuracy, +10.32% mIoU), even surpassing several supervised methods.
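The abstract describes mixing two sets of pseudo labels, one from self-training and one from predictions on reconstructed images, but does not spell out the mixing rule. The following is only an illustrative sketch of the general idea, not the paper's noisy label learning strategy. The function name `mix_pseudo_labels`, the agreement-then-confidence rule, the `threshold` value, and the ignore label `255` are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumed "ignore" label; 255 is a common convention in segmentation code,
# not something stated in the abstract.
IGNORE_INDEX = 255


def mix_pseudo_labels(
    probs_self: Sequence[Sequence[float]],
    probs_recon: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[int]:
    """Merge two per-pixel class-probability sets into one pseudo-label list.

    probs_self:  class probabilities from self-training (one row per pixel).
    probs_recon: class probabilities predicted on reconstructed images.

    Hypothetical rule: keep a label when both sources agree; when they
    disagree, keep the more confident source only if its confidence reaches
    `threshold`; otherwise mark the pixel as ignored.
    """
    if len(probs_self) != len(probs_recon):
        raise ValueError("both sources must cover the same pixels")

    mixed: List[int] = []
    for p_self, p_recon in zip(probs_self, probs_recon):
        # Arg-max class and its probability for each source.
        label_self = max(range(len(p_self)), key=p_self.__getitem__)
        label_recon = max(range(len(p_recon)), key=p_recon.__getitem__)
        conf_self, conf_recon = p_self[label_self], p_recon[label_recon]

        if label_self == label_recon:
            mixed.append(label_self)
        elif max(conf_self, conf_recon) >= threshold:
            mixed.append(label_self if conf_self >= conf_recon else label_recon)
        else:
            mixed.append(IGNORE_INDEX)
    return mixed


# Three pixels, two classes: agreement, confident disagreement, unsure disagreement.
probs_self = [[0.7, 0.3], [0.2, 0.8], [0.55, 0.45]]
probs_recon = [[0.6, 0.4], [0.95, 0.05], [0.4, 0.6]]
mixed = mix_pseudo_labels(probs_self, probs_recon)
print(mixed)  # [0, 0, 255]
```

Ignoring low-confidence disagreements is one simple way to keep errors from a single source from being reinforced during training, which is the confirmation-bias concern the abstract raises.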
Authors:
- Linglin Jing
- Yiming Ding
- Yunpeng Gao
- Zhigang Wang
- Xu Yan
- Dong Wang
- Gerald Schaefer
- Hui Fang
- Bin Zhao
- Xuelong Li