Unleashing Network Potentials for Semantic Scene Completion (2403.07560v2)
Abstract: Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs. However, our investigation reveals two limitations: ineffective feature learning from single modalities and overfitting to limited datasets. To address these issues, this paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet), with a fresh perspective of optimizing gradient updates. The proposed AMMNet introduces two core modules: a cross-modal modulation that enables the interdependence of gradient flows between modalities, and a customized adversarial training scheme that leverages dynamic gradient competition. Specifically, the cross-modal modulation adaptively re-calibrates the features to better excite the representation potential of each single modality. The adversarial training employs a minimax game of evolving gradients, with customized guidance to strengthen the generator's perception of visual fidelity in terms of both geometric completeness and semantic correctness. Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin, providing a promising direction for improving the effectiveness and generalization of SSC methods.
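The idea of one modality re-calibrating the features of another can be illustrated with a small sketch. This is not AMMNet's actual module, which the abstract does not specify; it assumes a squeeze-and-excitation style channel gate, and the names `cross_modal_gate`, `weight`, and `bias` are hypothetical. Because each output depends on both inputs, a loss on the modulated RGB features would also send gradients to the depth branch, which is one plausible reading of "interdependence of gradient flows".

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(features):
    """Global average pooling: one mean per channel."""
    return [sum(channel) / len(channel) for channel in features]


def cross_modal_gate(source, target, weight=1.0, bias=0.0):
    """Rescale each channel of `target` by a gate computed from `source`.

    Assumed (hypothetical) gate: sigmoid(weight * mean(source_c) + bias).
    A trained model would learn `weight` and `bias`; here they are fixed.
    """
    gates = [sigmoid(weight * m + bias) for m in channel_means(source)]
    return [[g * v for v in channel] for g, channel in zip(gates, target)]


# Toy features: 2 channels, 3 spatial positions each.
rgb = [[0.2, 0.4, 0.6], [1.0, 1.0, 1.0]]
depth = [[2.0, 2.0, 2.0], [-3.0, -3.0, -3.0]]

rgb_mod = cross_modal_gate(depth, rgb)    # depth statistics gate the RGB channels
depth_mod = cross_modal_gate(rgb, depth)  # RGB statistics gate the depth channels
```

In this toy input, the second depth channel has a strongly negative mean, so its gate is close to zero and the second RGB channel is suppressed, while the first RGB channel is mostly preserved.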