Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification (2404.07739v1)
Abstract: Indoor scenes are usually characterized by scattered objects and the relationships between them, which makes indoor scene classification a challenging computer vision task. Despite the significant performance boost that deep-learning-based methods have brought to classification tasks in recent years, limitations such as inter-category ambiguity and intra-category variation have been holding back their performance. To overcome such issues, gathering semantic information has been shown to be a promising way of obtaining a more complete and discriminative feature representation of indoor scenes. Therefore, the work described in this paper exploits semantic information obtained from both object detection and semantic segmentation techniques. While object detection techniques provide the 2D locations of objects, from which spatial distributions between objects can be derived, semantic segmentation techniques provide pixel-level information, from which the spatial distribution and shape-related features of the segmentation categories can be obtained. Hence, a novel approach that uses a semantic segmentation mask to provide a Hu-moments-based shape characterization of the segmentation categories, designated Segmentation-based Hu-Moments Features (SHMFs), is proposed. Moreover, a three-main-branch network, designated GOS$^2$F$^2$App, that exploits deep-learning-based global features, object-based features, and semantic segmentation-based features is also proposed. GOS$^2$F$^2$App was evaluated on two indoor scene benchmark datasets, SUN RGB-D and NYU Depth V2, where, to the best of our knowledge, state-of-the-art results were achieved on both, providing evidence of the effectiveness of the proposed approach.
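As an illustration of the SHMF idea described in the abstract, the sketch below computes a seven-dimensional Hu-moment shape descriptor for each segmentation category present in a label mask. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the log-scaling step, and the 40-category setup are illustrative choices.

```python
# Minimal sketch of Segmentation-based Hu-Moments Features (SHMFs):
# from a semantic segmentation mask (each pixel stores a category index),
# build a per-category binary mask and compute its 7 Hu moment invariants.
import cv2
import numpy as np

def shmf_from_mask(seg_mask: np.ndarray, num_categories: int) -> np.ndarray:
    """Return a (num_categories x 7) array of Hu moments, one row per
    segmentation category (rows stay zero for absent categories)."""
    features = np.zeros((num_categories, 7), dtype=np.float64)
    for cat in range(num_categories):
        binary = (seg_mask == cat).astype(np.uint8)  # per-category binary mask
        if binary.sum() == 0:
            continue  # category not present in this image
        moments = cv2.moments(binary, binaryImage=True)
        hu = cv2.HuMoments(moments).flatten()  # 7 invariant shape moments
        # Log-scaling (an assumption here) is common practice with Hu
        # moments, since their raw magnitudes span many orders of magnitude.
        features[cat] = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
    return features

# Usage: a 480x640 mask with 40 categories (an NYUv2-style label set)
mask = np.random.randint(0, 40, size=(480, 640), dtype=np.int32)
shmf = shmf_from_mask(mask, num_categories=40)
print(shmf.shape)  # (40, 7); flattened, this could feed a network branch
```

In a three-branch architecture such as the one the paper proposes, a descriptor like this would be flattened and fused with the global and object-based feature branches before classification.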
Authors: Ricardo Pereira, Luís Garrote, Tiago Barros, Ana Lopes, Urbano J. Nunes