Adaptive Self-training Framework for Fine-grained Scene Graph Generation (2401.09786v5)
Abstract: Scene graph generation (SGG) models suffer from problems inherent to the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging because of the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is model-agnostic and can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when applying the proposed self-training framework to state-of-the-art message-passing neural network (MPNN)-based SGG models. Extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in improving performance on fine-grained predicate classes.
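To make the idea of class-specific thresholding with momentum more concrete, here is a minimal Python sketch. It is not the CATM update rule from the paper, which the abstract does not specify. It assumes that each predicate class keeps its own confidence threshold, that an unannotated triplet receives a pseudo-label only when its top predicted class clears that class's threshold, and that each threshold is moved by an exponential moving average toward the mean confidence of the pseudo-labels just accepted for that class. The function names `assign_pseudo_labels` and `update_thresholds` and the `momentum` parameter are hypothetical.

```python
from typing import List, Optional, Sequence


def assign_pseudo_labels(
    probs: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> List[Optional[int]]:
    """Return one pseudo-label per unannotated triplet, or None if rejected.

    Assumption: a triplet is accepted only when the confidence of its
    top-scoring class reaches that class's own threshold.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        top = max(range(len(p)), key=lambda k: p[k])  # most confident class
        labels.append(top if p[top] >= thresholds[top] else None)
    return labels


def update_thresholds(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    thresholds: Sequence[float],
    momentum: float = 0.9,
) -> List[float]:
    """Move each class threshold toward the mean accepted confidence (EMA).

    Assumption: classes with no accepted pseudo-labels in this batch keep
    their previous threshold unchanged.
    """
    updated = list(thresholds)
    for c in range(len(thresholds)):
        confs = [p[c] for p, y in zip(probs, labels) if y == c]
        if confs:
            mean_conf = sum(confs) / len(confs)
            updated[c] = momentum * thresholds[c] + (1.0 - momentum) * mean_conf
    return updated


if __name__ == "__main__":
    # Toy predicate distributions for three unannotated triplets.
    batch = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.4, 0.1]]
    taus = [0.6, 0.5, 0.5]
    pseudo = assign_pseudo_labels(batch, taus)
    print(pseudo)
    print(update_thresholds(batch, pseudo, taus, momentum=0.5))
```

Keeping one threshold per class, rather than a single global cutoff, is what lets rare fine-grained predicates be accepted at a different confidence level than frequent ones under this sketch.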
- Kibum Kim
- Kanghoon Yoon
- Yeonjun In
- Jinyoung Moon
- Donghyun Kim
- Chanyoung Park