ALF: Adaptive Label Finetuning for Scene Graph Generation (2312.17425v3)

Published 29 Dec 2023 in cs.CV and cs.AI

Abstract: Scene Graph Generation (SGG) aims to predict the relationships between subjects and objects in a given image. However, the long-tailed distribution of relations often biases predictions toward coarse-grained labels, which poses a substantial hurdle for SGG. To address this issue, research on unbiased SGG has introduced data transfer methods that convert coarse-grained predicates into fine-grained ones across the entire dataset. These methods, however, face two main challenges: 1) they overlook the inherent context constraints imposed by subject-object pairs, which leads to erroneous relation transfer; and 2) they require an additional retraining process after the data transfer, which incurs substantial computational costs. To overcome these limitations, we introduce the first plug-and-play one-stage data transfer pipeline in SGG, termed Adaptive Label Finetuning (ALF), which eliminates the need for extra retraining sessions while significantly enhancing models' relation recognition capability across various SGG benchmark approaches. Specifically, ALF consists of two components: Adaptive Label Construction (ALC) and Adaptive Iterative Learning (AIL). By imposing Predicate-Context Constraints within the relation space, ALC adaptively re-ranks and selects candidate relations with reference to the model's predictive logits using Restriction-Based Judgment techniques, achieving robust relation transfer. Supervised with labels transferred by ALC, AIL iteratively finetunes the SGG models in an auto-regressive manner, which mitigates the substantial computational costs arising from a separate retraining process. Extensive experiments demonstrate that ALF achieves a 16% improvement in mR@100 over the typical SGG method Motif, with only a 6% increase in computational cost compared to the state-of-the-art method IETrans.
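The abstract describes ALC only at a high level. As a rough illustration of context-constrained label transfer, the Python sketch below re-ranks a model's predicate logits for one subject-object pair, keeps only fine-grained predicates that are plausible for that pair, and replaces a coarse ground-truth label when a plausible candidate scores high enough. This is not the authors' implementation. All names (`build_context_constraints`, `transfer_label`, `coarse`, `margin`) are hypothetical; the Predicate-Context Constraints are approximated by the set of predicates observed with each subject-object category pair, and the margin test is a simple stand-in for Restriction-Based Judgment, whose exact rule the abstract does not specify. In an AIL-style loop the logits would come from the model being finetuned, so labels could be recomputed each round; training itself is omitted.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (subject category, object category)


def build_context_constraints(triplets: List[Tuple[str, str, str]]) -> Dict[Pair, Set[str]]:
    """Hypothetical stand-in for Predicate-Context Constraints: for every
    (subject, object) category pair, collect the predicates annotated with it."""
    allowed: Dict[Pair, Set[str]] = {}
    for subj, pred, obj in triplets:
        allowed.setdefault((subj, obj), set()).add(pred)
    return allowed


def transfer_label(subj: str, obj: str, gt_pred: str,
                   logits: Dict[str, float],
                   allowed: Dict[Pair, Set[str]],
                   coarse: Set[str],
                   margin: float = 0.0) -> str:
    """Return a (possibly) transferred predicate label for one relation."""
    # Only coarse-grained ground-truth labels are candidates for transfer.
    if gt_pred not in coarse:
        return gt_pred
    # Fine-grained predicates that are plausible for this subject-object pair.
    fine = allowed.get((subj, obj), set()) - coarse
    # Re-rank the plausible candidates by the model's predictive logits.
    ranked = sorted((p for p in logits if p in fine),
                    key=lambda p: logits[p], reverse=True)
    if not ranked:
        return gt_pred
    best = ranked[0]
    gt_score = logits.get(gt_pred, float("-inf"))
    # Hypothetical judgment rule: transfer only if the top candidate
    # beats the ground-truth logit by at least `margin`.
    return best if logits[best] >= gt_score + margin else gt_pred


if __name__ == "__main__":
    train = [("man", "on", "horse"), ("man", "riding", "horse"),
             ("man", "on", "street"), ("cup", "on", "table")]
    allowed = build_context_constraints(train)
    logits = {"on": 2.0, "riding": 2.5, "parked on": 3.0, "has": 0.1}
    print(transfer_label("man", "horse", "on", logits, allowed, coarse={"on", "has"}))
    print(transfer_label("man", "street", "on", logits, allowed, coarse={"on", "has"}))
```

Run as a script, this prints `riding` and then `on`. For (man, horse), "parked on" has the highest logit but is never observed with that pair, so the context constraint filters it out and "riding" is selected instead; for (man, street) no fine-grained candidate is plausible, so the coarse label is kept.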

