
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation (2403.14886v1)

Published 21 Mar 2024 in cs.CV

Abstract: Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image, which is challenging due to incomplete labelling, long-tailed relationship categories, and relational semantic overlap. Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets, and hence often suffer from limited capacity in learning low-frequency relationships. In this paper, we present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem based on a unique set of graph-aware queries. In particular, each graph-aware query encodes a compact representation of both the node and all of its relations in the graph, acquired through a relaxed sub-graph matching during training. Moreover, to address the problem of relational semantic overlap, we utilize a relation distillation strategy that aims to efficiently learn multiple instances of semantic relationships. Extensive experiments on the VG and PSG datasets show that our model achieves state-of-the-art results, with a significant improvement of 3.5% and 6.7% in mR@50 and mR@100 for the scene graph generation task and an even more substantial improvement of 8.5% and 10.3% in mR@50 and mR@100 for the panoptic scene graph generation task. Code is available at https://github.com/zeeshanhayder/DSGG.
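
The abstract reports results in mR@50 and mR@100 (mean recall at K), a metric that averages recall over predicate categories so that rare relationships count as much as frequent ones. The following is a minimal sketch of one common way to compute it, not the authors' evaluation code. It assumes triplets are matched by exact label equality (real benchmarks also require box or mask overlap), and it pools hits over the whole dataset before averaging; the function name `mean_recall_at_k` and the toy data are hypothetical.

```python
from collections import defaultdict


def mean_recall_at_k(predictions, ground_truth, k):
    """Simplified mean recall@K over predicate categories.

    predictions:  one list per image of (subject, predicate, object)
                  triplets, sorted by descending confidence.
    ground_truth: one set per image of annotated triplets.
    Assumption: a prediction matches a ground-truth triplet only if all
    three labels are equal (localization is ignored in this sketch).
    """
    hits = defaultdict(int)    # matched ground-truth triplets per predicate
    totals = defaultdict(int)  # all ground-truth triplets per predicate

    for preds, gts in zip(predictions, ground_truth):
        top_k = set(preds[:k])  # keep only the K most confident triplets
        for triplet in gts:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1

    # Recall per predicate, then an unweighted mean across predicates.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy example: one image, three annotated relationships.
predictions = [[
    ("man", "on", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
]]
ground_truth = [{
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
}]
print(mean_recall_at_k(predictions, ground_truth, k=2))
print(mean_recall_at_k(predictions, ground_truth, k=3))
```

Because each predicate contributes equally to the average, missing the single "riding" annotation lowers the score as much as missing every instance of a frequent predicate would, which is why the metric is sensitive to the long-tailed categories the abstract mentions.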

Authors (2)
  1. Zeeshan Hayder (20 papers)
  2. Xuming He (109 papers)
Citations (4)
