AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation (2404.07788v1)

Published 11 Apr 2024 in cs.CV and cs.AI

Abstract: Scene graph generation (SGG) aims to understand the visual objects and their semantic relationships in a given image. To date, many SGG datasets with an eye-level view have been released, but SGG datasets with an overhead view remain scarcely studied. In contrast to the eye-level view, where object occlusion impedes SGG, the overhead view offers a new perspective that promotes SGG by providing a clear perception of the spatial relationships among objects in the ground scene. To fill this gap, this paper constructs and releases an aerial image urban scene graph generation (AUG) dataset. Images in the AUG dataset are captured from a low-altitude overhead view, with 25,594 objects, 16,970 relationships, and 27,175 attributes manually annotated. To prevent the local context from being overwhelmed in complex aerial urban scenes, this paper proposes a new locality-preserving graph convolutional network (LPG). Unlike the traditional graph convolutional network, which has a natural advantage in capturing global context for SGG, each convolutional layer in the LPG integrates the non-destructive initial features of the objects with dynamically updated neighborhood information, preserving the local context while still mining the global context. Because AUG contains an extremely large number of potential object relationship pairs, of which only a small fraction are meaningful, we propose an adaptive bounding box scaling factor for potential relationship detection (ABS-PRD) to intelligently prune meaningless pairs. Extensive experiments on the AUG dataset show that the LPG significantly outperforms state-of-the-art methods and demonstrate the effectiveness of the proposed locality-preserving strategy.
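
The locality-preserving idea can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch layer written from the abstract alone (class, parameter, and variable names are assumptions, not the authors' released code): each layer aggregates neighborhood messages for global context, then re-injects the unchanged initial node features so that repeated convolutions cannot wash out the local context.

```python
import torch
import torch.nn as nn

class LocalityPreservingConv(nn.Module):
    """Hypothetical sketch of one LPG layer: neighborhood aggregation
    supplies the global context, while the fixed initial features h0
    are re-injected to preserve the local context."""

    def __init__(self, dim: int):
        super().__init__()
        self.neighbor_proj = nn.Linear(dim, dim)  # projects aggregated neighbor features
        self.fuse = nn.Linear(2 * dim, dim)       # fuses initial features with neighborhood context

    def forward(self, h: torch.Tensor, h0: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (N, dim) node features, updated at every layer
        # h0:  (N, dim) non-destructive initial node features, kept fixed across layers
        # adj: (N, N)   row-normalized adjacency matrix of the object graph
        neighborhood = torch.relu(self.neighbor_proj(adj @ h))  # message passing (global context)
        fused = torch.cat([h0, neighborhood], dim=-1)           # locality preservation
        return torch.relu(self.fuse(fused))
```

Stacking several such layers still spreads global context through the `adj @ h` message passing, while the concatenation with `h0` keeps the original local evidence intact.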

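ABS-PRD is described only at a high level in the abstract. One plausible reading, sketched below, is that each object's bounding box is enlarged by a per-object adaptive scaling factor, and an ordered (subject, object) pair is kept as a relationship candidate only if the scaled boxes intersect. All names and the exact pruning rule here are illustrative assumptions, not the paper's definition.

```python
from itertools import permutations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)

def scale_box(box: Box, factor: float) -> Box:
    """Grow (or shrink) a box about its center by `factor`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * factor / 2, (y2 - y1) * factor / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def intersects(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def candidate_pairs(boxes: list[Box], factors: list[float]) -> list[tuple[int, int]]:
    """Keep an ordered (subject, object) pair only if the two boxes,
    each scaled by its own adaptive factor, overlap."""
    return [(i, j) for i, j in permutations(range(len(boxes)), 2)
            if intersects(scale_box(boxes[i], factors[i]),
                          scale_box(boxes[j], factors[j]))]
```

With all factors set to 1.0 this reduces to plain box-overlap filtering; the "adaptive" part lies in how the per-object factors are chosen, which the abstract does not specify.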
Authors (5)
  1. Yansheng Li (24 papers)
  2. Kun Li (193 papers)
  3. Yongjun Zhang (59 papers)
  4. Linlin Wang (35 papers)
  5. Dingwen Zhang (62 papers)
Citations (3)
