A Comprehensive Survey of Scene Graphs: Generation and Application (2104.01111v5)

Published 17 Mar 2021 in cs.CV

Abstract: A scene graph is a structured representation of a scene that can clearly express the objects, attributes, and relationships between objects in the scene. As computer vision technology continues to develop, people are no longer satisfied with simply detecting and recognizing objects in images; instead, they look forward to a higher level of understanding and reasoning about visual scenes. For example, given an image, we want not only to detect and recognize objects in the image, but also to know the relationships between objects (visual relationship detection) and to generate a text description (image captioning) based on the image content. Alternatively, we might want the machine to tell us what the little girl in the image is doing (visual question answering (VQA)), or even to remove the dog from the image and find similar images (image editing and retrieval). These tasks require a higher level of understanding and reasoning about images. The scene graph is just such a powerful tool for scene understanding. Scene graphs have therefore attracted the attention of a large number of researchers, and related research is often cross-modal, complex, and rapidly developing. However, no relatively systematic survey of scene graphs exists at present. To this end, this survey conducts a comprehensive investigation of current scene graph research. More specifically, we first summarize the general definition of the scene graph, then conduct a comprehensive and systematic discussion of scene graph generation (SGG) methods and of SGG with the aid of prior knowledge. We then investigate the main applications of scene graphs and summarize the most commonly used datasets. Finally, we provide some insights into the future development of scene graphs. We believe this will be a very helpful foundation for future research on scene graphs.

The paper "A Comprehensive Survey of Scene Graphs: Generation and Application" examines methods for scene graph generation (SGG), the use of prior knowledge to improve SGG, and the applications of scene graphs. It serves as a reference for computer vision researchers. A scene graph is a structured representation that captures the objects in a scene, their attributes, and the relationships between them, which supports higher-level semantic understanding and reasoning.

Scene Graph Definition and Challenges

A scene graph represents a visual scene as a structured data model, enabling tasks that require detailed scene understanding, such as image captioning, visual relationship detection, and visual question answering (VQA). Despite this utility, challenges persist in SGG, primarily the long-tailed distribution of visual relationships, the need for prior knowledge to augment models, and the difficulty of reasoning with sparse relationships. The survey highlights these challenges and addresses them through a comprehensive overview of SGG approaches and their applications.
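
To make the definition concrete, here is a minimal sketch of a scene graph as a data structure: objects are nodes with attributes, and relationships are directed, labeled edges stored as (subject, predicate, object) triples. The class names, method names, and example scene are hypothetical and chosen for illustration; they are not taken from the survey.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """A detected object: a class label plus optional attributes."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Objects are nodes; relationships are directed, labeled edges."""
    objects: dict[int, SceneObject] = field(default_factory=dict)
    # Each relationship is stored as (subject_id, predicate, object_id).
    relationships: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj_id: int, name: str, attributes=None) -> None:
        self.objects[obj_id] = SceneObject(name, list(attributes or []))

    def add_relationship(self, subj_id: int, predicate: str, obj_id: int) -> None:
        # Both endpoints must already exist as nodes.
        if subj_id not in self.objects or obj_id not in self.objects:
            raise KeyError("both subject and object must be added first")
        self.relationships.append((subj_id, predicate, obj_id))

    def triples(self) -> list[tuple[str, str, str]]:
        """Return readable <subject, predicate, object> triples."""
        return [
            (self.objects[s].name, p, self.objects[o].name)
            for s, p, o in self.relationships
        ]


def build_example_graph() -> SceneGraph:
    """Hypothetical scene: a little girl holding a dog that stands on grass."""
    graph = SceneGraph()
    graph.add_object(0, "girl", ["little"])
    graph.add_object(1, "dog", ["brown"])
    graph.add_object(2, "grass", ["green"])
    graph.add_relationship(0, "holding", 1)
    graph.add_relationship(1, "standing on", 2)
    return graph
```

Calling `build_example_graph().triples()` lists the relationships in readable form. Downstream tasks such as captioning or VQA would consume a structure of this kind, though real systems also attach bounding boxes and confidence scores.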

Scene Graph Generation Techniques

The mechanisms for generating scene graphs are categorized into several methodological frameworks:

CRF-Based Methods: Conditional random fields (CRFs) capture statistical correlations between object pairs and predicates, providing an early-stage approach to modeling visual relationships. CRF-based SGG methods such as DR-Net and SG-CRF integrate statistical modeling with neural networks to improve relationship detection.
TransE-Based Methods: Inspired by knowledge graph embedding, these methods (for example, VTransE) use translation embeddings, which represent a relationship as a translation vector between subject and object in a shared embedding space, so that subject + predicate ≈ object. Because this representation is shared across object pairs, it helps in inferring rarely seen or unseen relationships, which matters for the long-tailed problem.
CNN-Based Methods: Building on the feature-extraction strength of convolutional neural networks, methods like LinkNet and ViP-CNN detect relationships through interaction-based feature extraction. Approaches like Zoom-Net further refine features by modeling interactions between local and global context.
RNN/LSTM-Based Methods: Recurrent networks are well suited to sequence and context modeling. Methods such as IMP and MotifNet use them to pass context between objects and relationships, capturing dependencies among scene entities that matter for inferring the scene graph.
GNN-Based Methods: Graph neural networks (GNNs) form a cornerstone of contemporary SGG, utilizing graph structures to encode object and relationship nodes effectively. Approaches such as Factorizable Net and Graph R-CNN exemplify significant strides in utilizing GNNs for efficient and contextually rich scene graph extraction.
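
As an illustration of the translation-embedding idea in the TransE-based category, the sketch below scores a candidate triple by how closely subject + predicate lands on object. It assumes the embeddings are already given. The function names and the hand-picked 2-D vectors are hypothetical, and the learned mapping from visual features to embeddings that a real model would need is omitted.

```python
import math


def translation_score(subject_vec, predicate_vec, object_vec):
    """Negative L2 distance between (subject + predicate) and object.

    A score closer to zero means the triple fits the translation better.
    """
    diff = [s + p - o for s, p, o in zip(subject_vec, predicate_vec, object_vec)]
    return -math.sqrt(sum(d * d for d in diff))


def rank_predicates(subject_vec, object_vec, predicate_vecs):
    """Sort predicate names from best to worst translation fit."""
    scored = {
        name: translation_score(subject_vec, vec, object_vec)
        for name, vec in predicate_vecs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical 2-D embeddings, hand-picked for illustration only.
person = [1.0, 0.0]
horse = [1.0, 1.0]
predicates = {
    "riding": [0.0, 1.0],    # person + riding equals horse exactly
    "next to": [1.0, 0.0],
    "under": [0.0, -1.0],
}

ranking = rank_predicates(person, horse, predicates)
```

Because each predicate has a single vector shared by all subject and object pairs, a pair never seen with "riding" during training can still be scored against it, which is the property the survey links to the long-tailed problem.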

Enhancement with Prior Knowledge

The survey explores augmenting SGG with linguistic, statistical, and knowledge graph-based priors. Language priors exploit semantic word embeddings to mitigate data sparsity, whereas statistical priors use co-occurrence statistics from the training data to bias model predictions toward more likely relationships. Knowledge graphs provide a framework for supplying SGG models with real-world semantics, giving them broader contextual understanding.
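
A statistical prior of this kind can be sketched as a smoothed table of predicate frequencies per (subject class, object class) pair, added in log space to a model's visual scores. The sketch below is a minimal illustration under that assumption and is not the method of any specific paper. The function names, the smoothing constant, and the toy counts and logits are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_frequency_prior(training_triples, predicates, smoothing=1.0):
    """Estimate log P(predicate | subject class, object class) from counts."""
    counts = defaultdict(Counter)
    for subj, pred, obj in training_triples:
        counts[(subj, obj)][pred] += 1

    def log_prior(subj, obj):
        pair_counts = counts.get((subj, obj), Counter())
        # Additive smoothing keeps unseen predicates at a nonzero probability.
        total = sum(pair_counts.values()) + smoothing * len(predicates)
        return {
            p: math.log((pair_counts[p] + smoothing) / total) for p in predicates
        }

    return log_prior


def apply_prior(visual_logits, log_prior):
    """Add the log-prior to the visual logits and return the top predicate."""
    biased = {p: visual_logits[p] + log_prior[p] for p in visual_logits}
    return max(biased, key=biased.get)


# Hypothetical training annotations and model scores, for illustration only.
predicate_names = ["riding", "next to", "wearing"]
train = [("person", "riding", "horse")] * 3 + [("person", "next to", "horse")]
prior_fn = build_frequency_prior(train, predicate_names)

visual_logits = {"riding": 1.0, "next to": 1.0, "wearing": 1.5}
without_prior = max(visual_logits, key=visual_logits.get)
with_prior = apply_prior(visual_logits, prior_fn("person", "horse"))
```

In this toy case the visual scores alone favor "wearing", while the co-occurrence prior shifts the prediction to "riding". The same mechanism also pushes predictions toward frequent predicates, which is one reason the long-tailed distribution remains a challenge.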

Applications and Future Directions

Scene graphs are instrumental in diverse applications, enriching image generation, cross-modal retrieval, and complex visual tasks like human-object interaction recognition and 3D scene understanding. The paper underscores the potential for scene graphs to significantly impact other domains, such as autonomous systems and augmented reality.

The survey concludes by highlighting potential research directions, including tackling the long-tailed distribution of relationships, exploring dynamic scene graphs, and leveraging advanced reasoning and learning paradigms. This survey remains an essential reference for advancing the development and application of scene graphs in realizing detailed and interpretable scene representations.

Authors (6)

Xiaojun Chang (148 papers)
Pengzhen Ren (15 papers)
Pengfei Xu (57 papers)
Zhihui Li (51 papers)
Xiaojiang Chen (8 papers)
Alex Hauptmann (7 papers)

Citations (190)
