HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation (2403.12033v1)
Abstract: Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach enabling the ability to reason over visual data is Scene Graph Generation (SGG); however, many existing approaches assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, and smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), providing a strong baseline for scene graph generation in such challenging settings. At its core, HiKER-SGG utilizes a hierarchical knowledge graph to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks. Code is available at https://github.com/zhangce01/HiKER-SGG.
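The coarse-to-fine idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the HiKER-SGG implementation: the two-level hierarchy, the predicate names, the scores, and the function `coarse_to_fine` are all hypothetical and chosen only for illustration. The sketch assumes that a model first scores coarse superclasses, then scores the fine-grained predicates within each superclass, and combines the two as P(superclass) × P(predicate | superclass).

```python
import math

# Hypothetical two-level predicate hierarchy (illustrative only, not from the paper).
HIERARCHY = {
    "geometric": ["above", "behind", "under"],
    "possessive": ["has", "wearing"],
    "semantic": ["eating", "riding"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine(coarse_logits, fine_logits):
    """Combine coarse and fine scores into fine-grained probabilities.

    coarse_logits: dict mapping superclass -> score
    fine_logits:   dict mapping superclass -> list of scores, one per child
    Returns a dict mapping predicate -> P(superclass) * P(predicate | superclass).
    """
    names = list(HIERARCHY)
    coarse_probs = dict(zip(names, softmax([coarse_logits[n] for n in names])))
    joint = {}
    for name in names:
        child_probs = softmax(fine_logits[name])
        for child, p in zip(HIERARCHY[name], child_probs):
            joint[child] = coarse_probs[name] * p
    return joint


# Made-up scores for a single subject-object pair.
probs = coarse_to_fine(
    {"geometric": 2.0, "possessive": 0.5, "semantic": 0.1},
    {"geometric": [0.2, 1.5, 0.1], "possessive": [1.0, 1.0], "semantic": [0.3, 0.9]},
)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because each level is normalized separately, the combined scores still sum to one, and a confident coarse estimate narrows the set of plausible fine-grained predicates.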