Recovering the Unbiased Scene Graphs from the Biased Ones (2107.02112v1)

Published 5 Jul 2021 in cs.CV and cs.MM

Abstract: Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects. Recently, more efforts have been paid to the long tail problem in SGG; however, the imbalance in the fraction of missing labels of different classes, or reporting bias, exacerbating the long tail is rarely considered and cannot be solved by the existing debiasing methods. In this paper we show that, due to the missing labels, SGG can be viewed as a "Learning from Positive and Unlabeled data" (PU learning) problem, where the reporting bias can be removed by recovering the unbiased probabilities from the biased ones by utilizing label frequencies, i.e., the per-class fraction of labeled, positive examples in all the positive examples. To obtain accurate label frequency estimates, we propose Dynamic Label Frequency Estimation (DLFE) to take advantage of training-time data augmentation and average over multiple training iterations to introduce more valid examples. Extensive experiments show that DLFE is more effective in estimating label frequencies than a naive variant of the traditional estimate, and DLFE significantly alleviates the long tail and achieves state-of-the-art debiasing performance on the VG dataset. We also show qualitatively that SGG models with DLFE produce prominently more balanced and unbiased scene graphs.

Summary

The paper introduces a PU learning framework to tackle reporting bias in scene graph generation.
It presents Dynamic Label Frequency Estimation (DLFE), which uses training-time data augmentation and averaging over training iterations to estimate per-class label frequencies.
The method integrates with various SGG models and significantly boosts tail-class recall.

Analysis of Unbiased Scene Graphs from Biased Samples

The paper by Chiou et al. addresses reporting bias in scene graph generation (SGG). SGG is a computer vision task that extracts the relationships between objects in an image, typically structured as (subject, predicate, object) triplets. A persistent obstacle is the long tail problem: a few visual relationships are overwhelmingly frequent in datasets, while others are rarely seen or missing altogether. Reporting bias, the imbalance in the fraction of missing labels across classes, makes this worse. The imbalance skews both the datasets and the models trained on them.

Main Contributions

PU Learning Approach: The paper recasts scene graph generation as a Learning from Positive and Unlabeled data (PU learning) problem. Because many true relationships are left unannotated, a model trained on the annotations learns biased probabilities. Existing debiasing methods rarely account for reporting bias, in which the fraction of positive examples that actually receive a label differs from class to class. The paper shows that the unbiased probabilities can be recovered from the biased ones using label frequencies, i.e., the per-class fraction of labeled positive examples among all positive examples. This change of perspective is the core of the work's novelty.
Dynamic Label Frequency Estimation (DLFE): To obtain accurate label frequency estimates, the authors estimate them dynamically during training. DLFE takes advantage of training-time data augmentation, such as random flipping, and averages over multiple training iterations, which introduces more valid examples and improves the reliability of the estimates.
Methodological Flexibility: The proposed solution is designed to be model-agnostic. It integrates into various SGG backbones, such as MOTIFS or VCTree, without architectural changes.
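The two ideas above can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RunningLabelFrequency`, the function `recover_unbiased`, and the default of 1.0 for unseen classes are all hypothetical. The label frequency of a class is approximated by the running mean of the model's biased probability for that class over examples annotated with it, and the biased probabilities are then divided by that estimate.

```python
import numpy as np


class RunningLabelFrequency:
    """Running per-class mean of biased probabilities on labeled positives.

    Hypothetical helper: accumulates over training iterations so that
    rarely annotated classes still collect enough examples.
    """

    def __init__(self, num_classes, init=1.0):
        self.sums = np.zeros(num_classes)
        self.counts = np.zeros(num_classes)
        self.init = init  # assumed fallback for classes never seen

    def update(self, biased_probs, labels):
        # biased_probs: (N, C) model outputs; labels: (N,) annotated class ids
        labels = np.asarray(labels)
        picked = biased_probs[np.arange(len(labels)), labels]
        # np.add.at handles repeated class ids within one batch correctly
        np.add.at(self.sums, labels, picked)
        np.add.at(self.counts, labels, 1.0)

    def estimate(self):
        freq = np.full_like(self.sums, self.init)
        seen = self.counts > 0
        freq[seen] = self.sums[seen] / self.counts[seen]
        return freq


def recover_unbiased(biased_probs, label_freq, eps=1e-12):
    """Divide biased probabilities by per-class label frequencies.

    The result is left unclipped and unnormalized (an assumption of this
    sketch), so it is best read as a ranking score.
    """
    return biased_probs / np.maximum(label_freq, eps)


estimator = RunningLabelFrequency(num_classes=2)
estimator.update(np.array([[0.8, 0.2], [0.6, 0.4]]), [0, 0])  # iteration 1
estimator.update(np.array([[0.9, 0.1]]), [1])                 # iteration 2
label_freq = estimator.estimate()

scores = recover_unbiased(np.array([[0.7, 0.2]]), label_freq)
predicted = int(scores.argmax(axis=-1)[0])
```

In this toy run, class 1 has a low estimated label frequency, so its modest biased probability is scaled up and it outranks class 0 after recovery, which is the intended effect for under-annotated predicates.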

Numerical Results

The authors validate DLFE against existing debiasing methods across the three standard SGG evaluation settings (PredCls, SGCls, and SGDet). The paper reports that models using DLFE consistently outperform those using a naive variant of the traditional label frequency estimate. In the SGDet setting in particular, DLFE shows gains in tail-class recall, which indicates a substantial alleviation of the long tail problem. Benchmarked against competitive prior methods such as TDE, PCPL, and Reweighting, DLFE achieves state-of-the-art mean recall, the metric that weights every predicate class equally and therefore reflects balanced prediction.
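To see why mean recall is the relevant metric here, the following sketch contrasts it with overall recall. It is a simplified illustration: the function name `recall_and_mean_recall` is hypothetical, and the per-image top-K ranking used in actual SGG evaluation is replaced by a precomputed hit flag per ground-truth relationship.

```python
import numpy as np


def recall_and_mean_recall(gt_labels, hit, num_classes):
    """Return (overall recall, mean of per-class recalls).

    gt_labels: predicate class id of each ground-truth relationship.
    hit: 1 if that relationship was recovered by the model, else 0.
    """
    gt_labels = np.asarray(gt_labels)
    hit = np.asarray(hit, dtype=float)
    overall = hit.mean()
    totals = np.bincount(gt_labels, minlength=num_classes)
    hits = np.bincount(gt_labels, weights=hit, minlength=num_classes)
    present = totals > 0  # skip classes with no ground-truth examples
    per_class = hits[present] / totals[present]
    return overall, per_class.mean()


# A head class (id 0) with 8 examples, all recovered,
# and a tail class (id 1) with 2 examples, none recovered.
gt = [0] * 8 + [1] * 2
hits = [1] * 8 + [0] * 2
overall, mean_recall = recall_and_mean_recall(gt, hits, num_classes=2)
```

Here overall recall is 0.8 while mean recall is 0.5, so a model that ignores the tail class looks strong on the first metric and weak on the second.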

Implications and Future Directions

From a theoretical viewpoint, the work underscores the importance of accounting for per-class label frequencies, suggesting that correcting for uneven labeling, and not only for uneven class frequency, can mitigate the long tail issue in SGG. Practically, less biased scene graphs could benefit downstream tasks such as image captioning and visual question answering. Connections to other fields, such as bias reduction in natural language processing or the handling of rare classes in any domain that relies on heavily biased datasets, are promising areas for future exploration.

Future work might adapt other PU learning strategies or study how different augmentation strategies affect the precision of label frequency estimation. Combining DLFE with synthetic datasets that generate underrepresented relationships could also broaden the applicability of the approach.

In conclusion, Chiou et al. address an underexplored yet foundational issue in scene graph generation, and their method offers a useful reference point for future SGG research aimed at less biased visual understanding.

GitHub

GitHub - coldmanck/recovering-unbiased-scene-graphs: Official implementation of "Recovering the Unbiased Scene Graphs from the Biased Ones" (ACMMM 2021) (79 stars)