- The paper introduces NICE, a model-agnostic approach that refines noisy dataset annotations to boost scene graph generation accuracy.
- It utilizes negative and positive sample detection modules to identify and correct missing or inconsistent labels via confidence scoring and clustering.
- Experimental results on the Visual Genome dataset demonstrate substantial improvements across SGG tasks, mitigating long-tailed biases in models like Motifs and VCTree.
Overview of "The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation"
The paper "The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation" presents an approach to improving Scene Graph Generation (SGG) by tackling noisy annotations in existing datasets. The authors challenge two prevailing assumptions in SGG training: that all annotated (positive) labels are correct, and that all un-annotated samples are true background. Both assumptions fail on real datasets, introducing biases that hurt the robustness and accuracy of SGG models.
Scene Graph Generation, a pivotal part of understanding visual scenes, involves identifying object instances and their pairwise visual relationships. However, it faces challenges due to the imbalance in dataset annotations, termed the "long-tailed" problem, where certain predicate categories are underrepresented. Traditional techniques to mitigate these biases rely on re-balancing strategies or manipulating pre-trained models, but they often overlook the noise inherent in the dataset labels themselves.
To rectify these issues, the paper proposes NICE (NoIsy label CorrEction strategy), a model-agnostic technique that focuses on refining dataset annotations to enhance SGG. NICE consists of the following components:
- Negative Noisy Sample Detection (Neg-NSD): This module finds missing annotations by framing the problem as out-of-distribution (OOD) detection. Using a confidence-based model, it assigns pseudo labels to un-annotated samples that are likely missed foreground relationships, effectively enlarging the training set with valid but previously unlabeled (often tail-category) samples.
- Positive Noisy Sample Detection (Pos-NSD): Using a clustering method based on visual similarity, Pos-NSD identifies inconsistencies among positive samples. It segregates samples into subsets based on their local densities and identifies noisy samples that do not align with the general feature distribution.
- Noisy Sample Correction (NSC): Implementing a weighted K-nearest neighbor (wKNN) algorithm, NSC reassigns more consistent labels to identified noisy samples, ensuring that visual patterns align better with their semantic labels.
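As a rough, hypothetical sketch of the Neg-NSD idea (not the paper's implementation — the function name and threshold value are illustrative), confidence-based pseudo-labeling of "background" samples might look like:

```python
def pseudo_label_negatives(bg_scores, threshold=0.9):
    """Promote 'background' samples to pseudo-labeled positives when a
    model is highly confident they show a real (missed) predicate.

    bg_scores: list of (sample_id, {predicate: confidence}) pairs for
    samples currently annotated as background.
    Returns: {sample_id: pseudo_predicate} for confident samples only.
    """
    pseudo = {}
    for sample_id, scores in bg_scores:
        # Take the most confident foreground predicate for this sample.
        predicate, confidence = max(scores.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            # Confident enough to treat as a missed annotation.
            pseudo[sample_id] = predicate
    return pseudo
```

A thresholded top-1 confidence is only one possible OOD criterion; the paper's module is a learned detector, but the effect is the same — recovering usable positives from the un-annotated pool.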
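The Pos-NSD step can be illustrated with a toy local-density computation (a simplified stand-in for the paper's density-based clustering; the neighbor count `k` and the "flag the lowest-density samples" rule are assumptions for illustration):

```python
import math

def local_density(feats, k=2):
    """Local density of each feature vector: inverse of the mean
    Euclidean distance to its k nearest neighbors (higher = denser)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    densities = []
    for i, f in enumerate(feats):
        nearest = sorted(dist(f, g) for j, g in enumerate(feats) if j != i)[:k]
        densities.append(1.0 / (sum(nearest) / len(nearest) + 1e-8))
    return densities

def flag_noisy(feats, k=2, n_flag=1):
    """Flag the n_flag lowest-density samples as noisy candidates:
    points far from the bulk of their class's feature distribution."""
    densities = local_density(feats, k)
    order = sorted(range(len(densities)), key=lambda i: densities[i])
    return order[:n_flag]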
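The NSC step's weighted K-nearest-neighbor relabeling could be sketched as follows (illustrative only; the actual method operates on learned visual features of the SGG model):

```python
import math

def wknn_relabel(noisy_feat, clean_feats, clean_labels, k=3):
    """Reassign a label to a noisy sample by voting among its k nearest
    'clean' samples, weighting each vote by inverse distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Find the k clean samples closest to the noisy sample's features.
    nearest = sorted(zip(clean_feats, clean_labels),
                     key=lambda fl: dist(noisy_feat, fl[0]))[:k]
    # Accumulate inverse-distance-weighted votes per label.
    votes = {}
    for feat, label in nearest:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist(noisy_feat, feat) + 1e-8)
    return max(votes, key=votes.get)
```

Inverse-distance weighting makes closer neighbors count more, so the reassigned label tracks the dominant label in the sample's immediate visual neighborhood.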
The authors validate NICE with extensive experiments on the Visual Genome (VG) dataset. When integrated into state-of-the-art SGG models such as Motifs and VCTree, NICE yields substantial improvements, particularly on mean Recall@K (mR@K), the standard metric for unbiased scene graph generation, across the Predicate Classification, Scene Graph Classification, and Scene Graph Detection tasks.
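For reference, mR@K averages Recall@K per predicate class so rare predicates weigh as much as frequent ones. A simplified sketch (real SGG evaluation also matches predicted boxes to ground truth by IoU, which is omitted here):

```python
from collections import defaultdict

def mean_recall_at_k(predictions, ground_truth, k=50):
    """mR@K: compute Recall@K separately for each predicate class,
    then average across classes.

    predictions: per-image ranked lists of (subject, predicate, object) triplets
    ground_truth: per-image sets of gold triplets
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for preds, gt in zip(predictions, ground_truth):
        topk = set(preds[:k])  # top-K ranked predictions for this image
        for triplet in gt:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in topk:
                hits[predicate] += 1
    # Per-predicate recall, averaged uniformly over predicate classes.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls)
```

Because the average is over classes rather than instances, a model that only predicts head predicates (e.g. "on") scores poorly on mR@K even if its plain R@K is high — which is why the metric is used to measure debiasing.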
The implications of this approach are significant. By improving label accuracy, NICE enhances the training dataset, offering more balanced exposure to varied predicate categories. This, in turn, helps models perform better on underrepresented categories, addressing biases inherent in the dataset. Nevertheless, while NICE makes substantial strides in improving dataset quality and model robustness, certain limitations remain, such as the potential inclusion of incorrectly labeled samples and the varying impacts of hyperparameters across different predicate categories.
Looking ahead, the development and refinement of methods like NICE could improve training workflows not only in SGG but in broader AI applications. Strengthening datasets through label correction is likely to become a more prominent part of model training, providing cleaner supervision and enabling more generalized learning across domains.