
Selective-Supervised Contrastive Learning with Noisy Labels (2203.04181v1)

Published 8 Mar 2022 in cs.CV, cs.AI, and cs.LG

Abstract: Deep networks have strong capacities for embedding data into latent representations and performing downstream tasks. However, these capacities largely depend on high-quality annotated labels, which are expensive to collect. Noisy labels are more affordable, but they result in corrupted representations and poor generalization performance. To learn robust representations and handle noisy labels, we propose selective-supervised contrastive learning (Sel-CL) in this paper. Specifically, Sel-CL extends supervised contrastive learning (Sup-CL), which is powerful for representation learning but degrades when labels are noisy. Sel-CL tackles the direct cause of the problem in Sup-CL: because Sup-CL works in a *pair-wise* manner, noisy pairs built from noisy labels mislead representation learning. To alleviate the issue, we select confident pairs out of noisy ones for Sup-CL without knowing the noise rates. In the selection process, we first identify confident examples by measuring the agreement between learned representations and given labels; these examples are used to build confident pairs. Then, the representation similarity distribution within the built confident pairs is exploited to identify more confident pairs out of the noisy pairs. All obtained confident pairs are finally used for Sup-CL to enhance representations. Experiments on multiple noisy datasets demonstrate the robustness of the representations learned by our method, which achieves state-of-the-art performance. Source code is available at https://github.com/ShikunLi/Sel-CL

Citations (144)

Summary

  • The paper introduces Sel-CL, a method that selectively uses confident examples to build reliable pairs, mitigating the adverse effects of noisy labels.
  • The approach employs dynamic thresholding based on learned representation similarities to iteratively refine pair selection without preset noise rates.
  • Empirical results on datasets like CIFAR-10, CIFAR-100, and WebVision-50 demonstrate enhanced generalization and robust performance under noisy conditions.

Selective-Supervised Contrastive Learning with Noisy Labels

The paper "Selective-Supervised Contrastive Learning with Noisy Labels" presents a novel methodology to improve representation learning in the presence of noisy labels, a common challenge in deep learning applications. This approach, termed Selective-Supervised Contrastive Learning (Sel-CL), extends the principles of Supervised Contrastive Learning (Sup-CL) to address the detrimental effects of label noise without necessitating knowledge of the noise rates.

Sel-CL targets the core issue of noisy labels that degrade the effectiveness of existing Sup-CL methodologies. Sup-CL typically involves pair-wise processing, where incorrect labels lead to erroneous pairings, thus corrupting latent representations and impairing the generalization of the underlying deep network. Sel-CL introduces a mechanism to selectively filter confident pairs for representation learning, circumventing the drawbacks of noisy supervision.
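To make the pair-wise mechanism concrete, the following minimal PyTorch-style sketch shows a standard supervised contrastive loss in the spirit of Sup-CL (not the paper's exact implementation). The positive-pair mask is built directly from the given labels, so any mislabeled example silently produces erroneous positive pairs; function and parameter names here are illustrative.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(features, labels, temperature=0.1):
    """Minimal supervised contrastive loss over a single view.

    features: (N, D) embeddings; labels: (N,) given class labels (possibly noisy).
    """
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature               # pairwise similarities
    # Positive-pair mask built directly from the (possibly noisy) labels:
    # a wrong label here creates wrong positive pairs.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    self_mask = torch.eye(len(labels), device=labels.device)
    pos_mask = pos_mask * (1.0 - self_mask)                  # exclude self-pairs
    logits = sim - 1e9 * self_mask                           # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return (-(pos_mask * log_prob).sum(1) / pos_count).mean()
```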

The approach is primarily driven by two key operations: the selection of confident examples and the construction of trustworthy pairs. Initially, Sel-CL identifies examples that exhibit a high degree of alignment between learned representations and their given labels, termed confident examples. These confident examples are then utilized to form reliable pairs. As Sel-CL operates without the necessity of pre-estimated noise rates, it employs a dynamic thresholding strategy based on learned representation similarities to iteratively refine pair selection. This mechanism ensures that both genuinely correct pairs and those containing misclassified yet similarly labeled examples are exploited to improve representation learning.
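This summary does not spell out the exact selection rules, but a hypothetical sketch of the two stages might look as follows, assuming k-nearest-neighbour label agreement for identifying confident examples and a similarity threshold derived from the confident pairs themselves. The function names, the neighbourhood size `k`, and the agreement ratio are illustrative assumptions, not the paper's precise procedure.

```python
import torch
import torch.nn.functional as F

def select_confident(features, labels, k=10, agreement=0.8):
    """Hypothetical confident-example selection: keep an example when most of
    its k nearest neighbours in representation space carry the same given label."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T
    sim.fill_diagonal_(-1.0)                        # exclude self from neighbours
    knn = sim.topk(k, dim=1).indices                # (N, k) neighbour indices
    same = (labels[knn] == labels.unsqueeze(1)).float().mean(1)
    return same >= agreement                        # boolean mask of confident examples

def select_confident_pairs(features, labels, confident_mask):
    """Hypothetical confident-pair selection: pairs of confident, same-labelled
    examples define a similarity distribution whose mean becomes a dynamic
    threshold that admits further high-similarity pairs, with no preset noise rate."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T
    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)
    conf_pair = confident_mask.unsqueeze(0) & confident_mask.unsqueeze(1) & same_label
    threshold = sim[conf_pair].mean()               # data-driven, iteration-dependent threshold
    extra = same_label & (sim >= threshold)         # additional confident pairs
    return conf_pair | extra
```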

Empirical validation on several benchmark datasets, including CIFAR-10, CIFAR-100, and WebVision-50, demonstrates the robustness of Sel-CL in learning effective representations from noisy datasets. In synthetic noise scenarios, Sel-CL exhibits competitive performance, particularly under asymmetric noise conditions. Furthermore, the paper outlines the integration of contrastive learning techniques with conventional classification objectives to stabilize the training process, thereby enhancing the network’s ability to generalize from noisy data.
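As a rough illustration of coupling the contrastive term with a conventional classification objective, the sketch below combines a contrastive loss restricted to the selected confident pairs with a standard cross-entropy term. The weighting factor `lambda_ce` and all names are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def joint_loss(features, logits, labels, pair_mask, lambda_ce=1.0, temperature=0.1):
    """Illustrative joint objective: contrastive term over confident pairs
    (pair_mask: (N, N) boolean) plus conventional cross-entropy."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    self_mask = torch.eye(len(labels), device=labels.device, dtype=torch.bool)
    pos_mask = (pair_mask & ~self_mask).float()      # only the selected confident pairs
    sim = sim.masked_fill(self_mask, -1e9)           # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    con = -(pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    cls = F.cross_entropy(logits, labels)            # conventional classification term
    return con.mean() + lambda_ce * cls              # lambda_ce is an assumed weight
```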

Sel-CL’s ability to selectively utilize only reliable data pairs holds significant implications for practical applications in domains where label noise is prevalent, such as in web-scraped data or crowd-sourced annotations. Moreover, the absence of a need for precise noise rate estimation presents a substantial advantage in scalability and applicability to diverse datasets. As deep learning continues to expand into areas with less controlled data collection environments, methods like Sel-CL that leverage intrinsic data characteristics to improve learning outcomes are likely to gain prominence.

The paper sets the stage for future research directions, anticipating advancements in noise-robust learning and fine-tuning methodologies that further minimize the supervised learning dependency on noise-free data. Additionally, it invites exploration into the integration of Sel-CL with more sophisticated data augmentation and model ensemble strategies to potentially expand its capabilities across a wider array of tasks.
