
Debiased Contrastive Learning (2007.00224v3)

Published 1 Jul 2020 in cs.LG and stat.ML

Abstract: A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled datapoints, implicitly accepting that these points may, in reality, actually have the same label. Perhaps unsurprisingly, we observe that sampling negative examples from truly different labels improves performance, in a synthetic setting where labels are available. Motivated by this observation, we develop a debiased contrastive objective that corrects for the sampling of same-label datapoints, even without knowledge of the true labels. Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks. Theoretically, we establish generalization bounds for the downstream classification task.

Authors (5)
  1. Ching-Yao Chuang (16 papers)
  2. Joshua Robinson (35 papers)
  3. Lin Yen-Chen (12 papers)
  4. Antonio Torralba (178 papers)
  5. Stefanie Jegelka (122 papers)
Citations (502)

Summary

Debiased Contrastive Learning

The paper presents a method for self-supervised representation learning called Debiased Contrastive Learning. Traditional contrastive learning contrasts semantically similar (positive) and dissimilar (negative) pairs of samples. Without labels, however, negative samples are simply drawn at random from the data, so some of them inadvertently share the anchor's label. This sampling bias degrades the quality of the learned representations.
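
To make the sampling-bias issue concrete, here is a minimal PyTorch-style sketch of a standard (biased) contrastive objective of the kind the paper starts from; the function name, tensor shapes, and temperature value are illustrative assumptions rather than the authors' code. Because `negatives` is just a random batch of datapoints, nothing prevents some of them from sharing the anchor's label.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, t=0.5):
    """Standard (biased) contrastive loss.

    anchor, positive: (B, D) embeddings; negatives: (B, N, D) embeddings
    drawn at random, so some "negatives" may share the anchor's label.
    """
    pos = torch.exp(F.cosine_similarity(anchor, positive, dim=-1) / t)                # (B,)
    neg = torch.exp(F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / t)  # (B, N)
    return (-torch.log(pos / (pos + neg.sum(dim=-1)))).mean()
```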

Key Contributions

  1. Debiased Contrastive Objective: The authors introduce a debiased contrastive objective that corrects for false negatives among the sampled negative examples. Without requiring true labels, it approximates the distribution of true negatives by adjusting the contrastive loss, thereby mitigating the effect of sampling bias (a sketch of this correction follows the list below).
  2. Empirical Evaluation: The proposed debiased contrastive loss is evaluated across several domains, including vision, language, and reinforcement learning, demonstrating consistent improvements over existing state-of-the-art methods. The experiments cover datasets such as CIFAR10, STL10, and ImageNet-100, as well as the BookCorpus dataset for sentence embeddings.
  3. Theoretical Insights: The paper provides a theoretical analysis showing that the debiased contrastive loss can approximate a supervised learning loss. It establishes generalization bounds for downstream classification tasks by relating this unsupervised loss to supervised objectives.
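
The following sketch illustrates one way such a correction can be implemented on top of the biased loss shown earlier: the expected contribution of false negatives (which occur with class-prior probability `tau_plus`) is subtracted from the negative term, and the estimate is clamped at its theoretical minimum. The function name, default values, and tensor shapes are assumptions for illustration, not the authors' released implementation.

```python
import math
import torch
import torch.nn.functional as F

def debiased_contrastive_loss(anchor, positive, negatives, tau_plus=0.1, t=0.5):
    """Debiased contrastive loss sketch.

    anchor, positive: (B, D) embeddings; negatives: (B, N, D) embeddings.
    tau_plus is the (assumed) prior probability that a randomly drawn
    "negative" actually shares the anchor's label.
    """
    pos = torch.exp(F.cosine_similarity(anchor, positive, dim=-1) / t)                # (B,)
    neg = torch.exp(F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / t)  # (B, N)
    N = negatives.shape[1]
    # Remove the expected false-negative contribution from the negative term,
    # then clamp the estimator at its theoretical minimum N * e^{-1/t}.
    Ng = (neg.sum(dim=-1) - N * tau_plus * pos) / (1.0 - tau_plus)
    Ng = torch.clamp(Ng, min=N * math.e ** (-1.0 / t))
    return (-torch.log(pos / (pos + Ng))).mean()
```

With `tau_plus = 0`, the correction vanishes and the sketch reduces to the standard biased loss above, which matches the intuition that the debiased objective only departs from the usual one to the extent that false negatives are expected.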

Numerical Results

The results are compelling, with the debiased objective demonstrating superior performance. For instance, on STL10, the approach improved the accuracy of SimCLR by 4.26%. The t-SNE visualization of embeddings learned on CIFAR10 shows better class separation, closely resembling results from an unbiased supervised loss.

Implications and Future Directions

The introduction of a debiased contrastive approach paves the way for enhanced representation learning in scenarios where labeled data is scarce or unavailable. This is particularly valuable in fields like medical data analysis or drug discovery, where obtaining labels is challenging. Future research may explore extending this framework to semi-supervised settings and further analyzing different strategies for obtaining positive pairs. Additionally, understanding the intersection of sampling bias with domain-specific applications could yield further improvements.

Overall, this methodological advance in debiased contrastive learning holds promise for more accurate and robust self-supervised learning models, extending the applicability and effectiveness of AI systems across diverse scientific and practical domains.
