Sample-Specific Debiasing for Better Image-Text Models (2304.13181v2)

Published 25 Apr 2023 in cs.LG and cs.CV

Abstract: Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval. One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points. Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class. In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate. To improve the quality of learned representations, we develop a novel approach that corrects for false negatives. Our method can be viewed as a variant of debiased contrastive learning that uses estimated sample-specific class probabilities. We provide theoretical analysis of the objective function and demonstrate the proposed approach on both image and paired image-text data sets. Our experiments illustrate empirical advantages of sample-specific debiasing.
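As a rough illustration of what "debiased contrastive learning with sample-specific class probabilities" could look like, the sketch below implements the standard debiased contrastive loss and lets the false-negative probability vary per anchor. It is not taken from the paper: the function name `debiased_contrastive_loss`, the argument `tau_plus` (the assumed probability that a drawn negative shares the anchor's class), the clamping constant, and the temperature default are all illustrative assumptions, and the paper's actual objective and probability estimator may differ.

```python
import numpy as np

def debiased_contrastive_loss(pos_sim, neg_sim, tau_plus, temperature=0.5):
    """Per-anchor debiased contrastive loss (illustrative sketch only).

    pos_sim:  shape (B,), cosine similarity of each anchor to its positive.
    neg_sim:  shape (B, N), cosine similarity of each anchor to N negatives.
    tau_plus: scalar or shape (B,), assumed probability in [0, 1) that a
              sampled negative belongs to the anchor's class. Passing a
              vector makes the correction sample-specific.
    """
    pos_sim = np.asarray(pos_sim, dtype=float)
    neg_sim = np.asarray(neg_sim, dtype=float)
    tau_plus = np.broadcast_to(np.asarray(tau_plus, dtype=float), pos_sim.shape)

    n_neg = neg_sim.shape[1]
    pos = np.exp(pos_sim / temperature)
    neg_mean = np.exp(neg_sim / temperature).mean(axis=1)

    # Remove the expected contribution of false negatives from the
    # negative term, then rescale by the true-negative probability.
    g = (neg_mean - tau_plus * pos) / (1.0 - tau_plus)
    # Clamp at the smallest value a true-negative term can take
    # (cosine similarity of -1), so the log argument stays valid.
    g = np.maximum(g, np.exp(-1.0 / temperature))

    return -np.log(pos / (pos + n_neg * g))
```

With `tau_plus = 0` the expression reduces to the usual uncorrected contrastive loss. Supplying a larger `tau_plus` for anchors from frequent classes shrinks the negative term for those anchors only, which is the intuition behind a per-sample correction when class frequencies are nonuniform.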

Authors (7)
  1. Peiqi Wang
  2. Yingcheng Liu
  3. Ching-Yun Ko
  4. William M. Wells
  5. Seth Berkowitz
  6. Steven Horng
  7. Polina Golland