CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? (2403.04547v1)
Abstract: We study the effectiveness of data-balancing for mitigating biases in contrastive language-image pretraining (CLIP), identifying areas of strength and limitation. First, we reaffirm prior conclusions that CLIP models can inadvertently absorb societal stereotypes. To counter this, we present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases (i.e. in first- and second-order statistics) in multimodal data. We use M4 to conduct an in-depth analysis taking into account various factors, such as the model, representation, and data size. Our study also explores the dynamic nature of how CLIP learns and unlearns biases. In particular, we find that fine-tuning is effective in countering representation biases, though its impact diminishes for association biases. Also, data balancing has a mixed impact on quality: it tends to improve classification but can hurt retrieval. Interestingly, data and architectural improvements seem to mitigate the negative impact of data balancing on performance; e.g. applying M4 to SigLIP-B/16 with data quality filters improves COCO image-to-text retrieval @5 from 86% (without data balancing) to 87% and ImageNet 0-shot classification from 77% to 77.5%! Finally, we conclude with recommendations for improving the efficacy of data balancing in multimodal systems.
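To make the abstract's distinction between first- and second-order statistics concrete, here is a minimal sketch of how the two kinds of bias could be measured on a labeled dataset. It is not the M4 algorithm. It assumes a binary sensitive attribute `s`, binary concept labels `y`, optional per-example weights, and a target proportion of 0.5. The function name `bias_statistics` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def bias_statistics(s, y, weights=None, target=0.5):
    """Illustrative bias measurements (assumed definitions, not the paper's code).

    s:       (n,) binary sensitive attribute, e.g. perceived gender in a caption.
    y:       (n,) or (n, k) binary concept labels, e.g. occupations.
    weights: optional (n,) non-negative example weights (uniform if None).
    target:  desired proportion of s == 1 in the (weighted) data.

    Returns (representation_bias, association_bias) where
      representation_bias = |E_w[s] - target|            (first-order)
      association_bias[j] = |Cov_w(s, y[:, j])|          (second-order)
    """
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat a single label vector as one column

    w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so weighted sums are expectations

    mean_s = float(np.sum(w * s))
    mean_y = (w[:, None] * y).sum(axis=0)

    # Weighted covariance between the attribute and each label column.
    cov = (w[:, None] * (s[:, None] - mean_s) * (y - mean_y)).sum(axis=0)

    return abs(mean_s - target), np.abs(cov)


if __name__ == "__main__":
    # Toy data: the attribute is over-represented and correlated with the label.
    s = [1, 1, 1, 0]
    y = [1, 1, 0, 0]
    rep, assoc = bias_statistics(s, y)
    print("representation bias:", rep)
    print("association bias:", assoc)

    # Up-weighting the under-represented example removes the first-order gap,
    # but the attribute and the label can still be correlated afterwards.
    rep_w, assoc_w = bias_statistics(s, y, weights=[1 / 3, 1 / 3, 1 / 3, 1.0])
    print("reweighted representation bias:", rep_w)
    print("reweighted association bias:", assoc_w)
```

In this toy example, reweighting closes the representation gap while leaving a nonzero association, which illustrates why the two statistics are tracked separately.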
Authors:
- Ibrahim Alabdulmohsin
- Xiao Wang
- Andreas Steiner
- Priya Goyal
- Alexander D'Amour
- Xiaohua Zhai