BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM (2407.15240v3)
Abstract: Text-to-Image (T2I) generative models are becoming increasingly important due to their ability to generate high-quality images, which also raises concerns about social biases in their outputs, especially in human generation. Sociological research has established systematic classifications of bias; however, existing research on bias in T2I models conflates different types of bias, impeding methodological progress. In this paper, we introduce BIGbench, a unified benchmark for Biases of Image Generation, featuring a meticulously designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes, enabling more precise analysis. Furthermore, BIGbench applies advanced multi-modal LLMs to achieve fully automated and highly accurate evaluation. We apply BIGbench to evaluate eight representative general T2I models and three debiasing methods. Our human evaluation results underscore BIGbench's effectiveness in aligning images and identifying various biases. Moreover, our study reveals new research directions on bias, such as the effects of distillation and irrelevant protected attributes. Our benchmark is openly available at https://github.com/BIGbench2024/BIGbench2024/ to ensure reproducibility.
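To make the automated-evaluation idea concrete, here is a minimal, generic sketch, not the authors' implementation. It assumes a hypothetical `query_mllm` helper that asks a multi-modal LLM for a perceived protected-attribute label (e.g., gender) of a generated image, and it scores bias as the deviation of the observed label distribution from parity:

```python
from collections import Counter

def query_mllm(image_path: str) -> str:
    """Placeholder for a multi-modal LLM call that returns a perceived
    protected-attribute label (e.g., 'male' or 'female') for one image.
    A real pipeline would prompt an actual MLLM client here."""
    raise NotImplementedError("plug in an MLLM client of your choice")

def bias_score(labels: list[str]) -> float:
    """Total-variation distance between the observed label distribution
    and a uniform (parity) distribution over the observed classes.
    Returns 0.0 for a perfectly balanced output set; approaches 1.0
    as a single class dominates."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return 0.5 * sum(abs(c / n - 1.0 / k) for c in counts.values())

if __name__ == "__main__":
    # Example with pre-collected labels instead of live MLLM calls:
    # 100 images generated from one gender-neutral prompt.
    labels = ["male"] * 72 + ["female"] * 28
    print(f"bias score: {bias_score(labels):.3f}")  # -> 0.220
```

A benchmark like the one described would run such a loop over many prompts and attribute dimensions; the single-attribute parity score above is only meant to illustrate the shape of a fully automated metric.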
- Hanjun Luo
- Haoyu Huang
- Ziye Deng
- Xuecheng Liu
- Ruizhe Chen
- Zuozhu Liu