Social Perception of Faces in a Vision-LLM
The paper "Social perception of faces in a vision-LLM" by Hausladen, Knott, Camerer, and Perona explores the capability of the CLIP (Contrastive Language-Image Pretraining) model to make social judgments of human faces. Their approach involves comparing the similarity in CLIP embeddings between different textual prompts and synthetic face images that are systematically varied along specific dimensions. This design mitigates confounding variables often found in real-world data, offering a clearer examination of biases related to protected attributes such as age, gender, and race.
Key Findings
- Human-Like Social Judgments by CLIP: Despite being trained on a broad, heterogeneous collection of images and texts, CLIP makes nuanced, human-like social judgments of face images. This finding is significant because it extends CLIP's demonstrated capabilities from coarse recognition to fine-grained social perception.
- Impact of Protected Attributes: Age, gender, and race systematically affect CLIP's social perception of faces, pointing to inherent biases. In particular, the authors find pronounced disparities for Black women's faces across ages and facial expressions, with this group receiving the most extreme social-perception values.
- Role of Non-Protected Attributes: Non-protected attributes such as facial expression and lighting also significantly influence social judgments. For instance, changes in facial expression, such as smiling, shift social perception more than age does, while lighting has an effect nearly as large as age. This underscores the need to control non-protected visual attributes to avoid confounded results in bias studies; one simple way to compare such effect sizes is sketched below.
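Given per-image judgment scores, the magnitudes of these effects can be compared by looking at the spread of group means along each attribute axis. The sketch below assumes a hypothetical CSV of scores with one row per synthetic image; the file and column names are illustrative, not the paper's.

```python
# Crude effect-size comparison across attributes: the range of group-mean
# scores along each manipulated axis. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("clip_warmth_scores.csv")  # one row per synthetic image

for attr in ["age", "expression", "lighting"]:
    group_means = df.groupby(attr)["warmth_score"].mean()
    spread = group_means.max() - group_means.min()
    print(f"{attr:>10}: spread of group means = {spread:.4f}")
```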
Methodology
The authors employ a novel method grounded in social psychology, leveraging synthetic face images whose attributes are independently and systematically manipulated. This experimental paradigm not only controls for confounding factors present in data collected in the wild but also enables causal inference about the effect of specific attributes on social perception.
Experimental Setup
Textual prompts are constructed from validated terms in social psychology, representing dimensions such as Warmth and Competence (the Stereotype Content Model) and Agency, Beliefs, and Communion (the ABC model).
The synthetic dataset, dubbed CausalFace, systematically varies faces across six dimensions: race, gender, age, facial expression, lighting, and pose. These variations allow for precise control and clear isolation of the effects of each attribute.
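The factorial structure of such a dataset can be sketched as a full crossing of attribute levels. The specific levels below are placeholders for illustration, not CausalFace's actual values.

```python
# Sketch of a fully crossed attribute grid in the spirit of CausalFace.
# The specific levels are illustrative placeholders, not the dataset's
# actual values.
from itertools import product

grid = {
    "race": ["Asian", "Black", "Latino", "White"],
    "gender": ["female", "male"],
    "age": ["young", "middle-aged", "old"],
    "expression": ["neutral", "slight smile", "full smile"],
    "lighting": ["dim", "neutral", "bright"],
    "pose": ["left", "frontal", "right"],
}

conditions = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(conditions), "image conditions")  # full crossing of all six axes
```

Because every level of every attribute appears with every level of the others, differences in scores along one axis cannot be explained by the remaining axes, which is the basis for the paper's causal reading of its results.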
Results and Implications
Statistical Similarities and Variations
Bias metrics computed on CausalFace closely mirror those computed on real-world datasets (FairFace and UTKFace), validating the synthetic dataset as a stand-in for in-the-wild data in bias measurement.
Comparing the variation induced by protected versus non-protected attributes, the paper finds that non-protected attributes such as facial expression and lighting can influence social perception as much as, or more than, protected attributes. This finding is crucial: understanding and mitigating bias requires accounting for the full set of visual attributes, not just the protected ones.
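A minimal example of one such comparison is a group-gap metric over judgment scores, using the same hypothetical scores table as above. This is one simple metric for illustration, not necessarily one of the metrics used in the paper.

```python
# One simple bias metric: the largest gap in mean judgment score between
# groups along an attribute. Column names follow the hypothetical table above.
import pandas as pd

def group_gap(df: pd.DataFrame, group_col: str, score_col: str) -> float:
    """Max minus min of group-mean scores; 0 means no gap."""
    means = df.groupby(group_col)[score_col].mean()
    return float(means.max() - means.min())

df = pd.read_csv("clip_warmth_scores.csv")
print("race gap:      ", group_gap(df, "race", "warmth_score"))
print("expression gap:", group_gap(df, "expression", "warmth_score"))
```

Comparable gaps along protected and non-protected columns would reproduce the paper's central observation in miniature.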
Detailed Observations
- Intersectional Analysis: The paper's intersectional analysis reveals that CLIP's perception differs markedly across demographic groups, with the largest shifts observed for Black women's faces, where age and smiling produce significant changes in social perception; a sketch of such a breakdown follows this list.
- Bias Patterns: CLIP demonstrates biased tendencies across various demographic groups, with nuanced responses to age and facial expressions. This insight suggests that biases in vision-LLMs are multifaceted and not merely limited to a few observable socio-demographic categories.
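An intersectional breakdown follows the same pattern as the earlier sketches, grouping by combinations of attributes; the table and column names remain hypothetical.

```python
# Intersectional breakdown: mean judgment score per race x gender x
# expression cell, using the same hypothetical scores table.
import pandas as pd

df = pd.read_csv("clip_warmth_scores.csv")
table = (
    df.groupby(["race", "gender", "expression"])["warmth_score"]
    .mean()
    .unstack("expression")
)
print(table)  # rows: race x gender; columns: expression levels
```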
Conclusions
The paper's findings have profound implications for both the practical deployment and theoretical understanding of vision-LLMs. Practically, the research underscores the importance of controlling for both protected and non-protected attributes to accurately assess and mitigate biases in AI systems. Theoretically, it offers a robust experimental framework for studying social biases in any vision-LLM, extending beyond observational methods that are susceptible to confounding variables.
Future Directions
Future research can build on this work by generating even richer synthetic datasets that explore finer intersections of attributes and by controlling residual factors such as brightness for a more comprehensive analysis. Additionally, comparing different vision-LLMs could illuminate how training data and architectures shape social judgments, driving forward the responsible use of AI in socially sensitive applications.