Leveraging vision-language models for fair facial attribute classification

Published 15 Mar 2024 in cs.CV | (2403.10624v2)

Abstract: Performance disparities of image recognition across different demographic populations are known to exist in deep learning-based models, but previous work has largely addressed such fairness problems assuming knowledge of sensitive attribute labels. To overcome this reliance, previous strategies have involved separate learning structures to expose and adjust for disparities. In this work, we explore a new paradigm that does not require sensitive attribute labels, and evades the need for extra training by leveraging general-purpose vision-LLM (VLM), as a rich knowledge source for common sensitive attributes. We analyze the correspondence between VLM predicted and human defined sensitive attribute distribution. We find that VLMs can recognize samples with clear attribute information encoded in image representations, thus capture under-performed samples conflicting with attribute-related bias. We train downstream target classifiers by re-sampling and augmenting under-performed attribute groups. Extensive experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines that tackle with arbitrary bias. The work indicates that vision-LLMs can extract discriminative sensitive information prompted by language, and be used to promote model fairness.