Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach (1706.00906v3)

Published 3 Jun 2017 in cs.CV

Abstract: Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes, and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach has superior performance compared to the state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability.

Citations (274)

Summary

The paper introduces a deep multi-task CNN that jointly estimates heterogeneous face attributes by combining shared and category-specific feature learning, and it outperforms methods that estimate each attribute independently.
The paper reports a 3.0-year MAE for age and over 90% accuracy for gender and race on MORPH II, and about 93% average accuracy over the 40 binary attributes of CelebA.
The paper highlights robust cross-database generalization and potential applications in surveillance, biometric identification, and human-computer interaction.

This paper presents a deep multi-task learning (DMTL) approach for estimating multiple face attributes from a single image with convolutional neural networks (CNNs). The authors focus on two challenges: face attributes are correlated, and they are heterogeneous (ordinal vs. nominal, holistic vs. local). Traditional methods often estimate each attribute independently, so they cannot exploit these correlations or account for the heterogeneity during feature learning. DMTL addresses both by learning features shared by all attributes, followed by category-specific features for each heterogeneous attribute category. The following summary covers the key components, results, and implications of the approach.
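The joint objective behind this design can be written, in our own notation rather than the paper's, as a weighted sum of per-category losses. Here x_i is a face image, F_s(·; W_s) is the shared network, F_c(·; W_c) is the subnetwork for attribute category c, y_i^c are the labels of that category, L_c is its loss (for example, cross-entropy for nominal attributes and squared error for ordinal ones), and λ_c is a hypothetical per-category weight. The paper's exact losses and weighting may differ.

```latex
\min_{W_s,\,\{W_c\}_{c=1}^{C}} \; \sum_{c=1}^{C} \lambda_c \sum_{i=1}^{N} L_c\!\left( y_i^{c},\; F_c\big(F_s(x_i; W_s); W_c\big) \right)
```

Because W_s appears in every term, gradients from all categories update the shared layers, while each W_c receives gradients only from its own category.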

Key Components and Methodology

At the core of the proposed framework is the DMTL network, designed to jointly estimate face attributes of different categories: nominal and ordinal, holistic and local. The architecture comprises:

Shared Feature Learning: The early layers, based on a modified AlexNet with added batch normalization layers, learn a feature representation shared by all attributes. Sharing these layers lets the network exploit correlations among attributes, so that learning one attribute can benefit the others.
Category-Specific Feature Learning: The shared features feed several shallow subnetworks, one per heterogeneous attribute category, which handle differences in data type and semantic meaning. Each subnetwork adapts the shared features to the attributes in its category.
Training and Implementation: The whole network is optimized end to end with stochastic gradient descent. Training begins with pre-training on a large face dataset (CASIA-WebFace), followed by fine-tuning on attribute databases such as MORPH II, CelebA, and LFWA.
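A minimal forward-pass sketch of this shared-plus-category-specific layout, assuming NumPy and using a single fully connected layer as a stand-in for the convolutional trunk. The layer sizes, the three categories (a two-class nominal attribute such as gender, an age value scaled to [0, 1] and regressed with squared error, and five binary attributes), the loss choices, and the weights are all illustrative assumptions, not the paper's configuration. Gradients and SGD updates would normally come from an autodiff framework and are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64-dim input vector stands in for a face image,
# a 32-dim shared feature stands in for the AlexNet-style trunk output.
IN_DIM, SHARED_DIM, HEAD_DIM = 64, 32, 16


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init(shape):
    return rng.normal(0.0, 0.1, size=shape)


# Shared layer: used by every attribute category.
W_shared = init((IN_DIM, SHARED_DIM))

# One shallow subnetwork per (hypothetical) attribute category.
heads = {
    "nominal": (init((SHARED_DIM, HEAD_DIM)), init((HEAD_DIM, 2))),  # e.g. gender logits
    "ordinal": (init((SHARED_DIM, HEAD_DIM)), init((HEAD_DIM, 1))),  # e.g. scaled age
    "binary": (init((SHARED_DIM, HEAD_DIM)), init((HEAD_DIM, 5))),   # e.g. 5 binary attributes
}


def forward(x):
    shared = relu(x @ W_shared)           # shared feature learning
    outputs = {}
    for name, (w1, w2) in heads.items():  # category-specific feature learning
        outputs[name] = relu(shared @ w1) @ w2
    return outputs


def joint_loss(outputs, y_nominal, y_age, y_binary, weights=(1.0, 1.0, 1.0)):
    eps = 1e-12
    n = len(y_age)
    probs = softmax(outputs["nominal"])
    ce = -np.log(probs[np.arange(n), y_nominal] + eps).mean()   # cross-entropy
    mse = ((outputs["ordinal"][:, 0] - y_age) ** 2).mean()      # squared error
    p = sigmoid(outputs["binary"])
    bce = -(y_binary * np.log(p + eps)
            + (1 - y_binary) * np.log(1 - p + eps)).mean()      # binary cross-entropy
    parts = (ce, mse, bce)
    total = sum(w * part for w, part in zip(weights, parts))    # weighted sum over categories
    return total, parts


# Usage on a random toy batch of 4 "faces".
x = rng.normal(size=(4, IN_DIM))
outputs = forward(x)
total, parts = joint_loss(
    outputs,
    y_nominal=np.array([0, 1, 1, 0]),
    y_age=np.array([0.25, 0.40, 0.31, 0.58]),
    y_binary=rng.integers(0, 2, size=(4, 5)),
)
print({k: v.shape for k, v in outputs.items()}, float(total))
```

Giving each head its own loss is what lets one network mix nominal, ordinal, and binary outputs; only W_shared is touched by all three loss terms.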

Results and Evaluation

The proposed approach was thoroughly evaluated on several databases with heterogeneous attributes, including MORPH II for nominal and ordinal attributes and CelebA for binary attributes. Key evaluation results include:

Heterogeneous Attribute Estimation: On MORPH II, the model achieves a 3.0-year MAE for age estimation and over 90% accuracy for gender and race, showing that a single network can handle ordinal and nominal attributes together.
Binary Attribute Estimation: On CelebA, the framework attains an average accuracy of 93% over 40 attributes, surpassing several state-of-the-art methods.
Cross-Database Generalization: The DMTL model also generalizes well across databases, including LAP, a public-domain face database annotated with a single attribute.
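The metrics quoted above can be computed as follows. This is a minimal sketch on made-up toy values, not data from the paper: MAE averages the absolute age error in years, accuracy is the fraction of correct labels, and the CelebA-style figure averages per-attribute accuracies over all binary attributes. The function names are our own.

```python
def mean_absolute_error(pred_ages, true_ages):
    """Mean absolute age error, in the same unit as the inputs (years)."""
    if len(pred_ages) != len(true_ages) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(pred_ages, true_ages)) / len(true_ages)


def accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the ground-truth label."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == t for p, t in zip(pred_labels, true_labels)) / len(true_labels)


def mean_binary_accuracy(pred_rows, true_rows):
    """Average of per-attribute accuracies; each row holds one image's 0/1 attributes."""
    n_attrs = len(true_rows[0])
    per_attr = [
        accuracy([row[j] for row in pred_rows], [row[j] for row in true_rows])
        for j in range(n_attrs)
    ]
    return sum(per_attr) / n_attrs


# Toy example with made-up values.
age_mae = mean_absolute_error([25, 31, 40, 52], [27, 30, 44, 50])  # (2 + 1 + 4 + 2) / 4
gender_acc = accuracy(["M", "F", "F", "M"], ["M", "F", "M", "M"])  # 3 of 4 correct
binary_acc = mean_binary_accuracy([[1, 0], [0, 0], [1, 1]],
                                  [[1, 0], [1, 0], [1, 1]])         # (2/3 + 3/3) / 2
print(age_mae, gender_acc, round(binary_acc, 4))  # → 2.25 0.75 0.8333
```

Because every image here is labeled for every attribute, the per-attribute average equals the pooled accuracy over all labels.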

Implications and Future Directions

The research shows the value of multi-task learning for exploiting attribute correlations in face attribute estimation. Practical applications include video surveillance, face retrieval, social media, and biometric identification, where accurate multi-attribute estimation is important.

Future work could explore architectures that model more complex dependencies between attributes and extend the network to a broader range of face attributes. Training on more diverse and less biased datasets could also help the model cope with real-world conditions.

In conclusion, the DMTL approach improves face attribute estimation by handling correlated, heterogeneous attributes in a single network, making it a practical tool for applications in video surveillance, human-computer interaction, and beyond.
