- The paper introduces DLDL, which minimizes the KL divergence between the predicted and ground-truth label distributions to incorporate label ambiguity and reduce over-fitting.
- It demonstrates superior performance on age and head pose estimation, lowering MAE on the Morph dataset to 2.51 and achieving higher accuracy on benchmark datasets.
- The framework extends to multi-label classification and semantic segmentation, achieving state-of-the-art results and suggesting promising directions for automating label distribution construction.
Deep Label Distribution Learning With Label Ambiguity
This paper addresses a critical challenge in computer vision: many tasks lack large, precisely labeled training datasets. Traditional Convolutional Neural Networks (ConvNets) typically require extensive datasets with unambiguous labels to achieve optimal performance, yet such datasets are inherently difficult to obtain in domains such as apparent age estimation, head pose estimation, multi-label classification, and semantic segmentation. The proposed solution, Deep Label Distribution Learning (DLDL), capitalizes on label ambiguity by converting each image's label into a discrete label distribution, thereby exploiting the ambiguous information shared among labels.
Technical Approach
The authors introduce the DLDL model, which learns label distributions with deep ConvNets. By minimizing the Kullback-Leibler (KL) divergence between the predicted and ground-truth label distributions, DLDL incorporates label ambiguity into both feature learning and classifier learning, which helps prevent over-fitting even when the training set is small. The label distributions, which naturally describe the ambiguous information among labels, are constructed manually from domain knowledge and task-specific assumptions. The approach is versatile, applying to both regression and classification tasks.
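To make the training objective concrete, below is a minimal sketch (not the authors' released code) of the DLDL loss in PyTorch; the function name `dldl_loss` and the tensor shapes are assumptions for illustration. Since the ground-truth distribution is fixed during training, minimizing the KL divergence is equivalent, up to a constant, to minimizing the cross-entropy between the two distributions.

```python
# Minimal sketch of the DLDL training loss (not the authors' code).
# The network's raw outputs are turned into a predicted label distribution
# with a softmax, and training minimizes KL(target || prediction).
import torch
import torch.nn.functional as F

def dldl_loss(logits: torch.Tensor, target_dist: torch.Tensor) -> torch.Tensor:
    """KL divergence between a ground-truth label distribution and the
    softmax of the network outputs.

    logits:      (batch, num_labels) raw network outputs
    target_dist: (batch, num_labels) label distributions, rows summing to 1
    """
    log_pred = F.log_softmax(logits, dim=1)  # log of predicted distribution
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_pred, target_dist, reduction="batchmean")
```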
Numerical Results and Claims
The paper outlines several key numerical results demonstrating DLDL's efficacy across various benchmarks:
- Age Estimation: On the Morph and ChaLearn datasets, DLDL consistently outperforms baselines such as BFGS-LDL, conventional ConvNets, and label-smoothing approaches, reducing the Mean Absolute Error (MAE) on Morph to 2.51 (see the age-distribution sketch after this list).
- Head Pose Estimation: On the Pointing'04 and BJUT-3D datasets, DLDL achieves lower MAE and higher accuracy than conventional ConvNets and regression models, and generalizes well even with fewer training examples by capturing the inherent ambiguity in pose angles (see the pose-distribution sketch after this list).
- Multi-label Classification: On the PASCAL VOC dataset, DLDL achieves state-of-the-art mean average precision (mAP), surpassing previous methods. The paper highlights DLDL's ability to exploit the information carried by ambiguous annotations, such as objects marked "Difficult" in multi-label images.
- Semantic Segmentation: DLDL improves on the Fully Convolutional Network (FCN) architecture, in both its baseline and CRF-enhanced versions, on the PASCAL VOC 2011 and VOC 2012 datasets, indicating that incorporating label ambiguity also helps fine-grained, per-pixel prediction.
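For age estimation, the paper constructs the target distribution from a normal density centred at the ground-truth age and discretized over the label range. The sketch below shows one way to build such a target and to decode a predicted distribution into a point estimate via its expectation; the standard deviation of 2 years, the 0-100 age range, and the variable names are illustrative assumptions, not values taken from the paper.

```python
# Sketch of building an age label distribution and decoding a prediction;
# sigma = 2.0 and the 0..100 age range are illustrative choices.
import numpy as np

ages = np.arange(0, 101)  # discrete label set: integer ages 0..100

def age_to_distribution(true_age: float, sigma: float = 2.0) -> np.ndarray:
    """Discretized, normalized Gaussian centred at the ground-truth age."""
    density = np.exp(-((ages - true_age) ** 2) / (2.0 * sigma ** 2))
    return density / density.sum()

def expected_age(pred_dist: np.ndarray) -> float:
    """Decode a predicted label distribution into a single age estimate."""
    return float(np.dot(ages, pred_dist))

target = age_to_distribution(25.0)                   # target for a 25-year-old
print(ages[target.argmax()], expected_age(target))   # -> 25, ~25.0
```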
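Head pose labels are two-dimensional (pitch and yaw), so the same idea extends to a joint distribution over discretized angle pairs. The sketch below follows that pattern; the 15-degree bin spacing and the sigma are assumptions for illustration, not values taken from the paper.

```python
# Sketch of a 2-D label distribution over (pitch, yaw); the angle grid and
# sigma are illustrative.
import numpy as np

pitch_bins = np.arange(-90, 91, 15)  # discretized pitch angles in degrees
yaw_bins = np.arange(-90, 91, 15)    # discretized yaw angles in degrees

def pose_to_distribution(true_pitch: float, true_yaw: float,
                         sigma: float = 15.0) -> np.ndarray:
    """Discretized, normalized 2-D Gaussian centred at the true pose."""
    pp, yy = np.meshgrid(pitch_bins, yaw_bins, indexing="ij")
    density = np.exp(-((pp - true_pitch) ** 2 + (yy - true_yaw) ** 2)
                     / (2.0 * sigma ** 2))
    return density / density.sum()

target = pose_to_distribution(0.0, 30.0)  # label for pitch 0, yaw 30 degrees
print(target.shape, target.sum())         # -> (13, 13) 1.0
```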
Implications and Future Developments
The implications of this work are multifaceted, with practical significance for applications that suffer from data scarcity and inherent label ambiguity. Theoretically, it offers a principled way to integrate label distributions into deep models, beyond standard classification or regression formulations.
The authors envision future work that explores more sophisticated ways to construct label distributions and potentially automates this construction with data-driven techniques. Extending the approach to additional computer vision tasks and incorporating external datasets to support broader label spaces are also identified as promising directions.
Overall, DLDL offers a robust framework for tasks in which data are not only limited but also fraught with ambiguity, demonstrating the potential to reshape approaches to learning from such datasets.