
Single Unit Status in Deep Convolutional Neural Network Codes for Face Identification: Sparseness Redefined (2002.06274v2)

Published 14 Feb 2020 in cs.CV and cs.LG

Abstract: Deep convolutional neural networks (DCNNs) trained for face identification develop representations that generalize over variable images, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. Identity, gender, and viewpoint codes were studied at the "neural unit" and ensemble levels of a face-identification network. At the unit level, identification, gender classification, and viewpoint estimation were measured by deleting units to create variably-sized, randomly-sampled subspaces at the top network layer. Identification of 3,531 identities remained high (area under the ROC approximately 1.0) as dimensionality decreased from 512 units to 16 (0.95), 4 (0.80), and 2 (0.72) units. Individual identities separated statistically on every top-layer unit. Cross-unit responses were minimally correlated, indicating that units code non-redundant identity cues. This "distributed" code requires only a sparse, random sample of units to identify faces accurately. Gender classification declined gradually and viewpoint estimation fell steeply as dimensionality decreased. Individual units were weakly predictive of gender and viewpoint, but ensembles proved effective predictors. Therefore, distributed and sparse codes co-exist in the network units to represent different face attributes. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint information separated into high-dimensional subspaces, ordered by explained variance. Identity, gender, and viewpoint information contributed to all individual unit responses, undercutting a neural tuning analogy for face attributes. Interpretation of neural-like codes from DCNNs, and by analogy, high-level visual codes, cannot be inferred from single unit responses. Instead, "meaning" is encoded by directions in the high-dimensional space.

Authors (6)
  1. Connor J. Parde (5 papers)
  2. Matthew Q. Hill (4 papers)
  3. Carlos D. Castillo (29 papers)
  4. Prithviraj Dhar (9 papers)
  5. Alice J. O'Toole (13 papers)
  6. Y. Ivette Colón (2 papers)
Citations (8)

Summary

  • The paper examines how deep convolutional neural networks encode face attributes, challenging traditional sparseness concepts by focusing on high-dimensional ensemble codes rather than single units.
  • High face identification accuracy is maintained even with drastically reduced dimensionality, indicating efficient distributed coding where sparse subsets of units contribute non-redundant identity information.
  • While identity is robustly encoded, attributes like gender and viewpoint require larger ensembles for accurate prediction, suggesting these are encoded more broadly across the network's output space.

Deep Convolutional Neural Networks for Face Identification: A Study of Sparse and Distributed Codes

The paper "Single Unit Status in Deep Convolutional Neural Network Codes for Face Identification: Sparseness Redefined" investigates the representation of face attributes at the unit and ensemble levels within Deep Convolutional Neural Networks (DCNNs) specifically trained for face identification tasks. The paper examines how these networks balance sparseness and distribution in encoding key facial attributes like identity, gender, and viewpoint.

Key Findings

This research explores the representational nature of individual units in a DCNN's top layer and contrasts it with ensemble-level codes derived from the full network. The network studied is a ResNet-101 model trained for face recognition that is robust to variations such as viewpoint and illumination. The analysis is based on top-layer unit responses to more than 22,000 images of 3,531 identities and covers the following aspects:

  1. Identity Encoding:
    • The network maintains high identification accuracy even when the dimensionality of its representation is drastically reduced: identification is nearly perfect with all 512 units (AUC ≈ 1.0), remains strong with as few as 16 units (AUC ≈ 0.95), and stays well above chance with only 4 (0.80) or even 2 (0.72) units. A minimal simulation of this unit-sampling analysis is sketched after this list.
    • Cross-unit activations are only weakly correlated, indicating that each unit captures non-redundant identity information and that a sparse, randomly sampled subset of units suffices for accurate identification.
  2. Gender and Viewpoint Encoding:
    • While identification remains robust with few units, gender classification declines gradually and viewpoint estimation falls steeply as dimensionality decreases, indicating that these attributes are distributed more broadly and require a larger pool of units for accurate prediction.
    • Individual units are generally weak predictors of gender and viewpoint, but aggregated into ensembles they become effective predictors (see the single-unit versus ensemble sketch below).
  3. Ensemble-Level Separation:
    • Principal Component Analysis (PCA) applied to the face descriptors separates identity, gender, and viewpoint information into distinct subspaces, ordered by explained variance. Particular groups of principal components align preferentially with particular attributes, indicating a systematic ordering in how the network allocates variance to them (a PCA sketch follows below).
  4. Interpretation of Neural Codes:
    • The paper posits that the meaningful "code" for face attributes lives in directions of the high-dimensional space spanned by the network's output units, rather than in the responses of isolated units. This challenges the traditional neural-tuning analogy for high-level visual processing.
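
To make the unit-sampling analysis concrete, the sketch below simulates it end to end. The descriptors and identity labels are synthetic stand-ins (a real run would use the 512-dimensional top-layer responses of the face-trained ResNet-101), and identification is scored, as a rough proxy for the paper's protocol, as the ROC AUC of cosine similarity over same- versus different-identity image pairs, recomputed for randomly sampled subsets of units.

```python
# Unit-deletion sketch: score identification from random subsets of top-layer units.
# All data here are simulated stand-ins for the network's 512-d face descriptors.
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated descriptors: n_ids identities x n_imgs images each, 512 units.
# Each identity gets a mean vector; its images are noisy samples around it.
n_ids, n_imgs, n_units = 50, 5, 512
id_means = rng.normal(size=(n_ids, n_units))
X = np.repeat(id_means, n_imgs, axis=0) + 0.5 * rng.normal(size=(n_ids * n_imgs, n_units))
labels = np.repeat(np.arange(n_ids), n_imgs)

def identification_auc(feats, labels):
    """ROC AUC for same- vs. different-identity pairs scored by cosine similarity."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims, same = [], []
    for i, j in combinations(range(len(feats)), 2):
        sims.append(feats[i] @ feats[j])
        same.append(int(labels[i] == labels[j]))
    return roc_auc_score(same, sims)

# "Delete" units by keeping only a random subset of k dimensions.
for k in (512, 16, 4, 2):
    keep = rng.choice(n_units, size=k, replace=False)
    print(f"{k:>3} units kept: identification AUC = {identification_auc(X[:, keep], labels):.3f}")
```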
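
The contrast between weak single-unit and strong ensemble prediction of categorical attributes can be illustrated with a simple linear read-out. This is a minimal sketch under assumed, simulated descriptors in which a gender-like binary label is weakly loaded onto every unit; logistic regression is used here as a generic linear classifier (not necessarily the paper's read-out) and is scored on single units versus the full 512-unit ensemble.

```python
# Single-unit vs. ensemble prediction of a binary attribute (e.g., gender).
# Descriptors and labels are simulated; viewpoint regression would be analogous.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_samples, n_units = 2000, 512

# Spread a weak gender signal across all units.
gender = rng.integers(0, 2, size=n_samples)
loadings = 0.1 * rng.normal(size=n_units)              # small loading on every unit
X = rng.normal(size=(n_samples, n_units)) + np.outer(gender, loadings)

def cv_accuracy(features):
    """5-fold cross-validated accuracy of a linear (logistic-regression) read-out."""
    return cross_val_score(LogisticRegression(max_iter=1000), features, gender, cv=5).mean()

# A sample of single units vs. the full ensemble.
best_single = max(cv_accuracy(X[:, [j]]) for j in range(0, n_units, 16))
print(f"best sampled single unit: {best_single:.3f}")
print(f"full 512-unit ensemble  : {cv_accuracy(X):.3f}")
```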
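
Finally, the ensemble-level PCA analysis can be sketched in the same spirit. The descriptors below are again simulated, with gender-like and viewpoint-like signals injected along random directions; the script reports, for each leading principal component, its explained-variance ratio and the correlation of its projections with each attribute, which is one simple way to probe how attribute information distributes across components.

```python
# PCA sketch: how strongly do leading principal components of the descriptors
# relate to gender and viewpoint? All data are simulated stand-ins.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_samples, n_units = 2000, 512

gender = rng.integers(0, 2, size=n_samples).astype(float)
viewpoint = rng.uniform(-90, 90, size=n_samples)        # yaw in degrees

# Simulated descriptors: identity-like noise plus weaker gender and viewpoint axes.
X = rng.normal(size=(n_samples, n_units))
X += np.outer(gender, 0.8 * rng.normal(size=n_units))
X += np.outer(viewpoint / 90.0, 0.4 * rng.normal(size=n_units))

pca = PCA(n_components=10).fit(X)
Z = pca.transform(X)                                     # projections onto the PCs

for k in range(10):
    r_gender = np.corrcoef(Z[:, k], gender)[0, 1]
    r_view = np.corrcoef(Z[:, k], viewpoint)[0, 1]
    print(f"PC{k + 1:>2}: explained var = {pca.explained_variance_ratio_[k]:.3f}  "
          f"|r_gender| = {abs(r_gender):.2f}  |r_view| = {abs(r_view):.2f}")
```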

Implications and Future Directions

The paper underscores the complexity of DCNN face representations and questions the applicability of sparse-coding theories traditionally invoked for neural processing. It argues that interpreting such networks, particularly in high-level vision tasks, requires analysis of the full representational space rather than of single-unit responses.

These insights bear on both neural network interpretability and neuro-cognitive research. From a practical standpoint, understanding the distributed nature of face encoding in DCNNs could guide the development of more robust face-recognition systems, potentially leading to innovations in security, social media tagging, and personalized AI.

Theoretically, the juxtaposition of unit and ensemble coding in artificial networks might mirror the neural coding strategies employed by biological systems, offering a computational framework to examine face processing in primate brains. As such, the interplay between technological and biological models presents fertile ground for future exploration into the shared principles of vision science.

In conclusion, this paper provides a nuanced understanding of how DCNNs encode facial identity and other attributes, emphasizing the role of high-dimensional representations and challenging simplistic coding analogies in modern neural network research.
