Linkage Based Face Clustering via Graph Convolution Network (1903.11306v3)

Published 27 Mar 2019 in cs.CV

Abstract: In this paper, we present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we utilize the graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs. Experiments show that our method is more robust to the complex distribution of faces than conventional methods, yielding favorably comparable results to state-of-the-art methods on standard face clustering benchmarks, and is scalable to large datasets. Furthermore, we show that the proposed method does not need the number of clusters as prior, is aware of noises and outliers, and can be extended to a multi-view version for more accurate clustering accuracy.

Summary

The paper reformulates face clustering as a linkage prediction problem using Graph Convolution Networks (GCNs) and Instance Pivot Subgraphs (IPS) to effectively leverage local context.
The method achieves results that compare favorably with state-of-the-art methods on standard benchmarks and scales to large datasets, with O(n log n) complexity.
This approach robustly handles noise and outliers without requiring a predefined number of clusters, offering practical applications in data labeling and organization.

An Expert Overview of "Linkage Based Face Clustering via Graph Convolution Network"

The paper "Linkage Based Face Clustering via Graph Convolution Network" by Zhongdao Wang et al. applies Graph Convolution Networks (GCNs) to face clustering, the task of grouping faces by their potential identities. The authors reformulate clustering as a link prediction problem, in which a link exists between two faces if they share an identity, and argue that the local context around each instance in the feature space carries rich information about these links. The method infers the likelihood of linkage between an instance and its neighbors from that context, which makes it robust to the complex distribution of face features.

Methodology and Robustness

Conventional clustering methods such as K-Means or Spectral Clustering make assumptions about the data distribution; K-Means, for example, favors convex clusters. This approach makes no such assumption, which helps it handle the complex, often non-convex distribution of face features. For each instance, the method constructs an instance pivot subgraph (IPS) that captures the instance's local context up to a certain order of neighbors. A parametric model implemented as a GCN then uses this local structure to compute linkage likelihoods.
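The subgraph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the brute-force cosine k-nearest-neighbor search, the function names, the `hops` and `u` parameters, and the choice to exclude the pivot from the node list are all assumptions made for the example.

```python
import numpy as np

def knn_indices(features, k):
    """Brute-force cosine kNN (assumed here); returns (n, k) neighbor indices, self excluded."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def build_ips_nodes(pivot, knn, hops=(3, 2)):
    """Collect neighbors of `pivot` up to len(hops) hops.

    hops[i] is the number of nearest neighbors expanded at hop i + 1
    (hypothetical values). The pivot is left out of the returned list.
    """
    frontier, nodes, seen = [pivot], [], {pivot}
    for k_i in hops:
        next_frontier = []
        for node in frontier:
            for nb in knn[node, :k_i]:
                nb = int(nb)
                if nb not in seen:
                    seen.add(nb)
                    nodes.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return nodes

def build_ips_adjacency(nodes, knn, u=2):
    """Link two subgraph nodes when one is among the other's top-u neighbors (assumed rule)."""
    index = {n: i for i, n in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for n in nodes:
        for nb in knn[n, :u]:
            j = index.get(int(nb))
            if j is not None:
                adj[index[n], j] = adj[j, index[n]] = 1.0
    return adj

# Usage on random stand-in features (real inputs would be face embeddings).
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))
knn = knn_indices(features, k=5)
nodes = build_ips_nodes(pivot=0, knn=knn)
adjacency = build_ips_adjacency(nodes, knn)
```

For large collections, the brute-force search in `knn_indices` would be replaced by an approximate nearest neighbor index; it is kept here only so the sketch has no dependencies beyond NumPy.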

Key to this approach is the GCN's ability to reason about the context depicted in the IPS and to predict, from learned features, whether the pivot instance should be linked with each of its neighbors. In particular, the IPS normalization step expresses each node feature relative to the pivot face feature, so the input to the GCN encodes how every neighbor relates to the pivot rather than where it sits in the global feature space.
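To make the normalization and the graph reasoning step concrete, here is a small NumPy sketch. The pivot subtraction follows the description above; the layer itself (mean aggregation with self-loops, concatenation of each node's own and aggregated features, ReLU) is one common graph convolution variant chosen for illustration, and the function names and random weights are hypothetical rather than taken from the paper.

```python
import numpy as np

def normalize_to_pivot(node_features, pivot_feature):
    """Express every node feature relative to the pivot feature."""
    return node_features - pivot_feature

def mean_aggregate(adj):
    """Row-normalize the adjacency (self-loops added) so each node averages its neighborhood."""
    a = adj + np.eye(adj.shape[0], dtype=adj.dtype)
    return a / a.sum(axis=1, keepdims=True)

def graph_conv(x, agg, weight):
    """One assumed layer: concatenate own and aggregated features, then linear map + ReLU."""
    h = np.concatenate([x, agg @ x], axis=1)
    return np.maximum(h @ weight, 0.0)

# Usage: three subgraph nodes in a chain, 2-D features, untrained random weights.
rng = np.random.default_rng(0)
x = normalize_to_pivot(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    pivot_feature=np.array([1.0, 1.0]),
)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
hidden = graph_conv(x, mean_aggregate(adj), rng.normal(size=(4, 3)))
```

In a trained model, a final classifier on the node outputs would produce the per-neighbor linkage likelihood; that part is omitted here.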

Strong Numerical Results

Empirical evaluations in the paper show that the proposed method compares favorably with state-of-the-art clustering algorithms in normalized mutual information (NMI) and F-measure on standard face clustering benchmarks. The approach also scales: its overall complexity stays within $O(n \log n)$ because neighbor lookup relies on efficient approximate nearest neighbor search. The authors show that runtime remains manageable even when additional distractors are added to large datasets, which makes the method practical for real-world use.
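For readers unfamiliar with the two metrics, the sketch below computes both from a contingency table. Two choices here are assumptions, since this overview does not specify them: NMI is normalized by the arithmetic mean of the two entropies, and the F-measure is the BCubed variant, which is widely used for clustering.

```python
import numpy as np

def contingency(true_labels, pred_labels):
    """Count table: rows are true identities, columns are predicted clusters."""
    _, t = np.unique(np.asarray(true_labels).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(pred_labels).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.float64)
    np.add.at(table, (t, p), 1.0)
    return table

def nmi(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies."""
    pij = contingency(true_labels, pred_labels)
    pij /= pij.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    denom = 0.5 * (-np.sum(pi * np.log(pi)) - np.sum(pj * np.log(pj)))
    return 1.0 if denom == 0 else mi / denom

def bcubed_f(true_labels, pred_labels):
    """Harmonic mean of BCubed precision and recall."""
    table = contingency(true_labels, pred_labels)
    n = table.sum()
    precision = np.sum(table**2 / table.sum(axis=0, keepdims=True)) / n
    recall = np.sum(table**2 / table.sum(axis=1, keepdims=True)) / n
    return 2 * precision * recall / (precision + recall)

# Usage: merging two identities into one cluster keeps recall at 1 but halves precision.
truth = [0, 0, 1, 1]
merged = [0, 0, 0, 0]
print(nmi(truth, merged), bcubed_f(truth, merged))
```

Both scores reach 1 for a perfect clustering regardless of how the cluster ids are named, which is why they suit a method that does not fix the number of clusters in advance.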

Theoretical and Practical Implications

The framework handles noise and outliers and does not need the number of clusters as a prior, which makes it suitable for practical applications such as automated data labeling and organizing faces in large collections. The paper also extends the method to a multi-view setting (e.g., video face clustering using both visual and audio data), showing that it adapts to different feature modalities.
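One way to see why link prediction removes the need for a preset cluster count: once each candidate pair has a linkage score, clusters can be read off as connected components of the thresholded link graph. The sketch below does this with a union-find structure. It is a deliberate simplification, and the function name, the `(a, b, score)` link format, and the fixed threshold are assumptions; the paper's actual merging procedure is not described in this overview and may differ.

```python
def clusters_from_links(num_faces, links, threshold=0.5):
    """Group faces by connected components of links scoring at least `threshold`.

    `links` is an iterable of (face_a, face_b, score) tuples (hypothetical format).
    Returns one cluster id per face; the number of clusters is not an input.
    """
    parent = list(range(num_faces))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, score in links:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Relabel roots as consecutive ids in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(num_faces)]

# Usage: face 5 has no confident link, so it ends up alone, much like an outlier.
links = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.95), (2, 3, 0.1)]
print(clusters_from_links(6, links))  # [0, 0, 0, 1, 1, 2]
```

A plain threshold like this can chain distinct identities together through a single wrong link, so a practical system would likely need a more careful merging rule.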

Speculation on Future Developments

The use of GCNs in face clustering represents a promising direction for future improvements in the automated grouping of biometric data. With face analysis increasingly integral to applications like fraud detection, personalized user experiences, and security systems, refining methods that can balance accuracy with computational efficiency remains crucial. Future work could explore enhancements in IPS construction, especially in integrating additional contextual modalities, as well as the application of advanced GCN architectures tailored to the unique challenges of biometric clustering.

Ultimately, this paper contributes a robust, scalable framework that combines linkage-based clustering with the context-sensitive reasoning of graph neural networks, and it offers a foundation for later work on clustering complex biometric data.