- The paper demonstrates a novel method that lifts pairwise distances into a dense matrix to enhance feature embedding learning.
- It introduces a structured prediction objective that leverages batch information to optimize higher-order relationships among samples.
- The method achieves superior clustering and retrieval performance across benchmarks, including CUB-200-2011, CARS196, and a new Online Products dataset.
Deep Metric Learning via Lifted Structured Feature Embedding
This paper presents a detailed exploration of deep metric learning, focused on optimizing feature embeddings within convolutional neural networks (CNNs). The authors introduce a novel algorithm that efficiently uses training batches to lift pairwise distance vectors into dense pairwise distance matrices. This approach enhances the learning of feature embeddings by optimizing a new structured prediction objective on the lifted problem. The efficacy of this method is demonstrated through robust performance in clustering and retrieval tasks across multiple datasets, including CUB-200-2011, CARS196, and a newly introduced Online Products dataset.
Key Contributions
- Lifted Pairwise Distance Matrix: Traditional methods compute loss functions on individual pairs or triplets of examples, which underutilizes the information available within a batch. The proposed method instead lifts the O(m) vector of pairwise distances within a batch into the dense O(m²) matrix of all pairwise distances. This lifting allows the algorithm to harness the full relational structure of the batch, leading to richer and more discriminative feature embeddings.
- Structured Prediction Objective: The paper defines a novel structured loss function on the lifted problem. Unlike conventional contrastive or triplet loss functions, the structured loss function leverages the dense pairwise distance matrix to optimize the embedding space more effectively. This approach captures higher-order relationships among samples, providing a more robust optimization landscape.
- Online Products Dataset: The authors introduce a new dataset comprising approximately 120,000 images across 23,000 classes of online products. This dataset is significant for metric learning due to its large number of categories and diversity in product types. It serves as a benchmark to evaluate the performance of metric learning algorithms in practical settings with extreme classification tasks.
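The lifting step and the structured objective above can be sketched together. The loss below follows the paper's formulation: for each positive pair (i, j), a log-sum-exp (the smooth upper bound on the max) over the distances to all negatives of either endpoint is combined with the margin, hinged at zero, squared, and averaged over positive pairs. This is a minimal NumPy sketch, assuming Euclidean distances; the function and variable names are ours, and a training implementation would vectorize the loops and differentiate through the result.

```python
import numpy as np

def lifted_structured_loss(embeddings, labels, margin=1.0):
    """Lift a batch's pairwise distances into a dense m x m matrix,
    then evaluate the smooth (log-sum-exp) structured loss on it."""
    sq = np.sum(embeddings ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * embeddings @ embeddings.T + sq[None, :]
    dist = np.sqrt(np.maximum(d2, 0.0))      # dense O(m^2) distance matrix
    positive = labels[:, None] == labels[None, :]

    total, num_pos = 0.0, 0
    m = len(labels)
    for i in range(m):
        for j in range(i + 1, m):
            if not positive[i, j]:
                continue
            # Smooth max over all negatives of both endpoints i and j.
            negs = np.concatenate([dist[i, ~positive[i]],
                                   dist[j, ~positive[j]]])
            j_ij = np.log(np.sum(np.exp(margin - negs))) + dist[i, j]
            total += max(0.0, j_ij) ** 2
            num_pos += 1
    return total / (2 * num_pos)

# Toy batch: two classes, two samples each
X = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.]])
y = np.array([0, 0, 1, 1])
loss = lifted_structured_loss(X, y)          # a non-negative scalar
```

Because every negative in the batch contributes to every positive pair's term, a single batch yields far more training signal than the same batch consumed as independent pairs or triplets.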
Experimental Results
The proposed method is evaluated using GoogLeNet on three datasets: CUB-200-2011, CARS196, and Online Products. The empirical results show that the lifted structured feature embedding consistently outperforms traditional methods (contrastive and triplet embeddings) across all tested embedding sizes.
The paper reports significant improvements in clustering quality, measured by standard metrics such as F1 and NMI. The structured loss function leads to more cohesive clusters that are better aligned with ground truth classes.
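The clustering metrics can be made concrete. Below is a minimal, dependency-free sketch of NMI (normalized mutual information) between a predicted clustering and the ground-truth classes; the function name `nmi` is ours, and in practice a library routine such as scikit-learn's `normalized_mutual_info_score` would be used.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information between a clustering and ground truth:
    MI(T, P) divided by the geometric mean of the two entropies."""
    n = len(labels_true)
    ct = Counter(labels_true)
    cp = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    h_t = -sum(c / n * math.log(c / n) for c in ct.values())
    h_p = -sum(c / n * math.log(c / n) for c in cp.values())
    if h_t == 0.0 or h_p == 0.0:             # degenerate single-cluster case
        return 0.0
    return mi / math.sqrt(h_t * h_p)

# NMI is invariant to cluster relabeling: a permuted but perfect
# clustering still scores 1.0, while an uninformative one scores 0.0.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
```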
The retrieval quality, assessed by Recall@K scores, also shows substantial enhancement. The embeddings learned through the proposed method facilitate more accurate retrievals of relevant images, demonstrating the practical utility in applications requiring high-precision search capabilities.
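Recall@K itself is straightforward to compute: a query counts as a hit if any of its K nearest neighbours (excluding the query itself) shares its class. A minimal NumPy sketch, with our own function name:

```python
import numpy as np

def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (self excluded)
    contain at least one item of the same class."""
    sq = np.sum(embeddings ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * embeddings @ embeddings.T + sq[None, :]
    np.fill_diagonal(d2, np.inf)             # never retrieve the query itself
    hits = sum(
        int(np.any(labels[np.argsort(d2[i])[:k]] == labels[i]))
        for i in range(len(labels))
    )
    return hits / len(labels)

# Toy example: same-class points sit close together
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
r1 = recall_at_k(X, y, 1)
```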
Figures included in the paper highlight successful retrieval examples, showcasing the effectiveness of the learned embeddings in capturing semantic similarities among diverse visual categories.
Implications and Future Directions
The introduction of the lifted structured feature embedding method has several practical and theoretical implications:
- Scalability in Extreme Classification: By efficiently utilizing batch information, the proposed method addresses the scalability issues often encountered in extreme classification scenarios. This makes it applicable in domains like e-commerce, where the number of categories can be overwhelmingly large.
- Stable Optimization: The smooth upper bound approximation introduced for the structured loss function provides stable gradient signals, improving the convergence of training. This stability is crucial for training deep networks on large-scale datasets.
- Broader Applicability: Beyond clustering and retrieval, the conceptual framework of lifting a batch to a dense pairwise matrix and optimizing structured loss functions can be applied to other learning and recognition tasks. This opens avenues for further research across machine learning and computer vision.
Conclusion
The paper "Deep Metric Learning via Lifted Structured Feature Embedding" provides a thorough investigation into enhancing metric learning through a novel approach to embedding optimization. The proposed method leverages batch information more effectively, leading to considerable improvements in clustering and retrieval tasks. The newly introduced Online Products dataset further enriches the resource pool for future research in extreme classification. While the current work demonstrates significant advancements, it also lays the groundwork for future explorations into applying structured prediction objectives in broader contexts within deep learning.