Deep Metric Learning for Computer Vision: A Brief Overview
Abstract: The objective functions used to optimize deep neural networks play a vital role in shaping the feature representation learned from the input data. Although cross-entropy-based losses are used extensively across supervised deep-learning applications, they tend to be less adequate when the input data distribution has high intra-class variance and low inter-class variance. Deep Metric Learning aims to measure the similarity between data samples by learning a representation function that maps those samples into an embedding space. It relies on carefully designed sampling strategies and loss functions to produce a discriminative embedding space even for distributions with low inter-class and high intra-class variance. In this chapter, we provide an overview of recent progress in this area and discuss state-of-the-art Deep Metric Learning approaches.
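As a minimal illustration of the kind of objective such methods optimize, the sketch below assumes a triplet-style margin loss over unsquared Euclidean distances. This is one common metric learning formulation chosen here for illustration, not necessarily the one emphasized in the chapter. The function names, the margin value, and the 2-D embeddings are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative sits at least `margin` farther
    from the anchor than the positive does; otherwise it is positive.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen only for illustration.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]  # same class as the anchor, distance 5
negative = [6.0, 8.0]  # different class, distance 10

# Negative is already far enough away: no penalty.
well_separated = triplet_margin_loss(anchor, positive, negative, margin=1.0)

# Roles swapped: the "positive" is farther than the "negative", so the loss is positive.
violating = triplet_margin_loss(anchor, negative, positive, margin=1.0)
```

In practice the embeddings would come from a trained network and the triplets from a sampling strategy. The sketch only shows how such a loss rewards small same-class distances and large different-class distances.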