Understanding Hyperbolic Metric Learning through Hard Negative Sampling (2404.15523v2)
Abstract: In recent years, there has been a growing trend of incorporating hyperbolic geometry methods into computer vision. While these methods have achieved state-of-the-art performance on various metric learning tasks using hyperbolic distance measurements, the theoretical analysis supporting this superior performance remains underexplored. In this study, we investigate the effects of integrating hyperbolic space into metric learning, particularly when training with contrastive loss. We find that the existing literature lacks a comprehensive comparison between Euclidean and hyperbolic spaces regarding the temperature effect in the contrastive loss. To address this gap, we conduct an extensive investigation to benchmark the results of Vision Transformers (ViTs) using a hybrid objective function that combines loss from Euclidean and hyperbolic spaces. Additionally, we provide a theoretical analysis of the observed performance improvement. We also reveal that hyperbolic metric learning is highly related to hard negative sampling, providing insights for future work. This work will provide valuable data points and experience in understanding hyperbolic image embeddings. To encourage further investigation into our approach, we make our code available online (https://github.com/YunYunY/HypMix).
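To make the idea of a hybrid objective concrete, the following is a minimal sketch, not the authors' implementation (see the linked repository for that). It assumes an InfoNCE-style contrastive loss, cosine similarity on the Euclidean side, negative Poincaré-ball distance (curvature -1) on the hyperbolic side, and embeddings that already lie inside the unit ball. The function names and the parameters `tau_euc`, `tau_hyp`, and `weight` are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    # Euclidean-side similarity; undefined for zero vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def poincare_distance(u, v):
    # Geodesic distance in the Poincare ball with curvature -1.
    # Assumption: both points have Euclidean norm strictly below 1.
    diff = [a - b for a, b in zip(u, v)]
    denom = (1.0 - dot(u, u)) * (1.0 - dot(v, v))
    return math.acosh(1.0 + 2.0 * dot(diff, diff) / denom)


def contrastive_loss(anchor, positive, negatives, similarity, temperature):
    # InfoNCE-style loss: negative log-softmax of the positive pair's
    # temperature-scaled similarity against all candidates.
    logits = [similarity(anchor, positive) / temperature]
    logits += [similarity(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def hybrid_loss(anchor, positive, negatives,
                tau_euc=0.2, tau_hyp=0.2, weight=0.5):
    # Hypothetical convex combination of the two losses; the weighting
    # scheme here is an illustrative assumption.
    euc = contrastive_loss(anchor, positive, negatives,
                           cosine_similarity, tau_euc)
    hyp = contrastive_loss(anchor, positive, negatives,
                           lambda u, v: -poincare_distance(u, v), tau_hyp)
    return weight * euc + (1.0 - weight) * hyp
```

In this sketch, a negative that sits close to the anchor (a hard negative) receives a larger logit in both terms and therefore raises the loss more than a distant one, and each temperature controls how sharply that emphasis is applied.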
- Yun Yue
- Fangzhou Lin
- Guanyi Mou
- Ziming Zhang