Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning (2404.19282v1)
Abstract: Loss functions and sample mining strategies are essential components of deep metric learning algorithms. However, existing loss functions and mining strategies often require additional hyperparameters, notably the threshold that defines whether a sample pair is informative. The threshold provides a stable numerical standard for deciding whether to retain a pair, and it is vital for reducing the number of redundant sample pairs that participate in training. Finding the optimal threshold, however, is time-consuming and often requires an extensive grid search: because the threshold cannot be adjusted dynamically during training, many repeated experiments are needed to determine it. We therefore introduce a novel approach for adjusting the thresholds associated with both the loss function and the sample mining strategy. For sample mining, we design a static Asymmetric Sample Mining Strategy (ASMS) and its dynamic version, Adaptive Tolerance ASMS (AT-ASMS). ASMS uses differentiated thresholds to address the problems caused by filtering samples with a single threshold, namely too few positive pairs and too many redundant negative pairs. AT-ASMS adaptively regulates the ratio of positive to negative pairs during training according to the ratio of the currently mined positive and negative pairs. A meta-learning-based threshold generation algorithm uses single-step gradient descent to obtain new thresholds. We combine these two threshold adjustment algorithms to form the Dual Dynamic Threshold Adjustment Strategy (DDTAS). Experimental results show that our algorithm achieves competitive performance on the CUB200, Cars196, and SOP datasets.
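To make the idea of differentiated thresholds concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cosine-similarity setting in which a positive pair counts as informative when its similarity falls below a positive threshold and a negative pair when its similarity exceeds a negative threshold. The function names `mine_pairs` and `adapt_thresholds`, the parameters `target_ratio` and `step`, and the simple ratio-based update rule are all hypothetical stand-ins; the abstract does not specify the actual AT-ASMS update.

```python
import numpy as np


def mine_pairs(sim, labels, pos_thresh, neg_thresh):
    """Return boolean masks of retained positive and negative pairs.

    Assumed rule (hypothetical): keep a positive pair if sim < pos_thresh,
    keep a negative pair if sim > neg_thresh.
    """
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude self-pairs
    pos_mask = same & off_diag & (sim < pos_thresh)
    neg_mask = ~same & (sim > neg_thresh)
    return pos_mask, neg_mask


def adapt_thresholds(pos_thresh, neg_thresh, n_pos, n_neg,
                     target_ratio=1.0, step=0.01):
    """Nudge both thresholds toward a target positive/negative ratio.

    Hypothetical update rule, used only to illustrate ratio-driven
    adjustment; it is not taken from the paper.
    """
    ratio = n_pos / max(n_neg, 1)
    if ratio < target_ratio:
        # Too few positives: admit more positives and fewer negatives.
        pos_thresh += step
        neg_thresh += step
    elif ratio > target_ratio:
        # Too many positives: admit fewer positives and more negatives.
        pos_thresh -= step
        neg_thresh -= step
    return pos_thresh, neg_thresh


# Toy similarity matrix for four samples from two classes.
labels = np.array([0, 0, 1, 1])
sim = np.array([
    [1.0, 0.9, 0.6, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.6, 0.3, 1.0, 0.4],
    [0.2, 0.1, 0.4, 1.0],
])

# Single shared threshold versus asymmetric thresholds.
pos_single, neg_single = mine_pairs(sim, labels, 0.5, 0.5)
pos_asym, neg_asym = mine_pairs(sim, labels, 0.95, 0.5)
print(pos_single.sum(), neg_single.sum())
print(pos_asym.sum(), neg_asym.sum())
```

On this toy matrix, the shared threshold keeps only one of the two positive pairs, while the looser positive threshold keeps both without admitting more negatives. In a training loop, `adapt_thresholds` would be called once per batch with the counts from `mine_pairs`.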
Authors:
- Xiruo Jiang
- Yazhou Yao
- Sheng Liu
- Fumin Shen
- Liqiang Nie
- Xiansheng Hua