
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images (2312.07273v2)

Published 12 Dec 2023 in cs.CV

Abstract: Near- and duplicate image detection is a critical concern in the field of medical imaging. Medical datasets often contain similar or duplicate images from various sources, which can lead to significant performance issues and evaluation biases, especially in machine learning tasks, due to data leakage between training and testing subsets. In this paper, we present an approach for identifying near- and duplicate 3D medical images leveraging publicly available 2D computer vision embeddings. We assessed our approach by comparing embeddings extracted from two state-of-the-art self-supervised pretrained models and two different vector index structures for similarity retrieval. We generate an experimental benchmark based on the publicly available Medical Segmentation Decathlon dataset. The proposed method yields promising results for near- and duplicate image detection, achieving a mean sensitivity and specificity of 0.9645 and 0.8559, respectively.
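
The pipeline described in the abstract (2D embeddings, similarity retrieval, then sensitivity and specificity) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not say how 2D slice embeddings are combined into one vector per 3D volume, so mean pooling is used for illustration; a brute-force cosine search stands in for the two vector index structures the paper compares; and the function names and the similarity threshold are hypothetical.

```python
import math


def volume_embedding(slice_embeddings):
    """Mean-pool per-slice 2D embeddings into one vector per 3D volume.

    Assumption: the pooling strategy is illustrative, not taken from the paper.
    """
    n = len(slice_embeddings)
    return [sum(column) / n for column in zip(*slice_embeddings)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flag_duplicates(queries, index, threshold):
    """Flag a query as a (near-)duplicate if its best match in the index
    reaches the threshold. Brute-force search stands in for a vector index."""
    flags = []
    for query in queries:
        best = max(cosine_similarity(query, item) for item in index)
        flags.append(best >= threshold)
    return flags


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: two indexed volumes and two queries (hypothetical 2-D embeddings).
index = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.9, 0.1], [1.0, 1.0]]
is_duplicate = [True, False]  # hypothetical ground truth for the queries

flags = flag_duplicates(queries, index, threshold=0.95)
sens, spec = sensitivity_specificity(flags, is_duplicate)
```

In this sketch, raising the threshold trades sensitivity for specificity; on larger collections an approximate nearest-neighbour index would replace the brute-force loop in `flag_duplicates`.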

Authors (3)
  1. Tuan Truong (22 papers)
  2. Farnaz Khun Jush (5 papers)
  3. Matthias Lenga (16 papers)