Towards Reliable Dermatology Evaluation Benchmarks (2309.06961v2)

Published 13 Sep 2023 in cs.CV and cs.AI

Abstract: Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
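
The abstract describes a ranking step followed by human confirmation that ends at a stopping criterion, but it does not spell out the criterion itself. The sketch below is only an illustration under a stated assumption: review stops after a fixed number of consecutive candidates are rejected. The function name `confirm_ranked_candidates`, the `patience` parameter, and the `is_issue` callback (standing in for an annotator's verdict) are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple


def confirm_ranked_candidates(
    ranked: Iterable[str],
    is_issue: Callable[[str], bool],
    patience: int = 3,
) -> Tuple[List[str], int]:
    """Review candidates in ranked order (most suspicious first).

    Assumed stopping rule (not from the paper): stop once `patience`
    consecutive candidates are judged to be fine.
    Returns the confirmed issues and the number of items reviewed.
    """
    confirmed: List[str] = []
    reviewed = 0
    consecutive_misses = 0
    for item in ranked:
        reviewed += 1
        if is_issue(item):
            confirmed.append(item)
            consecutive_misses = 0  # a confirmed issue resets the counter
        else:
            consecutive_misses += 1
            if consecutive_misses >= patience:
                break  # annotation effort ends here
    return confirmed, reviewed


# Toy usage: file names ranked by a hypothetical cleaning score.
ranked_files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg", "g.jpg"]
true_issues = {"a.jpg", "b.jpg", "d.jpg"}
confirmed, reviewed = confirm_ranked_candidates(
    ranked_files, lambda name: name in true_issues, patience=2
)
```

With a rule of this kind, annotation effort scales with how deep the real issues sit in the ranking rather than with dataset size, which is one way a confirmation process can stay resource-efficient.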
