
Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation (2306.12189v2)

Published 21 Jun 2023 in cs.CV

Abstract: In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but lack a definitive resolution. Addressing this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, which enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically in deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.
