dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans
Abstract: Human annotators typically provide the labeled data used to train machine learning models, such as neural networks. Yet, human annotations are subject to noise, which impairs generalization performance. Methodological research on approaches that counteract noisy annotations requires corresponding datasets for a meaningful empirical evaluation. Consequently, we introduce a novel benchmark dataset, dopanim, consisting of about 15,750 animal images of 15 classes with ground truth labels. For approximately 10,500 of these images, 20 humans provided over 52,000 annotations with an accuracy of roughly 67%. Its key attributes include (1) the challenging task of classifying doppelganger animals, (2) human-estimated likelihoods as annotations, and (3) annotator metadata. We benchmark well-known multi-annotator learning approaches using seven variants of this dataset and outline further evaluation use cases such as learning beyond hard class labels and active learning. Our dataset and a comprehensive codebase are publicly available to emulate the data collection process and to reproduce all empirical results.
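The abstract mentions that annotators provide human-estimated likelihoods rather than only hard class labels. As a minimal sketch of how such annotations might be aggregated, the following example averages per-annotator class likelihoods into a soft label and compares this with a hard majority vote. The array shapes and random likelihoods here are illustrative assumptions, not the actual dopanim file format.

```python
import numpy as np

# Hypothetical setup: 3 annotators each provide likelihood estimates
# for one image over 15 classes (each row sums to 1).
rng = np.random.default_rng(0)
n_annotators, n_classes = 3, 15
likelihoods = rng.dirichlet(np.ones(n_classes), size=n_annotators)

# Soft label: per-class average of the annotators' likelihoods.
soft_label = likelihoods.mean(axis=0)

# Hard label: majority vote over each annotator's most likely class.
votes = likelihoods.argmax(axis=1)
hard_label = np.bincount(votes, minlength=n_classes).argmax()
```

Training on `soft_label` instead of `hard_label` preserves the annotators' expressed uncertainty, which is one of the evaluation use cases ("learning beyond hard class labels") the paper outlines.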