
Handling Realistic Label Noise in BERT Text Classification (2305.16337v2)

Published 23 May 2023 in cs.CL and cs.AI

Abstract: Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers. Several methods have been proposed to counteract the effect of random label noise in supervised classification, and some studies have shown that BERT is already robust against high rates of randomly injected label noise. However, real label noise is not random; rather, it is often correlated with input features or other annotator-specific factors. In this paper, we evaluate BERT in the presence of two types of realistic label noise: feature-dependent label noise, and synthetic label noise from annotator disagreements. We show that the presence of these types of noise significantly degrades BERT classification performance. To improve robustness, we evaluate different types of ensembles and noise-cleaning methods and compare their effectiveness against label noise across different datasets.
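
The abstract distinguishes realistic, feature-dependent label noise from the uniformly random noise studied in earlier work. As a rough illustration of how such noise can be simulated, the Python sketch below uses a weak auxiliary classifier to relabel the training examples it finds hardest, so corruptions correlate with the input features rather than being uniform. The function name, the TF-IDF/logistic-regression choice, and the noise_rate parameter are assumptions for illustration only and do not reflect the paper's actual experimental protocol.

# Minimal sketch of feature-dependent label noise injection (illustrative only;
# not the paper's exact setup). A weak auxiliary classifier relabels the
# training examples it finds hardest, so the corruptions correlate with the
# input features instead of being uniformly random.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def inject_feature_dependent_noise(texts, labels, noise_rate=0.2):
    """Return a copy of `labels` with ~noise_rate of them replaced by a weak
    model's most confident wrong class (hypothetical helper, for illustration)."""
    labels = np.asarray(labels)

    # Weak auxiliary model: TF-IDF bag-of-words + logistic regression.
    features = TfidfVectorizer(max_features=20000).fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(features, labels)
    probs = clf.predict_proba(features)

    # Probability the weak model assigns to the gold class; low values mark
    # examples whose surface features resemble a different class.
    gold_cols = np.searchsorted(clf.classes_, labels)
    gold_prob = probs[np.arange(len(labels)), gold_cols]
    noisy_idx = np.argsort(gold_prob)[: int(noise_rate * len(labels))]

    noisy_labels = labels.copy()
    for i in noisy_idx:
        ranked = clf.classes_[np.argsort(probs[i])[::-1]]  # classes by score
        noisy_labels[i] = ranked[0] if ranked[0] != labels[i] else ranked[1]
    return noisy_labels

Because the corrupted labels are concentrated on inputs that genuinely resemble other classes, a classifier such as BERT cannot average the noise away as easily as with random label flips, which is consistent with the performance degradation the abstract reports.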

Authors (2)
  1. Maha Tufail Agro (3 papers)
  2. Hanan Aldarmaki (29 papers)
Citations (2)
