
CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English (2405.11865v1)

Published 20 May 2024 in cs.CL and cs.AI

Abstract: Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models. However, over the past several years, the state-of-the-art has seemingly hit another plateau on the benchmark CoNLL-03 English dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.
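As an illustration of what span-level error categorization (as opposed to reporting a single F1 score) can look like, below is a minimal Python sketch that buckets predicted entity spans into correct, type, boundary, spurious, and missed categories. The function name and category labels are hypothetical illustrations, not the actual taxonomy or document-level annotations introduced in the paper.

```python
# Hypothetical sketch of span-level NER error bucketing.
# Spans are (start, end, type) tuples with end exclusive.
# Category names here are illustrative, not the paper's taxonomy.

def categorize_errors(gold, pred):
    """Bucket predicted spans against gold spans for one document."""
    gold_set, pred_set = set(gold), set(pred)
    errors = {"correct": [], "type_error": [], "boundary_error": [],
              "spurious": [], "missed": []}
    for span in pred_set:
        start, end, _ = span
        if span in gold_set:
            errors["correct"].append(span)
        elif any(g[:2] == (start, end) for g in gold_set):
            errors["type_error"].append(span)      # right span, wrong label
        elif any(max(start, g[0]) < min(end, g[1]) for g in gold_set):
            errors["boundary_error"].append(span)  # overlaps a gold span
        else:
            errors["spurious"].append(span)        # no gold counterpart
    for span in gold_set - pred_set:
        # Count a gold span as missed only if no prediction touches it.
        if not any(max(span[0], p[0]) < min(span[1], p[1]) for p in pred_set):
            errors["missed"].append(span)
    return errors

# Example: gold labels tokens 0-2 ("European Union") as ORG;
# the model predicts only token 0 ("European") as MISC.
gold = [(0, 2, "ORG")]
pred = [(0, 1, "MISC")]
print(categorize_errors(gold, pred))  # -> boundary_error: [(0, 1, 'MISC')]
```

A breakdown like this makes plateaued leaderboard scores interpretable: two models with identical F1 can differ sharply in how many of their errors are boundary disagreements versus outright spurious predictions.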

Authors (3)
  1. Andrew Rueda
  2. Elena Álvarez Mellado
  3. Constantine Lignos
