Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges (2309.14084v1)

Published 25 Sep 2023 in cs.CL, cs.AI, and cs.IR

Abstract: In the domain of NLP, Named Entity Recognition (NER) stands out as a pivotal mechanism for extracting structured insights from unstructured text. This manuscript offers an exhaustive exploration into the evolving landscape of NER methodologies, blending foundational principles with contemporary AI advancements. Beginning with the rudimentary concepts of NER, the study spans a spectrum of techniques from traditional rule-based strategies to the contemporary marvels of transformer architectures, particularly highlighting integrations such as BERT with LSTM and CNN. The narrative accentuates domain-specific NER models, tailored for intricate areas like finance, legal, and healthcare, emphasizing their specialized adaptability. Additionally, the research delves into cutting-edge paradigms including reinforcement learning, innovative constructs like E-NER, and the interplay of Optical Character Recognition (OCR) in augmenting NER capabilities. Grounding its insights in practical realms, the paper sheds light on the indispensable role of NER in sectors like finance and biomedicine, addressing the unique challenges they present. The conclusion outlines open challenges and avenues, marking this work as a comprehensive guide for those delving into NER research and applications.
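The "traditional rule-based strategies" the abstract contrasts with transformer models can be illustrated with a minimal sketch: a gazetteer (entity dictionary) lookup combined with a regular expression for a finance-style entity type. The gazetteer entries, labels, and money pattern below are illustrative assumptions, not taken from the paper.

```python
import re

# Hypothetical minimal gazetteer; entries and labels are illustrative only.
GAZETTEER = {
    "ORG": {"Google", "Reuters", "World Bank"},
    "LOC": {"London", "Mumbai"},
}

# Simple pattern for monetary amounts, a common finance-domain entity type.
MONEY_RE = re.compile(r"\$\d+(?:\.\d+)?(?:\s?(?:million|billion))?")

def rule_based_ner(text):
    """Return (surface form, label) pairs found by dictionary and regex matching."""
    entities = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                entities.append((m.group(), label))
    for m in MONEY_RE.finditer(text):
        entities.append((m.group(), "MONEY"))
    return entities
```

Such rules are precise but brittle (they miss unseen names and ambiguous surface forms), which is the gap the learned approaches surveyed here, from BiLSTM-CNNs to BERT-based models, aim to close.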

Authors (1)
  1. Kalyani Pakhale (1 paper)