
Embedded Named Entity Recognition using Probing Classifiers (2403.11747v2)

Published 18 Mar 2024 in cs.CL

Abstract: Streaming text generation has become a common way of increasing the responsiveness of LLM-powered applications such as chat assistants. At the same time, extracting semantic information from generated text is useful for applications such as automated fact-checking or retrieval-augmented generation. Currently, this requires either separate models at inference time, which increases computational cost, or destructive fine-tuning of the LLM. Instead, we propose an approach called EMBER, which enables streaming named entity recognition in decoder-only LLMs without fine-tuning them and with minimal additional computational cost at inference time. Specifically, our experiments show that EMBER maintains high token generation rates, with only a negligible slowdown of around 1% compared to a 43.64% slowdown measured for a baseline. We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models optimized for streaming text generation.
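Below is a minimal sketch of the general idea described in the abstract: a lightweight probing classifier reads the frozen decoder-only LM's hidden states and labels each token from the same forward pass that is already used for generation, so streaming NER adds almost no extra compute. The checkpoint ("gpt2"), the BIO label set, and the untrained linear probe are illustrative assumptions, not the authors' exact configuration; in EMBER-style setups the probe would be trained separately on token-classification data while the LM stays untouched.

```python
# Sketch: streaming NER via a probing classifier on a frozen decoder-only LM.
# Checkpoint, label set, and probe below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # assumed tag set

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()  # the LM itself stays frozen; no fine-tuning

# Lightweight probe over the final hidden state of each token.
# In practice it would be trained on labeled data before use.
probe = torch.nn.Linear(lm.config.hidden_size, len(LABELS))

prompt = "Angela Merkel visited Paris"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):  # generate and label 10 tokens, one at a time
        out = lm(input_ids, output_hidden_states=True)

        # Hidden state of the most recent token -> entity label for that token.
        last_hidden = out.hidden_states[-1][:, -1, :]
        label = LABELS[probe(last_hidden).argmax(-1).item()]
        last_token = tokenizer.decode(input_ids[0, -1])
        print(f"{last_token!r:>12}  ->  {label}")

        # Greedy next-token prediction reuses the same forward pass,
        # so the probe adds only one small linear layer per step.
        next_id = out.logits[:, -1, :].argmax(-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
```

The design point the sketch illustrates is that hidden states needed for labeling are already computed during decoding, which is why the measured slowdown stays near 1% instead of the roughly 44% incurred by running a separate NER model.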
