Distilling Named Entity Recognition Models for Endangered Species from Large Language Models (2403.15430v1)

Published 13 Mar 2024 in cs.CL

Abstract: Natural language processing (NLP) practitioners are leveraging large language models (LLMs) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without requiring domain-specific expertise. At the same time, ecological experts are searching for a variety of means to preserve biodiversity. To contribute to these efforts, we focused on endangered species and, through in-context learning, distilled knowledge from GPT-4. In effect, we created datasets for both named entity recognition (NER) and relation extraction (RE) via a two-stage process: 1) we used GPT-4 to generate synthetic data covering four classes of endangered species; 2) humans verified the factual accuracy of the synthetic data, yielding gold data. Our novel dataset contains a total of 3.6K sentences, evenly divided between 1.8K NER and 1.8K RE sentences. The constructed dataset was then used to fine-tune both general and domain-specific BERT variants, completing the knowledge distillation process from the resource-intensive GPT-4 to BERT. Experiments show that our knowledge transfer approach is effective at creating an NER model suitable for detecting endangered species in text.
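The pipeline the abstract describes ends by fine-tuning a compact BERT encoder on the GPT-4-derived, human-verified sentences. Below is a minimal sketch of that final stage using Hugging Face Transformers; the file names (`ner_train.jsonl`, `ner_dev.jsonl`), the BIO label set, and the hyperparameters are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch (assumed, not the authors' code): fine-tune a BERT variant
# for token-level NER on the distilled, human-verified dataset.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-cased"  # or a domain-specific variant such as BioBERT

# Hypothetical BIO tag set; the paper's actual entity classes may differ.
LABELS = ["O", "B-SPECIES", "I-SPECIES", "B-HABITAT", "I-HABITAT"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), id2label=id2label, label2id=label2id
)

# Assumed JSONL format, one sentence per line:
# {"tokens": ["The", "axolotl", ...], "ner_tags": ["O", "B-SPECIES", ...]}
raw = load_dataset(
    "json", data_files={"train": "ner_train.jsonl", "validation": "ner_dev.jsonl"}
)

def tokenize_and_align(batch):
    # BERT splits words into subword pieces; label only the first piece of
    # each word and mask the rest with -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous, labels = None, []
        for word_id in enc.word_ids(batch_index=i):
            if word_id is None or word_id == previous:
                labels.append(-100)
            else:
                labels.append(label2id[tags[word_id]])
            previous = word_id
        enc["labels"].append(labels)
    return enc

tokenized = raw.map(
    tokenize_and_align, batched=True, remove_columns=raw["train"].column_names
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="endangered-ner",
        learning_rate=3e-5,
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
print(trainer.evaluate())
```

Once trained, the small encoder handles inference at a fraction of the cost of querying GPT-4, which is the motivation the abstract gives for distilling in the first place.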
