Deep Learning Based Named Entity Recognition Models for Recipes (2402.17447v1)
Abstract: Food touches our lives through flavor, nourishment, health, and sustainability. Recipes are cultural capsules transmitted across generations as unstructured text. Automated protocols for recognizing named entities, the building blocks of recipe text, are valuable for applications ranging from information extraction to novel recipe generation. Named entity recognition (NER) is a technique for extracting information with known labels from unstructured or semi-structured data. Starting with manually annotated data of 6,611 ingredient phrases, we created a cumulative augmented dataset of 26,445 phrases. Simultaneously, we systematically cleaned and analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER. Based on this analysis, we sampled a subset of 88,526 phrases using a clustering-based approach that preserves diversity, creating the machine-annotated dataset. A thorough investigation of NER approaches on these three datasets, covering statistical models, fine-tuning of deep learning-based LLMs, and few-shot prompting of LLMs, provides deep insights. We conclude that few-shot prompting of LLMs performs abysmally, whereas the fine-tuned spaCy-transformer emerges as the best model, with macro-F1 scores of 95.9%, 96.04%, and 95.71% on the manually annotated, augmented, and machine-annotated datasets, respectively.
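The abstract reports results as macro-F1, the unweighted mean of per-label F1 scores, so rare entity types count as much as frequent ones. The following is a minimal sketch of a token-level macro-F1 computation, not the authors' evaluation code. The function name `macro_f1`, the label names (`QUANTITY`, `UNIT`, `STATE`, `NAME`), and the example phrase are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Return (macro-F1, per-label F1) for two aligned label sequences.

    Macro-F1 is the unweighted mean of the per-label F1 scores.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed

    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, {}

    per_label = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_label[label] = 2 * tp[label] / denom if denom else 0.0

    return sum(per_label.values()) / len(per_label), per_label


# Hypothetical token labels for the phrase "2 cups chopped onion".
gold = ["QUANTITY", "UNIT", "STATE", "NAME"]
pred = ["QUANTITY", "UNIT", "NAME", "NAME"]
score, per_label = macro_f1(gold, pred)
print(f"macro-F1 = {score:.4f}")
print(per_label)
```

In this toy example one mislabelled token drives the F1 of the hypothetical `STATE` label to zero, which lowers the macro average far more than it would lower token-level accuracy. Entity-level (span-based) scoring, which many NER toolkits report, would need span matching rather than this per-token comparison.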