
TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark (2204.07775v1)

Published 16 Apr 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset, called TASTEset, to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g., food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, and taste. The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved an average $F_1$ score of 0.95, with scores for individual entity types ranging from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.
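
To make the reported per-entity-type $F_1$ scores concrete, the following is a minimal Python sketch of how such scores might be computed for a single ingredient line. It assumes exact span matching and macro-averaging over types, neither of which the abstract specifies. The label names (FOOD, QUANTITY, UNIT, PROCESS), the example sentence, and the helper `f1_per_type` are illustrative assumptions and are not taken from the dataset or the authors' code.

```python
# Illustrative sketch only: exact-match, per-entity-type F1 for NER spans.
# Label names and the example line are hypothetical, not from TASTEset itself.

def f1_per_type(gold, pred):
    """Return {label: F1} using exact (start, end, label) span matching."""
    gold_set, pred_set = set(gold), set(pred)
    labels = {span[2] for span in gold_set | pred_set}
    scores = {}
    for label in sorted(labels):
        g = {s for s in gold_set if s[2] == label}
        p = {s for s in pred_set if s[2] == label}
        tp = len(g & p)  # spans that match on start, end, and label
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


text = "2 cups chopped onions"

# Gold annotation as character spans: (start, end, label)
gold = [(0, 1, "QUANTITY"), (2, 6, "UNIT"), (7, 14, "PROCESS"), (15, 21, "FOOD")]

# A hypothetical model output that merges "chopped onions" into one FOOD span
pred = [(0, 1, "QUANTITY"), (2, 6, "UNIT"), (7, 21, "FOOD")]

scores = f1_per_type(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)

for label, score in scores.items():
    print(f"{label}: {score:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

In this toy case the merged span counts as an error for both FOOD and PROCESS, which shows how a model can score well on some entity types while scoring poorly on others.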

Authors (6)
  1. Ania Wróblewska (1 paper)
  2. Agnieszka Kaliska (4 papers)
  3. Maciej Pawłowski (2 papers)
  4. Dawid Wiśniewski (5 papers)
  5. Witold Sosnowski (5 papers)
  6. Agnieszka Ławrynowicz (13 papers)
Citations (4)
