Multi-modal Cooking Workflow Construction for Food Recipes (2008.09151v1)

Published 20 Aug 2020 in cs.CL and cs.MM

Abstract: Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes, owing to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that uses both visual and textual information to construct the cooking workflow, and it achieves a performance gain of over 20% compared with existing hand-crafted baselines.
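
The abstract frames recipe understanding as converting a recipe into a graph over its cooking steps. A minimal sketch, assuming a simple representation in which each step is a node and a hypothetical edge (i, j) means the result of step i is consumed by step j (this is an assumption for illustration, not necessarily the MM-ReS annotation schema), stores the workflow as a directed acyclic graph and recovers one valid cooking order with Kahn's topological sort. The recipe steps and edges below are invented examples.

```python
from collections import defaultdict, deque

# Hypothetical recipe: the step texts and edges are illustrative only.
# An edge (i, j) means the output of step i is used as input by step j.
steps = {
    0: "Boil the pasta.",
    1: "Fry the garlic in olive oil.",
    2: "Add tomatoes to the garlic.",
    3: "Toss the pasta with the sauce.",
}
edges = [(1, 2), (0, 3), (2, 3)]


def topological_order(num_nodes, edges):
    """Return one valid execution order of the steps using Kahn's algorithm."""
    indegree = [0] * num_nodes
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    # Steps with no incoming edge depend on nothing and can start immediately.
    queue = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != num_nodes:
        raise ValueError("workflow graph contains a cycle")
    return order


def inputs_of(step, edges):
    """Return the steps whose results feed directly into `step`."""
    return sorted(src for src, dst in edges if dst == step)


for i in topological_order(len(steps), edges):
    print(i, steps[i], "<- uses", inputs_of(i, edges))
```

The graph makes explicit what the linear text hides: steps 0 and 1 share no dependency and could run in parallel, while step 3 merges the results of steps 0 and 2.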

Authors (8)
  1. Liangming Pan (59 papers)
  2. Jingjing Chen (99 papers)
  3. Jianlong Wu (38 papers)
  4. Shaoteng Liu (17 papers)
  5. Chong-Wah Ngo (55 papers)
  6. Min-Yen Kan (92 papers)
  7. Yu-Gang Jiang (223 papers)
  8. Tat-Seng Chua (360 papers)
Citations (28)
