Multi-modal Cooking Workflow Construction for Food Recipes (2008.09151v1)

Published 20 Aug 2020 in cs.CL and cs.MM

Abstract: Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, which achieves a performance gain of over 20% over existing hand-crafted baselines.

Authors (8)

Liangming Pan (59 papers)
Jingjing Chen (99 papers)
Jianlong Wu (38 papers)
Shaoteng Liu (17 papers)
Chong-Wah Ngo (55 papers)
Min-Yen Kan (92 papers)
Yu-Gang Jiang (223 papers)
Tat-Seng Chua (360 papers)

Citations (28)
