E3 Recipe: Entailment-Driven Extract and Edit

Updated 5 November 2025
  • E3 Recipe is an entailment-driven extract and edit framework that combines extraction, entailment, decision, and editing modules to analyze procedural text.
  • It employs a BERT-based architecture with token-level extraction and dual entailment scoring to generate user-facing queries and step-by-step instructions.
  • The framework is highly explainable and delivers measurable accuracy and BLEU gains on the ShARC benchmark, with implications for regulated-document processing and culinary automation.

An e3 Recipe refers to the “Entailment-driven Extract and Edit” (E3) recipe: an operational pipeline for extracting information, reasoning about entailment, and interactively generating user queries or procedural instructions from text. The methodology was introduced for conversational machine reading (CMR) in the E3 model of Zhong and Zettlemoyer (Zhong et al., 2019), which set new standards for explainable, modular extraction and interactive reading of procedural texts, including regulatory guidelines, eligibility rules, and stepwise protocols. While originally motivated by legal and administrative documents, the same principles carry over to advanced recipe understanding, question-driven culinary assistants, and procedural content generation.

1. Core Concepts: Entailment-driven Extract and Edit Framework

At its foundation, the e3 Recipe follows a modular architecture comprising four interlinked components—extraction, entailment, decision, and edit modules—operating over each conversational or interaction turn:

  1. Extraction Module: Identifies rule spans or latent procedural units within the source text, using contextual encoding.
  2. Entailment Module: Quantifies whether extracted procedural rules are already satisfied by the current context (user scenario or prior dialog) via token-level F1 overlap.
  3. Decision Module: Chooses the action—respond directly, indicate irrelevance, or inquire about unresolved rules.
  4. Edit Module (“Editor”): Converts relevant extracted spans into well-formed, user-facing questions or next-step instructions using an attentive sequence-to-sequence mechanism.

The system is inherently explainable: every inference is grounded in explicit textual evidence, with clear attribution of which rules have been addressed and which must be gathered interactively.
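
The control flow of a turn through these four modules can be summarized in a short sketch. The following Python skeleton is illustrative only: module internals are placeholders and the function names are hypothetical, not the released E3 API.

```python
# Illustrative control-flow skeleton of an E3-style turn; internals are stubs.
from typing import List, Optional, Tuple

def extract_rules(document: str) -> List[str]:
    # Placeholder: treat each clause/bullet of the rule text as a candidate rule span.
    return [line.strip("-• ").strip() for line in document.splitlines() if line.strip()]

def is_entailed(rule: str, scenario: str, history: List[str]) -> bool:
    # Placeholder: the real module scores token-level overlap (see Section 2, Entailment).
    context = " ".join([scenario] + history).lower()
    return all(tok in context for tok in rule.lower().split())

def decide(rules: List[str], resolved: List[bool]) -> Tuple[str, Optional[str]]:
    # Answer when every rule is resolved; otherwise inquire about the first open rule.
    for rule, done in zip(rules, resolved):
        if not done:
            return "inquire", rule
    return "answer", None

def edit_to_question(rule: str) -> str:
    # Placeholder for the learned editor: wrap the span in a user-facing question.
    return f"Are you affected by the following: {rule}?"

def step(document: str, scenario: str, history: List[str]) -> str:
    rules = extract_rules(document)
    resolved = [is_entailed(r, scenario, history) for r in rules]
    action, rule = decide(rules, resolved)
    return "Answer directly." if action == "answer" else edit_to_question(rule)

print(step("- you live in the UK\n- you receive UK civil service pensions",
           "You live in the UK.", []))
```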

2. E3 Model Architecture and Theoretical Formulation

Elements of the E3 Recipe are instantiated with a BERT-based architecture. The input at each step is a concatenation of the question, the procedural rule text, the user scenario, and previously posed dialog queries:

$$x = [x_Q; x_D; x_S; x_{H,1}; \cdots; x_{H,N}]$$

This sequence is encoded as

$$U = \mathrm{BERT}(x) \in \mathbb{R}^{n_x \times d_U}$$
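
As a concrete sketch of this encoding step, the snippet below builds the concatenated input and runs it through bert-base-uncased via the Hugging Face transformers library. The original implementation uses revtok-based preprocessing and its own concatenation code, so treat this as an approximation.

```python
# Sketch: concatenate question, rule text, scenario, and history, then encode with BERT.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

question = "Do I qualify for this benefit?"
rule_text = "You qualify if you receive UK civil service pensions."
scenario = "I retired from the civil service last year."
history = ["Are you over 60?"]

# x = [x_Q; x_D; x_S; x_H,1; ...; x_H,N] as one flat token sequence.
x = " ".join([question, rule_text, scenario] + history)
inputs = tokenizer(x, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    U = model(**inputs).last_hidden_state  # shape (1, n_x, d_U), d_U = 768
print(U.shape)
```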

Extraction

Span start and end points are determined via

$$\alpha_i = \sigma(W_\alpha U_i + b_\alpha), \qquad \beta_i = \sigma(W_\beta U_i + b_\beta)$$

A rule span $(s_i, e_i)$ is retained when $\alpha_{s_i} > \tau$ and, for the nearest $e_i \geq s_i$, $\beta_{e_i} > \tau$ (threshold $\tau = 0.5$). Each span representation $\overline{A}_i$ is constructed via self-attention over its tokens:

$$\overline{\gamma}_k = W_\gamma U_k + b_\gamma, \qquad \gamma = \mathrm{softmax}(\overline{\gamma}), \qquad \overline{A}_i = \sum_{k=s_i}^{e_i} \gamma_k U_k$$
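
A minimal PyTorch sketch of the thresholded span selection and self-attentive pooling might look as follows; the weight names mirror the formulas, and the random tensor stands in for the BERT encoding $U$.

```python
# Illustrative span extraction and self-attentive pooling; not the released code.
import torch
import torch.nn as nn

d_U, n_x, tau = 768, 40, 0.5
U = torch.randn(1, n_x, d_U)            # stand-in for the BERT encoding

w_alpha = nn.Linear(d_U, 1)             # start scorer
w_beta = nn.Linear(d_U, 1)              # end scorer
w_gamma = nn.Linear(d_U, 1)             # self-attention scorer over span tokens

alpha = torch.sigmoid(w_alpha(U)).squeeze(-1)   # (1, n_x)
beta = torch.sigmoid(w_beta(U)).squeeze(-1)     # (1, n_x)

spans = []
for s in (alpha[0] > tau).nonzero(as_tuple=True)[0].tolist():
    ends = (beta[0, s:] > tau).nonzero(as_tuple=True)[0]
    if len(ends) > 0:
        e = s + ends[0].item()          # nearest end position e >= s above threshold
        spans.append((s, e))

# Self-attentive span representation: A_bar_i = sum_k softmax(gamma)_k * U_k
span_reps = []
for s, e in spans:
    span = U[0, s:e + 1]                                  # (span_len, d_U)
    gamma = torch.softmax(w_gamma(span).squeeze(-1), dim=0)
    span_reps.append((gamma.unsqueeze(-1) * span).sum(dim=0))

print(len(spans), "candidate rule spans")
```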

Entailment

Two entailment scores are computed per extracted rule:

  • Scenario Entailment:

$$g_i = \frac{2 \cdot \mathrm{pr}(R_i, S) \cdot \mathrm{re}(R_i, S)}{\mathrm{pr}(R_i, S) + \mathrm{re}(R_i, S)}$$

  • Dialog History Entailment:

$$h_i = \max_{k=1,\ldots,n_H} \mathrm{f1}(R_i, Q_k)$$

The final span representation incorporates these scores:

$$A_i = [\overline{A}_i; g_i; h_i]$$
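
Both scores reduce to token-overlap F1. A small sketch follows, assuming SQuAD-style token F1 with whitespace tokenization; the paper's exact preprocessing and precision/recall orientation may differ.

```python
# Sketch of the entailment scores g_i (scenario) and h_i (dialog history).
from collections import Counter

def f1(rule: str, other: str) -> float:
    r, o = rule.lower().split(), other.lower().split()
    common = Counter(r) & Counter(o)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(o)   # orientation of precision/recall is an assumption
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)

rule = "receive UK civil service pensions"
scenario = "I receive a civil service pension from the UK."
history = ["Are you over 60?", "Do you receive UK civil service pensions?"]

g_i = f1(rule, scenario)                   # scenario entailment
h_i = max(f1(rule, q) for q in history)    # dialog-history entailment
print(round(g_i, 2), round(h_i, 2))
```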

Decision

A summary vector $C$ (obtained by self-attention over $U$) is used for the main decision:

$$z = W_z C + b_z \in \mathbb{R}^4$$

Per-span inquiry scores are computed from the augmented span representations:

$$r_i = W_r A_i + b_r$$

The decision loss is

$$L_\mathrm{dec} = -\log \mathrm{softmax}(z)_k - \mathbf{1}_{k=\text{inquire}} \log \mathrm{softmax}(r)_i$$

where $k$ is the gold decision class and $i$ is the gold rule to inquire about.
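
A sketch of these decision heads in PyTorch, with random tensors standing in for the summary vector $C$ and the augmented span representations $A_i$; the separate inquiry weights follow the formulation above.

```python
# Illustrative decision heads and loss for one training example.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_U, n_spans = 768, 3
C = torch.randn(d_U)                     # self-attention summary of U
A = torch.randn(n_spans, d_U + 2)        # [A_bar_i; g_i; h_i] per extracted rule

W_z = nn.Linear(d_U, 4)                  # e.g. yes / no / irrelevant / inquire
W_r = nn.Linear(d_U + 2, 1)              # per-span inquiry scorer

z = W_z(C)                               # (4,)
r = W_r(A).squeeze(-1)                   # (n_spans,)

# Supervision: gold decision class k, and gold rule index when k == inquire.
k, inquire_class, gold_rule = 3, 3, 1
loss = -F.log_softmax(z, dim=0)[k]
if k == inquire_class:
    loss = loss - F.log_softmax(r, dim=0)[gold_rule]
print(loss.item())
```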

Editing

Uses an attentive LSTM decoder (with pretrained GloVe token embeddings) to generate the pre-span and post-span edits that wrap the extracted rule. At each generation step $t$:

$$v_t = \mathrm{embed}(V, w_{t-1}), \quad h_t = \mathrm{LSTM}([v_t; a_t], h_{t-1}), \quad o_t = W_o [h_t; a_t] + b_o, \quad p(w_t) = \mathrm{softmax}(V o_t)$$

The editor forms a natural-language question around the extracted rule, e.g., transforming "UK civil service pensions" into "Are you receiving UK civil service pensions?".
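
One decoding step of such an attentive LSTM editor can be sketched as follows. Dimensions, the dot-product attention form, and the weight names are illustrative; the actual editor also conditions on pre-/post-span context and initializes $V$ from GloVe.

```python
# One decoding step of an attentive LSTM editor (illustrative sketch).
import torch
import torch.nn as nn

vocab, d_emb, d_hid, src_len = 1000, 300, 300, 12
embed = nn.Embedding(vocab, d_emb)               # V: would be GloVe-initialized
lstm = nn.LSTMCell(d_emb + d_hid, d_hid)
W_o = nn.Linear(d_hid + d_hid, d_emb)

src = torch.randn(src_len, d_hid)                # encoded span + context tokens
h, c = torch.zeros(1, d_hid), torch.zeros(1, d_hid)
w_prev = torch.tensor([5])                       # previous output token id

v_t = embed(w_prev)                              # v_t = embed(V, w_{t-1})
attn = torch.softmax(src @ h.squeeze(0), dim=0)  # attention over source states
a_t = (attn.unsqueeze(-1) * src).sum(dim=0, keepdim=True)
h, c = lstm(torch.cat([v_t, a_t], dim=-1), (h, c))
o_t = W_o(torch.cat([h, a_t], dim=-1))
p_w = torch.softmax(o_t @ embed.weight.t(), dim=-1)   # p(w_t) = softmax(V o_t)
print(p_w.shape)
```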

3. Training Regimen and Practical Implementation Recipe

  • Tokenizer: revtok.
  • BERT Variant: “bert-base-uncased”, fine-tuned.
  • Editor Decoder: Pretrained GloVe vectors, two-way attentive LSTM.
  • Optimizer: Adam; learning rate $5 \times 10^{-5}$; warm-up proportion 0.1; dropout 0.4 (post-BERT).
  • Loss Functions: Extraction (noisy supervision from dialog trees), decision, and editing modules are trained with cross-entropy objectives; the rule extraction loss is weighted by $\lambda = 400$.
  • Training Notes: Extraction and entailment modules are tightly coupled; the editor is trained separately.
  • Dataset: ShARC conversational machine reading dataset for rule-based conversational flow.
  • Span Supervision: Clauses/bullet points in source texts are used as noisy labels, aligned by edit distance between follow-up questions and document spans (see the alignment sketch after this list).
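
As a sketch of how such noisy span labels can be derived, the snippet below aligns a follow-up question to the closest document clause, using difflib similarity as a stand-in for the paper's edit-distance alignment; clause segmentation is assumed to be done already.

```python
# Noisy span supervision: align a follow-up question to its closest document clause.
import difflib

document_clauses = [
    "you are over state pension age",
    "you receive UK civil service pensions",
    "you live in England, Scotland or Wales",
]
follow_up = "Do you receive UK civil service pensions?"

def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

scores = [similarity(follow_up, clause) for clause in document_clauses]
best = max(range(len(scores)), key=scores.__getitem__)
print("noisy span label:", document_clauses[best])
```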

The code and all relevant preprocessing recipes are publicly available [https://github.com/vzhong/e3].

4. e3 Recipe’s Impact: Explainability, Generalization, and Results

Quantitative results on the ShARC dataset demonstrate the effectiveness of the E3 Recipe:

| Model    | Micro Acc. | Macro Acc. | BLEU1 | BLEU4 | Combined |
|----------|------------|------------|-------|-------|----------|
| Seq2Seq  | 44.8       | 42.8       | 34.0  | 7.8   | 3.3      |
| Pipeline | 61.9       | 68.9       | 54.4  | 34.4  | 23.7     |
| BERTQA   | 63.6       | 70.8       | 46.2  | 36.3  | 25.7     |
| E3       | 67.6       | 73.3       | 54.1  | 38.7  | 28.4     |

In this comparison, E3 outperforms the strongest prior system, BERTQA, by 4.0 points in micro-averaged accuracy and 2.4 BLEU4, and the Pipeline baseline by 5.7 and 4.3 points respectively. Ablations verify that extraction, entailment, and editing each provide measurable gains. The explicit extraction of latent rules, entailment scoring, and editing yield transparent, stepwise rationales; each stage’s outputs (extracted clauses, entailment status, inquiry scores) can be inspected and displayed during execution (see Figs. 1 and 4 in the original paper).

The explainability features uniquely support the transparency of the inference chain: users and system developers can directly inspect which procedural requirements the model identified, the status of each requirement (entailed or not), and which follow-up is generated. This traceability is pivotal for deployment in regulated domains or high-assurance procedural automation.

5. Technical Implications for Procedural and Recipe Modeling

While the E3 model was originally targeted at conversational legal reading, the pattern of “entailment-driven extraction and editing” is highly generalizable. In the context of food computing and recipe automation:

  • The e3 Recipe architecture provides a principled method to extract procedural dependencies (“latent rules”) from unstructured culinary text, reason about which conditions are satisfied given ingredient/state constraints, and iteratively generate interactive queries (e.g., to check for allergies, preparation equipment, or taste preferences).
  • Its modular decomposition allows plug-in replacement: alternative NER or entity-specific models can be swapped in for culinary adaptation, and the editor module can be tasked with constructing culinary questions, shopping assistants, or stepwise instructions.
  • Integration with multimodal or knowledge-augmented systems (such as ChefFusion (Li et al., 18 Sep 2024), KERL (Mohbat et al., 20 May 2025), or grammar-guided models (Bagler, 2022)) is facilitated by this modularity and explicitness.

6. Reproducibility, Limitations, and Future Directions

The E3 Recipe model and its source code are open-source, with detailed preprocessing scripts for noisy supervision and span extraction. The recipe’s main limitations arise from:

  • Strong reliance on token-level overlap for entailment, which may miss semantic paraphrase or complex reasoning.
  • Editor module’s fluency is governed by the quality of pretrained embeddings and the accuracy of extracted spans.
  • Decision logic is based predominantly on current extracted rules; global dialog or planning optimization is not modeled.

Future work may extend the E3 Recipe paradigm by incorporating richer semantic entailment (beyond token overlap), integrating external knowledge bases, or augmenting with user model personalization. The framework yields a highly interpretable and extensible basis for procedural language understanding, recipe automation, and interactive agent design.
