E3 Recipe: Entailment-Driven Extract and Edit

Updated 5 November 2025
  • E3 Recipe is an entailment-driven extract and edit framework that combines extraction, entailment, decision, and editing modules to analyze procedural text.
  • It employs a BERT-based architecture with token-level extraction and dual entailment scoring to generate user-facing queries and step-by-step instructions.
  • The framework is highly explainable and delivers measurable accuracy and BLEU gains on the ShARC benchmark, with implications for regulated-document processing and culinary automation.

An e3 Recipe refers to the “Entailment-driven Extract and Edit” (E3) recipe: an operational pipeline for extracting information, reasoning about entailment, and interactively generating user queries or procedural instructions from text. The methodology was introduced for conversational machine reading (CMR) in the E3 model of Zhong and Zettlemoyer (Zhong et al., 2019), which set new standards for explainable, modular extraction and interactive reading of procedural texts, including regulatory guidelines, eligibility rules, and stepwise protocols. While originally motivated by legal and administrative documents, the same principles carry over to advanced recipe understanding, question-driven culinary assistants, and procedural content generation.

1. Core Concepts: Entailment-driven Extract and Edit Framework

At its foundation, the e3 Recipe follows a modular architecture comprising four interlinked components—extraction, entailment, decision, and edit modules—operating over each conversational or interaction turn:

  1. Extraction Module: Identifies rule spans or latent procedural units within the source text, using contextual encoding.
  2. Entailment Module: Quantifies whether extracted procedural rules are already satisfied by the current context (user scenario or prior dialog) via token-level F1 overlap.
  3. Decision Module: Chooses the action—respond directly, indicate irrelevance, or inquire about unresolved rules.
  4. Edit Module (“Editor”): Converts relevant extracted spans into well-formed, user-facing questions or next-step instructions using an attentive sequence-to-sequence mechanism.

The system is inherently explainable: every inference is grounded in explicit textual evidence, with clear attribution of which rules have been addressed and which must be gathered interactively.
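
The control flow of a turn through these four modules can be summarized in a short sketch. The following Python skeleton is illustrative only: module internals are placeholders and the function names are hypothetical, not the released E3 API.

```python
# Illustrative control-flow skeleton of an E3-style turn; internals are stubs.
from typing import List, Optional, Tuple

def extract_rules(document: str) -> List[str]:
    # Placeholder: treat each clause/bullet of the rule text as a candidate rule span.
    return [line.strip("-• ").strip() for line in document.splitlines() if line.strip()]

def is_entailed(rule: str, scenario: str, history: List[str]) -> bool:
    # Placeholder: the real module scores token-level overlap (see Section 2, Entailment).
    context = " ".join([scenario] + history).lower()
    return all(tok in context for tok in rule.lower().split())

def decide(rules: List[str], resolved: List[bool]) -> Tuple[str, Optional[str]]:
    # Answer when every rule is resolved; otherwise inquire about the first open rule.
    for rule, done in zip(rules, resolved):
        if not done:
            return "inquire", rule
    return "answer", None

def edit_to_question(rule: str) -> str:
    # Placeholder for the learned editor: wrap the span in a user-facing question.
    return f"Are you affected by the following: {rule}?"

def step(document: str, scenario: str, history: List[str]) -> str:
    rules = extract_rules(document)
    resolved = [is_entailed(r, scenario, history) for r in rules]
    action, rule = decide(rules, resolved)
    return "Answer directly." if action == "answer" else edit_to_question(rule)

print(step("- you live in the UK\n- you receive UK civil service pensions",
           "You live in the UK.", []))
```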

2. E3 Model Architecture and Theoretical Formulation

Elements of the E3 Recipe are instantiated with a BERT-based architecture. The input at each step is a concatenation of the question, the procedural rule text, the user scenario, and previously posed dialog queries:

$$x = [x_Q; x_D; x_S; x_{H,1}; \cdots; x_{H,N}]$$

This sequence is encoded as

$$U = \mathrm{BERT}(x) \in \mathbb{R}^{n_x \times d_U}$$
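
As a concrete sketch of this encoding step, the snippet below builds the concatenated input and runs it through bert-base-uncased via the Hugging Face transformers library. The original implementation uses revtok-based preprocessing and its own concatenation code, so treat this as an approximation.

```python
# Sketch: concatenate question, rule text, scenario, and history, then encode with BERT.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

question = "Do I qualify for this benefit?"
rule_text = "You qualify if you receive UK civil service pensions."
scenario = "I retired from the civil service last year."
history = ["Are you over 60?"]

# x = [x_Q; x_D; x_S; x_H,1; ...; x_H,N] as one flat token sequence.
x = " ".join([question, rule_text, scenario] + history)
inputs = tokenizer(x, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    U = model(**inputs).last_hidden_state  # shape (1, n_x, d_U), d_U = 768
print(U.shape)
```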

Extraction

Span start and end points are determined via

$$\alpha_i = \sigma(W_\alpha U_i + b_\alpha), \qquad \beta_i = \sigma(W_\beta U_i + b_\beta)$$

A rule span $(s_i, e_i)$ is retained when $\alpha_{s_i} > \tau$ and, for the nearest $e_i \geq s_i$, $\beta_{e_i} > \tau$ (threshold $\tau = 0.5$). Each span representation $\overline{A}_i$ is constructed via self-attention over its tokens:

$$\overline{\gamma}_k = W_\gamma U_k + b_\gamma, \qquad \gamma = \mathrm{softmax}(\overline{\gamma}), \qquad \overline{A}_i = \sum_{k=s_i}^{e_i} \gamma_k U_k$$
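
A minimal PyTorch sketch of the thresholded span selection and self-attentive pooling might look as follows; the weight names mirror the formulas, and the random tensor stands in for the BERT encoding $U$.

```python
# Illustrative span extraction and self-attentive pooling; not the released code.
import torch
import torch.nn as nn

d_U, n_x, tau = 768, 40, 0.5
U = torch.randn(1, n_x, d_U)            # stand-in for the BERT encoding

w_alpha = nn.Linear(d_U, 1)             # start scorer
w_beta = nn.Linear(d_U, 1)              # end scorer
w_gamma = nn.Linear(d_U, 1)             # self-attention scorer over span tokens

alpha = torch.sigmoid(w_alpha(U)).squeeze(-1)   # (1, n_x)
beta = torch.sigmoid(w_beta(U)).squeeze(-1)     # (1, n_x)

spans = []
for s in (alpha[0] > tau).nonzero(as_tuple=True)[0].tolist():
    ends = (beta[0, s:] > tau).nonzero(as_tuple=True)[0]
    if len(ends) > 0:
        e = s + ends[0].item()          # nearest end position e >= s above threshold
        spans.append((s, e))

# Self-attentive span representation: A_bar_i = sum_k softmax(gamma)_k * U_k
span_reps = []
for s, e in spans:
    span = U[0, s:e + 1]                                  # (span_len, d_U)
    gamma = torch.softmax(w_gamma(span).squeeze(-1), dim=0)
    span_reps.append((gamma.unsqueeze(-1) * span).sum(dim=0))

print(len(spans), "candidate rule spans")
```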

Entailment

Two entailment scores are computed per extracted rule:

  • Scenario Entailment:

$$g_i = \frac{2 \cdot \mathrm{pr}(R_i, S) \cdot \mathrm{re}(R_i, S)}{\mathrm{pr}(R_i, S) + \mathrm{re}(R_i, S)}$$

  • Dialog History Entailment:

$$h_i = \max_{k=1,\ldots,n_H} \mathrm{f1}(R_i, Q_k)$$

The final span representation incorporates these scores:

$$A_i = [\overline{A}_i; g_i; h_i]$$
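
Both scores reduce to token-overlap F1. A small sketch follows, assuming SQuAD-style token F1 with whitespace tokenization; the paper's exact preprocessing and precision/recall orientation may differ.

```python
# Sketch of the entailment scores g_i (scenario) and h_i (dialog history).
from collections import Counter

def f1(rule: str, other: str) -> float:
    r, o = rule.lower().split(), other.lower().split()
    common = Counter(r) & Counter(o)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(o)   # orientation of precision/recall is an assumption
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)

rule = "receive UK civil service pensions"
scenario = "I receive a civil service pension from the UK."
history = ["Are you over 60?", "Do you receive UK civil service pensions?"]

g_i = f1(rule, scenario)                   # scenario entailment
h_i = max(f1(rule, q) for q in history)    # dialog-history entailment
print(round(g_i, 2), round(h_i, 2))
```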

Decision

A summary vector $C$ (obtained by self-attention over $U$) is used for the main decision:

$$z = W_z C + b_z \in \mathbb{R}^4$$

Per-span inquiry scores are computed from the augmented span representations:

$$r_i = W_r A_i + b_r$$

The decision loss is

$$L_\mathrm{dec} = -\log \mathrm{softmax}(z)_k - \mathbf{1}_{k=\text{inquire}} \log \mathrm{softmax}(r)_i$$

where $k$ is the gold decision class and $i$ is the gold rule to inquire about.
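
A sketch of these decision heads in PyTorch, with random tensors standing in for the summary vector $C$ and the augmented span representations $A_i$; the separate inquiry weights follow the formulation above.

```python
# Illustrative decision heads and loss for one training example.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_U, n_spans = 768, 3
C = torch.randn(d_U)                     # self-attention summary of U
A = torch.randn(n_spans, d_U + 2)        # [A_bar_i; g_i; h_i] per extracted rule

W_z = nn.Linear(d_U, 4)                  # e.g. yes / no / irrelevant / inquire
W_r = nn.Linear(d_U + 2, 1)              # per-span inquiry scorer

z = W_z(C)                               # (4,)
r = W_r(A).squeeze(-1)                   # (n_spans,)

# Supervision: gold decision class k, and gold rule index when k == inquire.
k, inquire_class, gold_rule = 3, 3, 1
loss = -F.log_softmax(z, dim=0)[k]
if k == inquire_class:
    loss = loss - F.log_softmax(r, dim=0)[gold_rule]
print(loss.item())
```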

Editing

Uses an attentive LSTM decoder (with pretrained GloVe token embeddings) to generate the pre-span and post-span edits that wrap the extracted rule. At each generation step $t$:

$$v_t = \mathrm{embed}(V, w_{t-1}), \quad h_t = \mathrm{LSTM}([v_t; a_t], h_{t-1}), \quad o_t = W_o [h_t; a_t] + b_o, \quad p(w_t) = \mathrm{softmax}(V o_t)$$

The editor forms a natural-language question around the extracted rule, e.g., transforming "UK civil service pensions" into "Are you receiving UK civil service pensions?".
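
One decoding step of such an attentive LSTM editor can be sketched as follows. Dimensions, the dot-product attention form, and the weight names are illustrative; the actual editor also conditions on pre-/post-span context and initializes $V$ from GloVe.

```python
# One decoding step of an attentive LSTM editor (illustrative sketch).
import torch
import torch.nn as nn

vocab, d_emb, d_hid, src_len = 1000, 300, 300, 12
embed = nn.Embedding(vocab, d_emb)               # V: would be GloVe-initialized
lstm = nn.LSTMCell(d_emb + d_hid, d_hid)
W_o = nn.Linear(d_hid + d_hid, d_emb)

src = torch.randn(src_len, d_hid)                # encoded span + context tokens
h, c = torch.zeros(1, d_hid), torch.zeros(1, d_hid)
w_prev = torch.tensor([5])                       # previous output token id

v_t = embed(w_prev)                              # v_t = embed(V, w_{t-1})
attn = torch.softmax(src @ h.squeeze(0), dim=0)  # attention over source states
a_t = (attn.unsqueeze(-1) * src).sum(dim=0, keepdim=True)
h, c = lstm(torch.cat([v_t, a_t], dim=-1), (h, c))
o_t = W_o(torch.cat([h, a_t], dim=-1))
p_w = torch.softmax(o_t @ embed.weight.t(), dim=-1)   # p(w_t) = softmax(V o_t)
print(p_w.shape)
```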

3. Training Regimen and Practical Implementation Recipe

  • Tokenizer: revtok.
  • BERT Variant: “bert-base-uncased”, fine-tuned.
  • Editor Decoder: Pretrained GloVe vectors, two-way attentive LSTM.
  • Optimizer: Adam; learning rate $5 \times 10^{-5}$; warm-up proportion 0.1; dropout 0.4 (post-BERT).
  • Loss Functions: Extraction (noisy supervision from dialog trees), decision, and editing modules are trained with cross-entropy objectives; the rule extraction loss is weighted by $\lambda = 400$.
  • Training Notes: Extraction and entailment modules are tightly coupled; the editor is trained separately.
  • Dataset: ShARC conversational machine reading dataset for rule-based conversational flow.
  • Span Supervision: Clauses/bullet points in source texts are used as noisy labels, aligned by edit distance between follow-up questions and document spans (see the alignment sketch after this list).
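
As a sketch of how such noisy span labels can be derived, the snippet below aligns a follow-up question to the closest document clause, using difflib similarity as a stand-in for the paper's edit-distance alignment; clause segmentation is assumed to be done already.

```python
# Noisy span supervision: align a follow-up question to its closest document clause.
import difflib

document_clauses = [
    "you are over state pension age",
    "you receive UK civil service pensions",
    "you live in England, Scotland or Wales",
]
follow_up = "Do you receive UK civil service pensions?"

def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

scores = [similarity(follow_up, clause) for clause in document_clauses]
best = max(range(len(scores)), key=scores.__getitem__)
print("noisy span label:", document_clauses[best])
```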

The code and all relevant preprocessing recipes are publicly available [https://github.com/vzhong/e3].

4. e3 Recipe’s Impact: Explainability, Generalization, and Results

Quantitative results on the ShARC dataset demonstrate the effectiveness of the E3 Recipe:

| Model    | Micro Acc. | Macro Acc. | BLEU1 | BLEU4 | Combined |
|----------|------------|------------|-------|-------|----------|
| Seq2Seq  | 44.8       | 42.8       | 34.0  | 7.8   | 3.3      |
| Pipeline | 61.9       | 68.9       | 54.4  | 34.4  | 23.7     |
| BERTQA   | 63.6       | 70.8       | 46.2  | 36.3  | 25.7     |
| E3       | 67.6       | 73.3       | 54.1  | 38.7  | 28.4     |

In this comparison, E3 outperforms the strongest prior system, BERTQA, by 4.0 points in micro-averaged accuracy and 2.4 BLEU4, and the Pipeline baseline by 5.7 and 4.3 points respectively. Ablations verify that extraction, entailment, and editing each provide measurable gains. The explicit extraction of latent rules, entailment scoring, and editing yield transparent, stepwise rationales; each stage’s outputs (extracted clauses, entailment status, inquiry scores) can be inspected and displayed during execution (see Figs. 1 and 4 in the original paper).

The explainability features uniquely support the transparency of the inference chain: users and system developers can directly inspect which procedural requirements the model identified, the status of each requirement (entailed or not), and which follow-up is generated. This traceability is pivotal for deployment in regulated domains or high-assurance procedural automation.

5. Technical Implications for Procedural and Recipe Modeling

While the E3 model was originally targeted at conversational legal reading, the pattern of “entailment-driven extraction and editing” is highly generalizable. In the context of food computing and recipe automation:

  • The e3 Recipe architecture provides a principled method to extract procedural dependencies (“latent rules”) from unstructured culinary text, reason about which conditions are satisfied given ingredient/state constraints, and iteratively generate interactive queries (e.g., to check for allergies, preparation equipment, or taste preferences).
  • Its modular decomposition allows plug-in replacement: alternative NER or entity-specific models can be swapped in for culinary adaptation, and the editor module can be tasked with constructing culinary questions, shopping assistants, or stepwise instructions.
  • Integration with multimodal or knowledge-augmented systems (such as ChefFusion (Li et al., 18 Sep 2024), KERL (Mohbat et al., 20 May 2025), or grammar-guided models (Bagler, 2022)) is facilitated by this modularity and explicitness.

6. Reproducibility, Limitations, and Future Directions

The E3 Recipe model and its source code are open-source, with detailed preprocessing scripts for noisy supervision and span extraction. The recipe’s main limitations arise from:

  • Strong reliance on token-level overlap for entailment, which may miss semantic paraphrase or complex reasoning.
  • Editor module’s fluency is governed by the quality of pretrained embeddings and the accuracy of extracted spans.
  • Decision logic is based predominantly on current extracted rules; global dialog or planning optimization is not modeled.

Future work may extend the E3 Recipe paradigm by incorporating richer semantic entailment (beyond token overlap), integrating external knowledge bases, or augmenting with user model personalization. The framework yields a highly interpretable and extensible basis for procedural language understanding, recipe automation, and interactive agent design.
