Pseudo-Answer Rewriting (PAR)
- Pseudo-Answer Rewriting (PAR) is a method that converts contextual yes/no question–answer pairs into stand-alone factual statements that preserve the underlying binary proposition and any essential subordinate clauses.
- It employs a T5-Base Transformer augmented with Soft-Mention Flags (SMF) to ensure precise control over syntactic and semantic content during the rewriting process.
- Empirical evaluation on datasets like Amazon-PQA shows that SMF-enhanced models outperform baselines, achieving higher BLEU, ROUGE, and BertScore metrics while maintaining strong clause coverage.
Pseudo-Answer Rewriting (PAR) is the task of reformulating contextualized yes/no (polar) question–answer pairs into stand-alone, decontextualized factual statements that encode the corresponding binary proposition, including any required embedded clauses such as conditions, complements, or alternatives. This enables the extraction of machine-actionable knowledge from common QA corpora (e.g., product forums, e-commerce community Q&A) where answers alone are not semantically self-sufficient. The principal challenge in PAR is controlling the syntactic and semantic form of the output, ensuring all meaningful content from the input (including auxiliary clauses and style constraints) is preserved within a single factual sentence.
1. Formal Task Definition
Given an input consisting of a yes/no question $q$, an answer $a$, and optional context $c$ (such as an item or product title), all tokenized from the vocabulary $\mathcal{V}$, a single input instance $x = (q, a, c)$ has token length $n$. The target is to produce a factual, stand-alone statement $y = (y_1, \dots, y_m)$ summarizing the original yes/no proposition and required subordinate information, i.e.,

$$p_\theta(y \mid x) = \prod_{t=1}^{m} p_\theta(y_t \mid y_{<t},\, x).$$
The model parameters $\theta$ are learned via maximum likelihood estimation over a dataset $D$ of reference rewritings,

$$\theta^{*} = \arg\max_{\theta} \sum_{(x,\, y) \in D} \log p_\theta(y \mid x).$$
The syntactic requirements for rewriting fall into four distinct categories: (1) Explanation (includes main clause repetition or reason), (2) Complement (main clause plus additional “also…” information), (3) Condition (“if…” sub-clause), and (4) Alternative (negation with alternative affirmative). These categories are operationalized during both training and evaluation.
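For concreteness, here is a minimal sketch of how one training instance could be serialized, reusing the Condition example from Section 6; the field names and the T5-style source string format are illustrative assumptions rather than the published data schema.

```python
# Hypothetical representation of one PAR training instance (Condition category).
# Field names and formatting are illustrative; the Amazon-PQA-derived data may differ.
instance = {
    "context":  "Dell XPS",                                   # item/product title
    "question": "Is the Dell XPS shipped internationally?",   # yes/no question
    "answer":   "Yes, it can be shipped if you provide a shipping label.",
    "category": "condition",   # explanation | complement | condition | alternative
    "target":   "Yes, the Dell XPS can be shipped internationally "
                "if you provide a shipping label.",           # stand-alone rewriting
}

# One possible T5-style linearization of the input fields (an assumption):
source = (f"question: {instance['question']} "
          f"answer: {instance['answer']} context: {instance['context']}")
```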
2. Model Architecture
The underlying model is a T5-Base sequence-to-sequence Transformer (220M parameters), comprising 12-layer encoder and decoder stacks with multi-headed self-attention and feed-forward sublayers. The central architectural innovation is the addition of Soft-Mention Flags (SMF) at each decoder cross-attention block. SMF enables fine-grained control over which input phrases must be explicitly realized in the output at each step.
Standard cross-attention computes

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $Q$ (decoder queries), $K$ (encoder keys), and $V$ (encoder values) are projected from hidden states. The SMF mechanism augments each encoder token with a “mention-flag” embedding, injected additively into the $K$ and $V$ projections. No additional layers are introduced, so memory and compute overhead is minimal beyond the base model footprint.
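A minimal PyTorch sketch of this injection follows, assuming flags take values {0, 1, 2} and are embedded into the key/value space; the single-head formulation simplifies T5's multi-head cross-attention block.

```python
import torch.nn as nn
import torch.nn.functional as F

class SMFCrossAttention(nn.Module):
    """Single-head cross-attention with additive mention-flag embeddings (sketch)."""

    def __init__(self, d_model: int, num_flag_states: int = 3):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One learned embedding per flag state (0: non-constraint, 1: uncovered, 2: covered).
        self.flag_emb = nn.Embedding(num_flag_states, d_model)

    def forward(self, dec_hidden, enc_hidden, flags):
        # dec_hidden: (batch, tgt_len, d_model)  decoder hidden states
        # enc_hidden: (batch, src_len, d_model)  encoder hidden states
        # flags:      (batch, src_len) int64     current mention-flag state per input token
        q = self.q_proj(dec_hidden)
        f = self.flag_emb(flags)                 # (batch, src_len, d_model)
        k = self.k_proj(enc_hidden) + f          # inject flags into keys ...
        v = self.v_proj(enc_hidden) + f          # ... and values, additively
        scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ v
```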
3. Soft-Constraint Mechanism: Soft-Mention Flags (SMF)
Constraint Extraction: Constituency parses (e.g., via benepar) automatically identify a set of phrase spans $C = \{c_1, \dots, c_k\}$ from the input $x$, each representing a clause or argument required in the output.
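A sketch of span extraction using benepar's spaCy integration is shown below; which constituent labels count as required spans (here SBAR, PP, VP) is an assumption, since the exact selection rules are not reproduced above.

```python
import benepar
import spacy

# Assumes `pip install benepar spacy` plus downloads of en_core_web_sm and benepar_en3.
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

def extract_constraint_spans(text, keep_labels=("SBAR", "PP", "VP")):
    """Return constituent spans that plausibly encode conditions, complements, or alternatives."""
    spans = []
    for sent in nlp(text).sents:
        for const in sent._.constituents:
            if set(const._.labels) & set(keep_labels):
                spans.append(const.text)
    return spans

print(extract_constraint_spans("Yes, it can be shipped if you provide a shipping label."))
# Picks up, among others, the conditional clause "if you provide a shipping label".
```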
Constraint State Tracking: A matrix $M \in \{0, 1, 2\}^{n \times m}$ is defined, where $M_{i,t}$ indicates the constraint status of input token $x_i$ after generation of output token $y_t$:
- $0$ : $x_i$ is not part of any constraint.
- $1$ : $x_i$ is in some constraint span $c_j$ and not yet “covered”.
- $2$ : $x_i$ is in some constraint span $c_j$ and has been “covered”—i.e., its content is semantically realized in $y_{\le t}$.
Coverage is computed using sentence-embedding similarity (e.g., Sentence-BERT): for a constraint phrase $c_j$, if its similarity to the generated prefix $y_{\le t}$ and to a sliding window over the most recently generated tokens exceeds thresholds $\tau_1$ and $\tau_2$, respectively, then all tokens of $c_j$ are set to “covered” for subsequent generation steps.
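A sketch of this coverage check with the sentence-transformers library follows; the model name, window size, and the single threshold (standing in for $\tau_1$, $\tau_2$) are illustrative assumptions, not reported settings.

```python
from sentence_transformers import SentenceTransformer, util

# Model choice, window size, and threshold below are assumptions for illustration.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def is_covered(phrase, generated_tokens, window=8, tau=0.75):
    """Check whether a constraint phrase is semantically realized in the output so far."""
    if not generated_tokens:
        return False
    # Compare the phrase against sliding windows over the generated prefix.
    candidates = [
        " ".join(generated_tokens[i:i + window])
        for i in range(max(1, len(generated_tokens) - window + 1))
    ]
    emb_phrase = encoder.encode(phrase, convert_to_tensor=True)
    emb_cands = encoder.encode(candidates, convert_to_tensor=True)
    best = util.cos_sim(emb_phrase, emb_cands).max().item()
    return best >= tau

# Once is_covered(...) returns True, every token of the phrase flips from flag 1 to flag 2.
```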
Attention Modification: At decoding step $t$, each encoder token $x_i$'s key and value projections are modified as

$$k_i \leftarrow k_i + f(M_{i,t}), \qquad v_i \leftarrow v_i + f(M_{i,t}),$$

where $f(0)$, $f(1)$, $f(2)$ are (learned) mention-flag embeddings. This signals to the decoder which input constraints remain active.
Style Constraints (SMF-Style): For second-person output style, all first-person pronoun tokens in the input are pre-marked as covered; as the decoder emits second-person pronouns, the corresponding flags are toggled to reflect fulfillment of the style transformation.
Training: The joint model (T5 + SMF embeddings) is fine-tuned using the standard sequence cross-entropy loss. No dedicated loss term penalizing constraint violations is introduced; constraint satisfaction is driven entirely by the mention-flag signal injected into cross-attention.
4. Training Data and Protocols
The central dataset for PAR is derived from Amazon-PQA (Rozen et al. 2021), manually re-written to produce 1,500 question/answer/context → decontextualized statement pairs. These are uniformly distributed across the four clause categories and 11 product domains. The data split is 1,000 train / 100 dev / 400 test.
Preprocessing involves SentencePiece tokenization (32k vocabulary), with input lengths truncated or padded to 128 tokens. Hyper-parameters are not exhaustively specified, but fine-tuning uses Adam (β₁ = 0.9, β₂ = 0.999, ε = 1e–8), a learning rate with linear warmup over the first 10% of steps (the base rate is not stated), batch size 16–32, 10 epochs with dev-set early stopping, weight decay 0.01, and dropout 0.1.
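A minimal Hugging Face fine-tuning sketch consistent with this protocol is given below; the learning-rate value, dataset column names, and the use of Seq2SeqTrainer are assumptions, and the SMF modifications to cross-attention would still need to be patched into the model.

```python
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # SMF embeddings would be added here

def preprocess(batch):
    # Column names "source"/"target" are assumed; 128-token budget follows the text above.
    model_inputs = tokenizer(batch["source"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="par-t5-smf",
    learning_rate=3e-4,               # assumed value; only the warmup schedule is reported
    warmup_ratio=0.1,                 # linear warmup for 10% of steps
    per_device_train_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
)

# trainer = Seq2SeqTrainer(model=model, args=args,
#                          train_dataset=train_ds.map(preprocess, batched=True),
#                          eval_dataset=dev_ds.map(preprocess, batched=True),
#                          data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
# trainer.train()
```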
Additional zero-shot transfer evaluation is performed on Reddit PQA (“r/YayorNay”) and SemEval-2015 Task 3 yes/no QA, with no further fine-tuning.
5. Empirical Results and Evaluation
Automated Metrics (Table 2a) on the PAR test set include BLEU-4, ROUGE-L, and F1-BertScore. Soft-Mention Flags consistently outperform all baselines, with statistically significant gains:
| Model | BLEU | ROUGE-L | BertScore |
|---|---|---|---|
| GPT-2 fine-tuned | 32.8 | 55.0 | 93.2 |
| COLD decoding | 28.1 | 51.3 | 92.0 |
| T5 baseline | 50.6 | 67.9 | 95.3 |
| CBS (hard constraints) | 45.2 | 62.7 | 94.2 |
| MF (lexical flags) | 51.2 | 68.5 | 95.4 |
| SMF (semantic flags) | 52.6 | 69.5 | 95.5 |
| SMF-Style | 52.5 | 68.9 | 95.4 |
Human Evaluation on 50 held-out samples measures clause coverage, correctness, style, relevance, grammar, coherence, and equivalence to gold rewritings. SMF yields 85% correct clause coverage (cf. T5: 79%). Statement-wise binary judgment rates follow:
| Model | Polarity | Context | Style | Relevance | Syntax | Coherence | Equivalence |
|---|---|---|---|---|---|---|---|
| T5 | 0.96 | 1.00 | 0.97 | 0.94 | 0.95 | 0.91 | 0.74 |
| MF | 0.94 | 0.98 | 0.99 | 0.94 | 0.98 | 0.91 | 0.69 |
| SMF | 0.95 | 0.98 | 0.93 | 0.91 | 0.94 | 0.92 | 0.77 |
| SMF-Style | 0.97 | 0.99 | 0.96 | 0.96 | 0.99 | 0.98 | 0.76 |
SMF-Style exhibits strong generalization on out-of-domain Reddit and SemEval sets, with near-perfect scores for polarity, context, relevance, syntax, and coherence, showing only minor reductions to 0.80–0.85 in semantic equivalence.
6. Representative Examples and Error Modes
Examined examples illustrate correct constraint realization for all clause categories:
- Explanation: “Did Sandy want coffee?” / “Yes.” —> “Yes, Sandy wants coffee.”
- Complement: “Can you install Snapchat on the Samsung Galaxy A20 phone?” / “No, you can’t. Also, you can get Twitter on it.” —> “No, you can’t install Snapchat on the Samsung Galaxy A20 phone, but you can get Twitter on it.”
- Condition: “Is the Dell XPS shipped internationally?” / “Yes, it can be shipped if you provide a shipping label.” —> “Yes, the Dell XPS can be shipped internationally if you provide a shipping label.”
- Alternative (failure): “Does the monitor have built-in speakers?” / “No, but it has a headphone jack.” —> “No, the Dell 27-inch Full HD monitor does not have built-in speakers.” (missing: “but it has a headphone jack.”). Such omissions occur in 7–10% of “alternative” category cases—tightening semantic thresholds or expanding the search window for uncovered constraints can mitigate this.
Rare errors involving style (e.g., failure to switch to second person) are largely resolved in SMF-Style (5%).
7. Implementation and Replication Considerations
Constraint extraction is performed with standard constituency parsers (e.g., benepar) and a deterministic selection algorithm. SMF requires minor, targeted adjustments to T5 cross-attention: for each encoder token, the mention-flag embedding is added before attention computation. Real-time tracking of constraint fulfillment relies on Sentence-BERT similarity between constraint phrases and a sliding window over the generated output.
Hyper-parameters follow conventional fine-tuning defaults; performance tuning may target dev-set perplexity and constraint coverage. Replication does not require any hyper-parameters, architectural elements, or loss terms beyond those specified in the reported work.
A plausible implication is that semantic soft constraints (SMF) offer a computationally efficient and empirically robust means of controllable content selection in neural text rewriting tasks where clause- or argument-level content preservation is mandatory.