Pseudo-Answer Rewriting (PAR)
- Pseudo-Answer Rewriting (PAR) is a method that converts contextual yes/no question–answer pairs into stand-alone factual statements that preserve the underlying binary proposition and any essential subordinate clauses.
- It employs a T5-Base Transformer augmented with Soft-Mention Flags (SMF) to ensure precise control over syntactic and semantic content during the rewriting process.
- Empirical evaluation on datasets like Amazon-PQA shows that SMF-enhanced models outperform baselines, achieving higher BLEU, ROUGE, and BertScore metrics while maintaining strong clause coverage.
Pseudo-Answer Rewriting (PAR) is the task of reformulating contextualized yes/no (polar) question–answer pairs into stand-alone, decontextualized factual statements that encode the corresponding binary proposition, including any required embedded clauses such as conditions, complements, or alternatives. This enables the extraction of machine-actionable knowledge from common QA corpora (e.g., product forums, e-commerce community Q&A) where answers alone are not semantically self-sufficient. The principal challenge in PAR is controlling the syntactic and semantic form of the output, ensuring all meaningful content from the input (including auxiliary clauses and style constraints) is preserved within a single factual sentence.
1. Formal Task Definition
Given an input consisting of a yes/no question $q$, an answer $a$, and optional context $c$ (such as an item or product title), all tokenized from the vocabulary $\mathcal{V}$, a single input instance $x = (q, a, c)$ has token length $n$. The target is to produce a factual, stand-alone statement $y = (y_1, \dots, y_m)$ summarizing the original yes/no proposition and required subordinate information, i.e.,

$$p_\theta(y \mid x) = \prod_{t=1}^{m} p_\theta(y_t \mid y_{<t},\, x).$$
The model parameters $\theta$ are learned via maximum likelihood estimation over a dataset $D$ of reference rewritings,

$$\theta^{*} = \arg\max_{\theta} \sum_{(x,\, y) \in D} \log p_\theta(y \mid x).$$
The syntactic requirements for rewriting fall into four distinct categories: (1) Explanation (includes main clause repetition or reason), (2) Complement (main clause plus additional “also…” information), (3) Condition (“if…” sub-clause), and (4) Alternative (negation with alternative affirmative). These categories are operationalized during both training and evaluation.
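For concreteness, here is a minimal sketch of how one training instance could be serialized, reusing the Condition example from Section 6; the field names and the T5-style source string format are illustrative assumptions rather than the published data schema.

```python
# Hypothetical representation of one PAR training instance (Condition category).
# Field names and formatting are illustrative; the Amazon-PQA-derived data may differ.
instance = {
    "context":  "Dell XPS",                                   # item/product title
    "question": "Is the Dell XPS shipped internationally?",   # yes/no question
    "answer":   "Yes, it can be shipped if you provide a shipping label.",
    "category": "condition",   # explanation | complement | condition | alternative
    "target":   "Yes, the Dell XPS can be shipped internationally "
                "if you provide a shipping label.",           # stand-alone rewriting
}

# One possible T5-style linearization of the input fields (an assumption):
source = (f"question: {instance['question']} "
          f"answer: {instance['answer']} context: {instance['context']}")
```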
2. Model Architecture
The underlying model is a T5-Base sequence-to-sequence Transformer (220M parameters), comprising 12-layer encoder and decoder stacks with multi-headed self-attention and feed-forward sublayers. The central architectural innovation is the addition of Soft-Mention Flags (SMF) at each decoder cross-attention block. SMF enables fine-grained control over which input phrases must be explicitly realized in the output at each step.
Standard cross-attention computes

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $Q$ (decoder queries), $K$ (encoder keys), and $V$ (encoder values) are projected from hidden states. The SMF mechanism augments each encoder token with a “mention-flag” embedding, injected additively into the $K$ and $V$ projections. No additional layers are introduced, so memory and compute overhead is minimal beyond the base model footprint.
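A minimal PyTorch sketch of this injection follows, assuming flags take values {0, 1, 2} and are embedded into the key/value space; the single-head formulation simplifies T5's multi-head cross-attention block.

```python
import torch.nn as nn
import torch.nn.functional as F

class SMFCrossAttention(nn.Module):
    """Single-head cross-attention with additive mention-flag embeddings (sketch)."""

    def __init__(self, d_model: int, num_flag_states: int = 3):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One learned embedding per flag state (0: non-constraint, 1: uncovered, 2: covered).
        self.flag_emb = nn.Embedding(num_flag_states, d_model)

    def forward(self, dec_hidden, enc_hidden, flags):
        # dec_hidden: (batch, tgt_len, d_model)  decoder hidden states
        # enc_hidden: (batch, src_len, d_model)  encoder hidden states
        # flags:      (batch, src_len) int64     current mention-flag state per input token
        q = self.q_proj(dec_hidden)
        f = self.flag_emb(flags)                 # (batch, src_len, d_model)
        k = self.k_proj(enc_hidden) + f          # inject flags into keys ...
        v = self.v_proj(enc_hidden) + f          # ... and values, additively
        scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ v
```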
3. Soft-Constraint Mechanism: Soft-Mention Flags (SMF)
Constraint Extraction: Constituency parses (e.g., via benepar) automatically identify a set of phrase spans $C = \{c_1, \dots, c_k\}$ from the input $x$, each representing a clause or argument required in the output.
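A sketch of span extraction using benepar's spaCy integration is shown below; which constituent labels count as required spans (here SBAR, PP, VP) is an assumption, since the exact selection rules are not reproduced above.

```python
import benepar
import spacy

# Assumes `pip install benepar spacy` plus downloads of en_core_web_sm and benepar_en3.
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

def extract_constraint_spans(text, keep_labels=("SBAR", "PP", "VP")):
    """Return constituent spans that plausibly encode conditions, complements, or alternatives."""
    spans = []
    for sent in nlp(text).sents:
        for const in sent._.constituents:
            if set(const._.labels) & set(keep_labels):
                spans.append(const.text)
    return spans

print(extract_constraint_spans("Yes, it can be shipped if you provide a shipping label."))
# Picks up, among others, the conditional clause "if you provide a shipping label".
```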
Constraint State Tracking: A matrix $M \in \{0, 1, 2\}^{n \times m}$ is defined, where $M_{i,t}$ indicates the constraint status of input token $x_i$ after generation of output token $y_t$:
- $0$ : $x_i$ is not part of any constraint.
- $1$ : $x_i$ is in some constraint span $c_j$ and not yet “covered”.
- $2$ : $x_i$ is in some constraint span $c_j$ and has been “covered”—i.e., its content is semantically realized in $y_{\le t}$.
Coverage is computed using sentence-embedding similarity (e.g., Sentence-BERT): for a constraint phrase $c_j$, if its similarity to the generated prefix $y_{\le t}$ and to a sliding window over the most recently generated tokens exceeds thresholds $\tau_1$ and $\tau_2$, respectively, then all tokens of $c_j$ are set to “covered” for subsequent generation steps.
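A sketch of this coverage check with the sentence-transformers library follows; the model name, window size, and the single threshold (standing in for $\tau_1$, $\tau_2$) are illustrative assumptions, not reported settings.

```python
from sentence_transformers import SentenceTransformer, util

# Model choice, window size, and threshold below are assumptions for illustration.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def is_covered(phrase, generated_tokens, window=8, tau=0.75):
    """Check whether a constraint phrase is semantically realized in the output so far."""
    if not generated_tokens:
        return False
    # Compare the phrase against sliding windows over the generated prefix.
    candidates = [
        " ".join(generated_tokens[i:i + window])
        for i in range(max(1, len(generated_tokens) - window + 1))
    ]
    emb_phrase = encoder.encode(phrase, convert_to_tensor=True)
    emb_cands = encoder.encode(candidates, convert_to_tensor=True)
    best = util.cos_sim(emb_phrase, emb_cands).max().item()
    return best >= tau

# Once is_covered(...) returns True, every token of the phrase flips from flag 1 to flag 2.
```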
Attention Modification: At decoding step $t$, each encoder token $x_i$'s key and value projections are modified as

$$k_i \leftarrow k_i + f(M_{i,t}), \qquad v_i \leftarrow v_i + f(M_{i,t}),$$

where $f(0)$, $f(1)$, $f(2)$ are (learned) mention-flag embeddings. This signals to the decoder which input constraints remain active.
Style Constraints (SMF-Style): For second-person output style, all first-person pronoun tokens in the input are pre-marked as covered; as the decoder emits second-person pronouns, the corresponding flags are toggled to reflect fulfillment of the style transformation.
Training: The joint model (T5 + SMF embeddings) is fine-tuned using the standard sequence cross-entropy loss. No dedicated loss term penalizing constraint violations is introduced; constraint satisfaction is driven entirely by the mention-flag signal injected into cross-attention.
4. Training Data and Protocols
The central dataset for PAR is derived from Amazon-PQA (Rozen et al. 2021), manually re-written to produce 1,500 question/answer/context → decontextualized statement pairs. These are uniformly distributed across the four clause categories and 11 product domains. The data split is 1,000 train / 100 dev / 400 test.
Preprocessing involves SentencePiece tokenization (32k vocabulary), with input lengths truncated or padded to 128 tokens. Hyper-parameters are not exhaustively specified, but fine-tuning uses Adam (β₁ = 0.9, β₂ = 0.999, ε = 1e–8), a learning rate with linear warmup over the first 10% of steps (the base rate is not stated), batch size 16–32, 10 epochs with dev-set early stopping, weight decay 0.01, and dropout 0.1.
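A minimal Hugging Face fine-tuning sketch consistent with this protocol is given below; the learning-rate value, dataset column names, and the use of Seq2SeqTrainer are assumptions, and the SMF modifications to cross-attention would still need to be patched into the model.

```python
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # SMF embeddings would be added here

def preprocess(batch):
    # Column names "source"/"target" are assumed; 128-token budget follows the text above.
    model_inputs = tokenizer(batch["source"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="par-t5-smf",
    learning_rate=3e-4,               # assumed value; only the warmup schedule is reported
    warmup_ratio=0.1,                 # linear warmup for 10% of steps
    per_device_train_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
)

# trainer = Seq2SeqTrainer(model=model, args=args,
#                          train_dataset=train_ds.map(preprocess, batched=True),
#                          eval_dataset=dev_ds.map(preprocess, batched=True),
#                          data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
# trainer.train()
```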
Additional zero-shot transfer evaluation is performed on Reddit PQA (“r/YayorNay”) and SemEval-2015 Task 3 yes/no QA, with no further fine-tuning.
5. Empirical Results and Evaluation
Automated Metrics (Table 2a) on the PAR test set include BLEU-4, ROUGE-L, and F1-BertScore. Soft-Mention Flags consistently outperform all baselines, with statistically significant gains:
| Model | BLEU | ROUGE-L | BertScore |
|---|---|---|---|
| GPT-2 fine-tuned | 32.8 | 55.0 | 93.2 |
| COLD decoding | 28.1 | 51.3 | 92.0 |
| T5 baseline | 50.6 | 67.9 | 95.3 |
| CBS (hard constraints) | 45.2 | 62.7 | 94.2 |
| MF (lexical flags) | 51.2 | 68.5 | 95.4 |
| SMF (semantic flags) | 52.6 | 69.5 | 95.5 |
| SMF-Style | 52.5 | 68.9 | 95.4 |
Human Evaluation on 50 held-out samples measures clause coverage, correctness, style, relevance, grammar, coherence, and equivalence to gold rewritings. SMF yields 85% correct clause coverage (cf. T5: 79%). Statement-wise binary judgment rates follow:
| Model | Polarity | Context | Style | Relevance | Syntax | Coherence | Equivalence |
|---|---|---|---|---|---|---|---|
| T5 | 0.96 | 1.00 | 0.97 | 0.94 | 0.95 | 0.91 | 0.74 |
| MF | 0.94 | 0.98 | 0.99 | 0.94 | 0.98 | 0.91 | 0.69 |
| SMF | 0.95 | 0.98 | 0.93 | 0.91 | 0.94 | 0.92 | 0.77 |
| SMF-Style | 0.97 | 0.99 | 0.96 | 0.96 | 0.99 | 0.98 | 0.76 |
SMF-Style exhibits strong generalization on out-of-domain Reddit and SemEval sets, with near-perfect scores for polarity, context, relevance, syntax, and coherence, showing only minor reductions to 0.80–0.85 in semantic equivalence.
6. Representative Examples and Error Modes
Examined examples illustrate correct constraint realization for all clause categories:
- Explanation: “Did Sandy want coffee?” / “Yes.” —> “Yes, Sandy wants coffee.”
- Complement: “Can you install Snapchat on the Samsung Galaxy A20 phone?” / “No, you can’t. Also, you can get Twitter on it.” —> “No, you can’t install Snapchat on the Samsung Galaxy A20 phone, but you can get Twitter on it.”
- Condition: “Is the Dell XPS shipped internationally?” / “Yes, it can be shipped if you provide a shipping label.” —> “Yes, the Dell XPS can be shipped internationally if you provide a shipping label.”
- Alternative (failure): “Does the monitor have built-in speakers?” / “No, but it has a headphone jack.” —> “No, the Dell 27-inch Full HD monitor does not have built-in speakers.” (missing: “but it has a headphone jack.”). Such omissions occur in 7–10% of “alternative” category cases—tightening semantic thresholds or expanding the search window for uncovered constraints can mitigate this.
Rare errors involving style (e.g., failure to switch to second person) are largely resolved in SMF-Style (5%).
7. Implementation and Replication Considerations
Constraint extraction is performed with standard constituency parsers (e.g., benepar) and a deterministic selection algorithm. SMF requires minor, targeted adjustments to T5 cross-attention: for each encoder token, the mention-flag embedding is added before attention computation. Real-time tracking of constraint fulfillment relies on Sentence-BERT similarity between constraint phrases and a sliding window over the generated output.
Hyper-parameters follow conventional fine-tuning defaults; performance tuning may target dev-set perplexity and constraint coverage. Replication does not require any hyper-parameters, architectural elements, or loss terms beyond those specified in the reported work.
A plausible implication is that semantic soft constraints (SMF) offer a computationally efficient and empirically robust means of controllable content selection in neural text rewriting tasks where clause- or argument-level content preservation is mandatory.