
Fidelity-Enriched Contrastive Search (FECS)

Updated 9 March 2026
  • Fidelity-Enriched Contrastive Search (FECS) is a decoding algorithm that incorporates a source-aware faithfulness reward to mitigate hallucinations.
  • It optimizes token selection by blending model confidence, repetition penalty, and source similarity to balance factual accuracy with output diversity.
  • FECS demonstrates significant improvements in tasks like abstractive summarization and knowledge-grounded dialogue with higher faithfulness scores and minimal diversity loss.

Fidelity-Enriched Contrastive Search (FECS) is a decoding algorithm for neural language models designed to address the persistent trade-off between semantic faithfulness to source information and output diversity in natural language generation. FECS extends the Contrastive Search framework with a source-aware faithfulness reward that encourages generated content to remain consistent with the input source, reducing hallucinations while maintaining lexical and semantic diversity. The method targets tasks such as abstractive summarization and knowledge-grounded dialogue, where hallucinations undermine factual accuracy. At each decoding step, FECS selects the next token according to a combined objective involving model confidence, a repetition penalty, and source similarity; the reported result is a consistent improvement in factual alignment with little loss of output diversity (Chen et al., 2023).

1. Hallucination in Neural Text Generation and the Faithfulness–Diversity Trade-Off

Large pretrained language models frequently generate fluent text that is not grounded in, or directly contradicts, the provided source. In applications such as abstractive summarization or knowledge-grounded dialogue, such hallucinations degrade factual consistency. Conventional decoding strategies present a trade-off:

  • Deterministic methods (e.g., greedy, beam search) maximize likelihood but often result in repetitiveness or generic phrasing, leading to poor diversity.
  • Stochastic sampling methods (top-$k$, nucleus sampling) improve diversity but permit off-topic or unsupported content, increasing hallucination risk.

This tension arises because enforcing high model-confidence in output selection often diminishes diversity, while seeking diversity can reduce semantic alignment with the source. FECS addresses this trade-off by supplementing a diversity-preserving base (Contrastive Search) with an explicit faithfulness reward, biasing token selection towards semantic similarity with the source context (Chen et al., 2023).

2. Algorithmic Formulation of FECS

FECS augments the Contrastive Search mechanism by adding a third term to the scoring function that measures source faithfulness. At each generation step $t$, let the input prefix be $x_{0:c+t} = [x_0,\dots,x_c, x_{c+1},\dots,x_{c+t-1}]$, where the tokens decompose as:

  • $[x_0,\dots,x_{s-1}]$: prompt tokens
  • $[x_s,\dots,x_{c-1}]$: source tokens (to which generation must be faithful)
  • $[x_c,\dots,x_{c+t-1}]$: tokens generated so far

For $\alpha, \beta \ge 0$ and $k$ denoting the candidate pool size, the next token at each step is selected as:

$$
x_{c+t} = \arg\max_{v \in V_k} \Big\{ (1-\alpha-\beta)\,\log p_{\text{LM}}(v \mid x_{0:c+t}) \;-\; \alpha \max_{c \le j \le c+t-1} \text{sim}(h_v, h_{x_j}) \;+\; \beta \max_{s \le j \le c-1} \text{sim}(h_v, h_{x_j}) \Big\}
$$

where:

  • $V_k$: top-$k$ candidates by model probability,
  • $h_v$, $h_{x_j}$: final-layer hidden-state embeddings for candidate $v$ and token $x_j$,
  • $\text{sim}(u,v)$: cosine similarity,
  • $(1-\alpha-\beta)$: weight for model confidence,
  • $\alpha$: weight for repetition (degeneration) penalty,
  • $\beta$: weight for source faithfulness reward.

The weights are non-negative and sum to one, forming a convex combination, when $\alpha+\beta\le 1$. The additional $\beta$-weighted term rewards candidates that lie close, in embedding space, to some source token, thereby mitigating hallucinations (Chen et al., 2023).
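As a rough illustration of how the faithfulness term can change the selected token, the scoring rule can be worked through by hand for two candidates. The probabilities and similarity values below are invented for illustration and do not come from the paper; the weights $\alpha = \beta = 0.3$ match the default FECS setting, and the logarithm is assumed to be natural. Candidate $A$ has lower probability ($0.3$) but is close to a source token (similarity $0.9$); candidate $B$ is more probable ($0.5$) but far from the source (similarity $0.1$). Both have similarity $0.2$ to the generated text.

```latex
\begin{aligned}
\text{score}(A) &= 0.4\,\ln 0.3 - 0.3\,(0.2) + 0.3\,(0.9) \approx -0.482 - 0.060 + 0.270 = -0.272 \\
\text{score}(B) &= 0.4\,\ln 0.5 - 0.3\,(0.2) + 0.3\,(0.1) \approx -0.277 - 0.060 + 0.030 = -0.307
\end{aligned}
```

Under these assumed values FECS picks $A$. With $\beta = 0$ and $\alpha = 0.6$ (the Contrastive Search setting), the same inputs give $-0.482 - 0.120 = -0.602$ for $A$ and $-0.277 - 0.120 = -0.397$ for $B$, so the more probable but less source-aligned $B$ would be chosen instead.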

Pseudocode summary:

Input: LM, prefix x[0:c] = [prompt, source], max_len T, (k, α, β)
Output: generated continuation x[c : c+T]
generated = []
for t in 1..T:
    p(·) = LM(· | x[0:c] ++ generated)    # next-token distribution
    V_k = top-k tokens by p(·)
    for each v ∈ V_k:
        score_conf = log p(v)
        score_deg = max_{u in generated} cos(h_v, h_u)    # taken as 0 while generated is empty (assumed convention)
        score_fth = max_{s in source_tokens} cos(h_v, h_s)
        FECS_score[v] = (1 - α - β)*score_conf
                        - α*score_deg
                        + β*score_fth
    v* = argmax_v FECS_score[v]
    Append v* to generated
return generated
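The loop above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `log_prob_fn` and `hidden_fn` are hypothetical callables standing in for a real language model, `hidden_fn` returns a fixed per-token vector (a real model would supply contextual final-layer hidden states, which need a forward pass per candidate), and the degeneration penalty is taken to be 0 while nothing has been generated.

```python
import numpy as np


def cosine_to_rows(u, rows):
    """Cosine similarity between vector u and every row of `rows`."""
    norms = np.linalg.norm(rows, axis=1) * np.linalg.norm(u)
    return rows @ u / (norms + 1e-12)


def fecs_decode(log_prob_fn, hidden_fn, prompt_ids, source_ids,
                max_len, k=4, alpha=0.3, beta=0.3):
    """Greedy FECS decoding loop (illustrative sketch).

    log_prob_fn(prefix_ids) -> 1-D array of log-probabilities over the vocabulary
    hidden_fn(token_id)     -> 1-D vector standing in for a hidden state
    Both interfaces are hypothetical.
    """
    prefix = list(prompt_ids) + list(source_ids)
    source_h = np.stack([hidden_fn(s) for s in source_ids])
    generated = []
    for _ in range(max_len):
        log_p = log_prob_fn(prefix + generated)
        candidates = np.argsort(log_p)[-k:]  # indices of the top-k tokens
        scores = {}
        for v in candidates:
            v = int(v)
            h_v = hidden_fn(v)
            if generated:
                gen_h = np.stack([hidden_fn(u) for u in generated])
                score_deg = cosine_to_rows(h_v, gen_h).max()
            else:
                score_deg = 0.0  # assumed convention for the first step
            score_fth = cosine_to_rows(h_v, source_h).max()
            scores[v] = ((1 - alpha - beta) * log_p[v]
                         - alpha * score_deg
                         + beta * score_fth)
        generated.append(max(scores, key=scores.get))
    return generated


# Toy stand-in for a model: fixed next-token distribution, one-hot "hidden states".
TOY_PROBS = np.array([0.05, 0.10, 0.20, 0.25, 0.40])
TOY_EMB = np.eye(5)


def toy_log_probs(prefix_ids):
    return np.log(TOY_PROBS)


def toy_hidden(token_id):
    return TOY_EMB[token_id]


# Token 2 is the only source token; token 0 plays the role of the prompt.
fecs_out = fecs_decode(toy_log_probs, toy_hidden, [0], [2], max_len=3, k=3)
cs_out = fecs_decode(toy_log_probs, toy_hidden, [0], [2], max_len=3, k=3,
                     alpha=0.6, beta=0.0)
print(fecs_out, cs_out)
```

In this toy setup the FECS setting emits the source token first even though it is the least probable of the three candidates, while the `beta=0.0` run starts with the most probable token; the repetition penalty then steers both runs away from tokens already produced.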

3. Experimental Methodology

Experiments evaluate FECS on two tasks prone to hallucination:

  • Abstractive Summarization: Using the CNN-DailyMail dataset.
  • Knowledge-Grounded Dialogue: Using Wizard of Wikipedia ("WoW").

Language models of varying sizes were used: OPT (1.3B, 2.7B, 6.7B parameters) for summarization, and GPT-Neo (1.3B, 2.7B) and GPT-J (6B) for dialogue. Each prompt incorporated two few-shot examples, with no further fine-tuning. Baseline decoders included greedy search, beam search (beam size 4), nucleus sampling ($p=0.95$), and Contrastive Search ($k=4$, $\alpha=0.6$) (Chen et al., 2023).

Standard evaluation metrics encompassed:

  • Quality: ROUGE-1/2/L, BERTScore (summarization); BLEU-4, ROUGE-L, BERTScore (dialogue).
  • Faithfulness: FEQA (summarization), Q2 (dialogue).
  • Diversity: $1 - \text{Rep-}n(x)$, with $\text{Rep-}n(x) = 1 - \frac{|\text{unique } n\text{-grams}|}{|\text{total } n\text{-grams}|}$.
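The diversity measure in the last bullet can be computed directly from a token sequence. The sketch below follows the formula as stated here; the function names `rep_n` and `diversity` are illustrative, and returning 0 repetition for sequences shorter than $n$ is an assumption made to avoid division by zero.

```python
def rep_n(tokens, n):
    """Fraction of repeated n-grams: 1 - |unique n-grams| / |total n-grams|."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # assumed convention for sequences shorter than n
    return 1.0 - len(set(ngrams)) / len(ngrams)


def diversity(tokens, n):
    """Diversity as defined above: 1 - Rep-n."""
    return 1.0 - rep_n(tokens, n)


sample = "the cat sat on the cat sat".split()
print(rep_n(sample, 2), diversity(sample, 2))
```

For the sample sentence there are six bigrams, four of them distinct, so Rep-2 is 1 − 4/6 and the diversity is 4/6.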

4. Quantitative and Qualitative Results

FECS delivers consistent improvements in faithfulness scores (FEQA/Q2), with limited impact on diversity and standard quality metrics in most settings. On CNN-DailyMail, FEQA increases by 21.8% (1.3B), 19.2% (2.7B), and 27.6% (6.7B), with only marginal decreases in diversity (5.0%, 0.2%, and 1.1%, respectively). On WoW, Q2 improves by 26.2%, 63.9%, and 63.6% for the 1.3B, 2.7B, and 6B models, with diversity changes of −35%, −11.2%, and −3.3%, respectively (Chen et al., 2023).

| Task/Dataset (model size) | Faithfulness Gain (FEQA/Q2) | Diversity Change |
|---|---|---|
| CNN-DM, 1.3B | +21.8% | −5.0% |
| CNN-DM, 6.7B | +27.6% | −1.1% |
| WoW, 2.7B | +63.9% | −11.2% |

Qualitative analysis highlights that FECS can recover factual content missed by other decoders. For instance, in summarization, FECS-generated summaries included all salient information from the source, whereas Contrastive Search omitted details and introduced hallucinated entities. FECS ensures low n-gram repetition across scales, on par with Contrastive Search and significantly below greedy/beam (Chen et al., 2023).

5. Hyperparameterization and Ablation

Typical FECS operation employs $(k, \alpha, \beta) = (4, 0.3, 0.3)$ without additional tuning, while Contrastive Search uses $(k, \alpha) = (4, 0.6)$. Comparison reveals that lowering $\alpha$ alone in Contrastive Search does not produce comparable faithfulness gains; the explicit inclusion of $\beta$ (the faithfulness reward) is essential for FECS's improvements. Decoding overhead for FECS is modestly higher than that of greedy or beam search but remains similar to that of Contrastive Search (Chen et al., 2023).
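The two settings can be written as a small configuration object to make their relationship explicit. This is an illustrative sketch: the class name `DecodingConfig` and its `weights` method are hypothetical, and Contrastive Search is represented here simply as the $\beta = 0$ case of the same scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodingConfig:
    """Hyperparameters for the three-term score (hypothetical container)."""
    k: int = 4
    alpha: float = 0.3  # repetition (degeneration) penalty weight
    beta: float = 0.3   # source faithfulness reward weight

    def __post_init__(self):
        if self.k < 1:
            raise ValueError("k must be at least 1")
        if self.alpha < 0 or self.beta < 0 or self.alpha + self.beta > 1:
            raise ValueError("need alpha, beta >= 0 and alpha + beta <= 1")

    def weights(self):
        """Return (confidence, repetition penalty, faithfulness reward) weights."""
        return (1 - self.alpha - self.beta, self.alpha, self.beta)


FECS_DEFAULT = DecodingConfig()                                # (4, 0.3, 0.3)
CONTRASTIVE_SEARCH = DecodingConfig(k=4, alpha=0.6, beta=0.0)  # (4, 0.6), no reward

print(FECS_DEFAULT.weights(), CONTRASTIVE_SEARCH.weights())
```

Both settings place the same weight of about 0.4 on model confidence. The difference is that FECS moves half of the 0.6 assigned to the repetition penalty over to the faithfulness reward.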

6. Limitations and Prospects

FECS presumes the source segment is reliable; if the source contains errors or contradictions, the faithfulness term may amplify these inaccuracies. Standard faithfulness metrics (e.g., FEQA, Q2) quantify factual alignment but do not assess implicit or nuanced semantic consistency. Potential directions include modeling source uncertainty, extending FECS to other text generation regimes (e.g., machine translation, data-to-text), and jointly or adaptively tuning the hyperparameters $(\alpha, \beta)$ based on validation data or per-example characteristics (Chen et al., 2023). A plausible implication is that FECS's explicit source anchoring may prove broadly beneficial for tasks requiring grounded generation, provided appropriate handling of source reliability is incorporated.
