Factuality-Aware Alignment in LLMs

Updated 29 December 2025
  • Factuality-aware alignment is defined as techniques that integrate fine-grained, fact-level supervision into LLM training to boost factual accuracy and reduce hallucinations.
  • It employs methods like atomic fact extraction, reward modeling, and uncertainty-based alignment to optimize each sentence or claim within a model's output.
  • Empirical results show significant gains in factual consistency and out-of-domain robustness, marking a notable improvement over traditional alignment strategies.

Factuality-aware alignment refers to a spectrum of algorithmic strategies and adaptation pipelines for LLMs that explicitly optimize the model to produce responses that are both factually correct and resistant to hallucination, particularly in knowledge-intensive and open-ended tasks. It extends and sharpens generic alignment protocols (e.g., RLHF, supervised fine-tuning, DPO) by injecting fine-grained, fact-oriented signals into the loss function, data sampling, model architecture, or training regimen. Key technical directions include reward modeling for factual consistency, atomic fact-level preference annotation, uncertainty-based weighting, layer-wise surgical fine-tuning, integration of structured knowledge sources, and RL pipelines that incorporate factuality verification.

1. Motivations and Scope

LLMs are susceptible to hallucination: the generation of fluent but unfounded or incorrect claims. Conventional alignment, whether via preference optimization (e.g., DPO), RLHF, or instruction-tuning, frequently prioritizes general helpfulness, informativeness, or user preference, often at the cost of factual precision. Factuality-aware alignment addresses this by targeting structural deficiencies in standard protocols:

  • Generic RL rewards and response-level preferences fail to distinguish between correct and incorrect facts mixed in the same output, introducing update noise (Gu et al., 4 Mar 2025).
  • Supervised fine-tuning on human responses can inadvertently inject novel hallucinatory “facts” into the model when humans assemble answers beyond the pretraining corpus (Lin et al., 2 May 2024).
  • Naive methods struggle with cross-domain and out-of-distribution factuality, with performance either stagnating (under-alignment) or degrading (over-alignment) due to insufficiently granular supervision (Yuan et al., 18 Jun 2024).

Hence, factuality-aware alignment aims to yield models whose predictions are aligned not only with generic instructional intent but also with verifiable, contextually grounded factual references.

2. Fine-Grained, Fact-Aware Optimization Objectives

Recent methods instantiate factuality-aware alignment through explicit, fine-grained supervision at the atomic (fact, sentence, reasoning step) level:

  • Sentence- and Fact-Masked Optimization: Mask-DPO isolates factual sentences within preferred outputs, updating the model only on these, and similarly isolates hallucinated content within rejected outputs (Gu et al., 4 Mar 2025). This resolves the label ambiguity of response-level preferences by supervising directly at the unit at which factuality is defined (typically a sentence or atomic claim); see the sketch after this list.
  • Atomic Preference Enhanced Tuning: APEFT decomposes long-form responses into atomic factual statements, identifies those partially known by the base model, and synthesizes paired preferences between correct and incorrect atomics for joint DPO training (Yuan et al., 18 Jun 2024).
  • Dual-Fact and Hierarchical Objectives: FactAlign combines standard preference-based KTO with a sentence-level alignment loss (fKTO), directly optimizing the factual precision of each constituent segment of a response while maintaining global instruction quality (Huang et al., 2 Oct 2024). KLCF jointly learns recall (fact coverage from a base model-derived checklist) and precision (truthfulness via a self-assessment scorer) to balance completeness and reliability in long-form factual outputs (Li et al., 28 Sep 2025).
  • Reward Shaping for Factual Consistency: RLFC and FSPO interleave token-wise factuality verification within RL loops—either using NLI-based entailment of the response against a reference (RLFC) or explicit external retrieval and verifier modules for each reasoning step (FSPO) (Xue et al., 2023, Li et al., 30 May 2025).
  • Uncertainty-Conditioned Alignment: UAlign incorporates explicit uncertainty metrics—confidence scores and semantic entropy—into both the reward model and the LLM’s conditioning, improving the model's coverage/refusal boundary and generalizability across domains (Xue et al., 16 Dec 2024).
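
As a concrete illustration of the sentence-masked preference objective referenced above, the following is a minimal sketch in the spirit of Mask-DPO, not the authors' released implementation; the tensor layout, mask construction, and the beta value are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_logprob(logits, labels, sent_mask):
    # Sum per-token log-probabilities, keeping only tokens whose
    # sentence-level mask is 1. Labels of -100 mark prompt/padding tokens.
    # Logits are assumed to be already shifted to align with labels.
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = torch.gather(logp, -1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    keep = (labels != -100).float() * sent_mask
    return (token_logp * keep).sum(dim=-1)

def sentence_masked_dpo_loss(policy_chosen, policy_rejected,
                             ref_chosen, ref_rejected,
                             labels_c, labels_r,
                             factual_mask_c, halluc_mask_r, beta=0.1):
    # Chosen responses are scored only on tokens inside sentences judged
    # factual; rejected responses only on tokens inside sentences judged
    # hallucinated, so correct facts in a rejected answer are never penalized.
    pi_c = masked_logprob(policy_chosen, labels_c, factual_mask_c)
    pi_r = masked_logprob(policy_rejected, labels_r, halluc_mask_r)
    rf_c = masked_logprob(ref_chosen, labels_c, factual_mask_c)
    rf_r = masked_logprob(ref_rejected, labels_r, halluc_mask_r)
    margin = beta * ((pi_c - rf_c) - (pi_r - rf_r))
    return -F.logsigmoid(margin).mean()
```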

3. Data Sources, Annotation, and Preference Construction

Factuality-aware alignment depends on high-resolution labeling of factual correctness and, where applicable, informativeness. In practice, pipelines such as APEFT decompose long-form responses into atomic claims, identify which of these the base model partially knows, and synthesize preference pairs between correct and incorrect atomic statements (Yuan et al., 18 Jun 2024).
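
A hypothetical sketch of such atomic preference construction is shown below; the helper callables (decompose, verify, corrupt) are assumptions supplied by the caller and are not part of any specific paper's released code.

```python
def build_atomic_preference_pairs(question, response, decompose, verify, corrupt):
    """Decompose a long-form answer into atomic claims, keep those judged
    supported, and pair each with a minimally corrupted (factually wrong)
    counterpart for preference tuning.

    decompose(response) -> list[str] of atomic claims
    verify(claim)       -> bool, True if supported by the reference source
    corrupt(claim)      -> str, a minimally edited incorrect variant
    """
    pairs = []
    for claim in decompose(response):
        if verify(claim):
            pairs.append({"prompt": question,
                          "chosen": claim,
                          "rejected": corrupt(claim)})
    return pairs
```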

4. Reward Modeling, RL Pipelines, and Optimization Protocols

A broad taxonomy of factuality-aware objectives includes:

| Protocol | Reward signal | Optimization |
|---|---|---|
| Mask-DPO, APEFT | Sentence- or fact-level correctness | DPO |
| fKTO (FactAlign) | Sentence-level factual correctness | KTO |
| RLFC, FSPO | NLI entailment or step-level verification | PPO or GRPO |
| UAlign | Reward model scored on (x, y, confidence, entropy) | PPO |
| KLCF | Coverage (recall) + precision (self-assessment) | GRPO |
| Self-Alignment | Model’s own calibrated “True”/“False” scores | DPO |
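
For the RLFC/FSPO row above, an entailment-based reward can be sketched with an off-the-shelf NLI classifier. This is a minimal illustration rather than either paper's exact reward shaping; the entailment label index assumes the standard roberta-large-mnli label ordering.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any off-the-shelf NLI checkpoint works in principle; label order
# (contradiction=0, neutral=1, entailment=2) is specific to this checkpoint.
nli_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name).eval()

@torch.no_grad()
def entailment_reward(reference: str, response: str) -> float:
    # Score how strongly the reference entails the generated response;
    # this scalar can serve as a terminal or dense reward inside an RL loop.
    inputs = tokenizer(reference, response, return_tensors="pt", truncation=True)
    probs = torch.softmax(nli_model(**inputs).logits, dim=-1)
    return probs[0, 2].item()  # probability of "entailment"
```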

Several methods interleave SFT with RL or preference learning; e.g., CodeSimpleQA uses massive SFT on verified QA pairs, then applies factuality-driven RL for further gains (Yang et al., 22 Dec 2025).

Ablation studies confirm that granular, atomic- or sentence-level supervision drives greater factuality gains and OOD robustness than coarse response-level alignment. For example, removing fKTO in FactAlign drops factual F1 by more than 13 points (Huang et al., 2 Oct 2024); masking in Mask-DPO accounts for nearly a 10-point gain over vanilla DPO (Gu et al., 4 Mar 2025); and atomic preferences in APEFT yield +3.45% factuality across ID/OOD splits (Yuan et al., 18 Jun 2024).

5. Empirical Gains and Generalization

Factuality-aware alignment demonstrates significant empirical improvements in both in-domain and out-of-domain evaluations, including:

  • Mask-DPO increases factuality on the ANAH test set from 49.19% to 77.53% (base: Llama3.1-8B-Instruct), surpassing 70B models and improving OOD FActScore by 9 points (Gu et al., 4 Mar 2025).
  • APEFT delivers +3.45% absolute accuracy across multiple OOD factuality benchmarks compared to general paragraph-level preference fine-tuning (Yuan et al., 18 Jun 2024).
  • FactAlign achieves a +24.8 percentage-point f1@100 improvement on LongFact and +2.8 points in FActScore, doubling the number of correct claims without loss in precision or helpfulness (Huang et al., 2 Oct 2024).
  • KLCF provides up to +14.4 points in FActScore and a 79.3% win rate on long-form factuality (LongFact) when both recall and precision are optimized (Li et al., 28 Sep 2025).
  • FSPO reduces hallucinations, with gains of 20–35 percentage points on QA benchmarks, while raising Pass@1 accuracy in math and stepwise reasoning tasks by up to 39.8 points (Li et al., 30 May 2025).
  • Self-alignment methods raise TruthfulQA MCQA accuracy by +20 points and long-form BioGEN FActScore by ~4 points, outperforming alternative self-consistency and inference-time editing approaches (Zhang et al., 14 Feb 2024).
  • Uncertainty-aware UAlign outperforms strong PPO and RLKF baselines on precision and out-of-domain truthfulness by +2–4 points (Xue et al., 16 Dec 2024).
  • FLAME achieves gains on both FActScore (e.g., +5.6 points for Biography) and instruction-following, avoiding the typical factual–helpfulness trade-off (Lin et al., 2 May 2024).

Ablation and scaling results indicate that increasing topic diversity in training data is more effective for generalization than increasing the number of questions per topic (Gu et al., 4 Mar 2025), and that combining atomic- and paragraph-level preferences is critical for distributional robustness (Yuan et al., 18 Jun 2024).

6. Extensions, Process Alignment, and Mechanistic Insights

Factuality-aware alignment is extensible across domains (dialogue (Xue et al., 2023), code (Yang et al., 22 Dec 2025), summarization (Dixit et al., 2023, Wan et al., 2022)), languages (with entity-level interventions improving crosslingual consistency (Liu et al., 11 Oct 2025)), architecture blocks (surgical DPO in global/factuality layers via hierarchical alignment (Zhang et al., 14 Oct 2025)), and input modalities (KG integration in ALIGNed-LLM (Nishat et al., 17 Jul 2025)).

Several works address explicit process-level alignment:

  • MR-ALIGN targets transitions in meta-reasoning chains, shrinking the gap between “reasoning hits the fact” and “answer states the fact,” via EM-estimated transition advantage weighting (Wang et al., 27 Oct 2025).
  • FSPO and MR-ALIGN both show substantial reductions in misleading or incoherent chains by re-weighting updates based on stepwise or segment-level verification signals (Li et al., 30 May 2025, Wang et al., 27 Oct 2025); a minimal re-weighting sketch follows this list.
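
One plausible way to implement such step-level re-weighting is sketched below, under the assumption that an external verifier returns a support score per reasoning step; it is not FSPO's or MR-ALIGN's exact objective.

```python
import torch

def step_reweighted_policy_loss(step_logprobs, step_support, outcome_advantage):
    # step_logprobs:     (num_steps,) summed log-probs of each reasoning step
    # step_support:      (num_steps,) verifier scores in [0, 1] per step
    # outcome_advantage: scalar advantage from answer-level correctness
    # Supported steps are reinforced, unsupported steps are pushed down,
    # both scaled by the outcome-level advantage.
    signed_weights = 2.0 * step_support - 1.0   # map [0, 1] -> [-1, 1]
    per_step = -(step_logprobs * signed_weights * outcome_advantage)
    return per_step.mean()
```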

Finally, mechanistic analyses reveal that:

  • Model factuality is disproportionately controlled by upper “global” layers; surgical DPO restricted to these layers delivers net factuality gains without the logic/fluency “alignment tax” seen in full-model tuning (Zhang et al., 14 Oct 2025). A layer-restriction sketch follows this list.
  • Topic-level and entity-level alignment (in latent graph space or multilingual conceptual manifolds) underlies both generalization and consistency, with prompt-level interventions (e.g., English subject injection in SubInj) driving measurable alignment and factual gains (Liu et al., 11 Oct 2025, Gu et al., 4 Mar 2025).
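
The layer-restricted ("surgical") tuning described above can be approximated by freezing all but the upper decoder blocks before running a standard preference-optimization trainer. This is a minimal sketch, assuming a Llama-style module layout (model.model.layers, model.lm_head) and a hypothetical choice of eight trainable blocks; the cited work's exact layer selection may differ.

```python
from transformers import AutoModelForCausalLM

def restrict_to_upper_layers(model, num_trainable_blocks=8):
    # Freeze every parameter, then re-enable gradients only for the top
    # decoder blocks and the LM head, so preference optimization touches
    # the upper "global" layers associated with factual recall.
    for p in model.parameters():
        p.requires_grad = False
    for block in model.model.layers[-num_trainable_blocks:]:
        for p in block.parameters():
            p.requires_grad = True
    for p in model.lm_head.parameters():
        p.requires_grad = True
    return model

# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# model = restrict_to_upper_layers(model)
# ...then pass the partially frozen model to any standard DPO trainer.
```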

7. Limitations and Open Directions

Open questions include:

  • The potential task–capability trade-offs from aggressive factuality tuning (e.g., code, reasoning).
  • The need for scalable, annotation-efficient atomic decomposition and verification—self-supervised and uncertainty-based signals offer promising directions (Chen et al., 14 May 2025, Xue et al., 16 Dec 2024).
  • Mechanistic interpretability: the extent to which factuality aligns with specialized subnetworks vs. distributed representations (Zhang et al., 14 Oct 2025, Yuan et al., 18 Jun 2024).
  • Frameworks for dynamic or hybrid alignment that combine internal self-consistency, uncertainty features, and external verification where available, as well as cross-modal or cross-lingual extensions.

In summary, factuality-aware alignment constitutes a fast-evolving paradigm for improving LLM reliability, characterized by fact- and process-level supervision, fine-grained reward shaping, and regularization mechanisms grounded in model knowledge boundaries. The result is models that are significantly less prone to hallucination across both standard and challenging distributional regimes.
