TI-DPO: Token-Importance Guided DPO

Updated 29 May 2026

TI-DPO is a class of methods that refines Direct Preference Optimization by emphasizing influential tokens for improved model alignment.
The approach integrates token-level discriminative weighting to adaptively modulate gradient updates and address issues like reward dilution and length bias.
By concentrating learning on the tokens that most influence preference outcomes, TI-DPO targets weaknesses of purely sequence-level preference modeling.

Token-Importance Guided Direct Preference Optimization (TI-DPO) is a class of methods designed to enhance preference alignment for LLMs by integrating token-level discriminative weighting into the standard Direct Preference Optimization (DPO) paradigm. Unlike conventional DPO, which uniformly aggregates the log-likelihood ratio over all tokens in a sequence, TI-DPO leverages fine-grained token-level signals to emphasize critical tokens and adaptively modulate gradient updates. These approaches target the shortcomings of sequence-level preference modeling, such as reward dilution, length bias, and weak discrimination at salient tokens, by focusing learning on the tokens most impactful for preference outcomes.
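The contrast between uniform and token-weighted aggregation can be illustrated with a small sketch. It is not the TI-DPO implementation. The function names `sequence_log_ratio` and `dpo_loss`, the default `beta=0.1`, and the hand-picked `importance` weights are all hypothetical. TI-DPO derives its weights from a token-importance signal that this section does not specify. In the toy example, one decisive token in the chosen response carries a positive log-ratio, and four filler tokens drift slightly negative. Under uniform summation the two effects cancel, which is the reward-dilution effect described above. Down-weighting the filler tokens restores a positive preference margin.

```python
import math
from typing import Optional, Sequence


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for either sign of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sequence_log_ratio(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Aggregate per-token log-ratios log pi_theta(y_t | .) - log pi_ref(y_t | .).

    weights=None gives the standard DPO sum, where every token counts equally.
    Otherwise each token's log-ratio is scaled by a (hypothetical) importance weight.
    """
    if weights is None:
        weights = [1.0] * len(policy_logps)
    if not (len(policy_logps) == len(ref_logps) == len(weights)):
        raise ValueError("per-token sequences must have equal length")
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def dpo_loss(chosen_ratio: float, rejected_ratio: float, beta: float = 0.1) -> float:
    # Pairwise DPO loss: -log sigmoid(beta * (r_chosen - r_rejected)).
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# Toy per-token log-probabilities for a 5-token response.
ref = [-2.0] * 5
chosen_policy = [-2.5, -2.5, 0.0, -2.5, -2.5]  # token 2 is the decisive one
rejected_policy = [-2.0] * 5                   # identical to the reference

# Uniform aggregation: +2.0 on the decisive token is cancelled by 4 x -0.5,
# so the sequence-level margin is exactly 0.
uniform_loss = dpo_loss(
    sequence_log_ratio(chosen_policy, ref),
    sequence_log_ratio(rejected_policy, ref),
)

# Hypothetical importance weights that emphasize the decisive token.
importance = [0.1, 0.1, 1.0, 0.1, 0.1]
weighted_loss = dpo_loss(
    sequence_log_ratio(chosen_policy, ref, importance),
    sequence_log_ratio(rejected_policy, ref, importance),
)

print(f"uniform: {uniform_loss:.4f}, weighted: {weighted_loss:.4f}")
```

Setting every weight to 1.0 recovers standard DPO, so the weighted form is a strict generalization of the uniform sum. Because the uniform sum also gains one term per token, longer responses can accumulate larger log-ratio magnitudes, which relates to the length bias mentioned earlier.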

1. Theoretical Foundations and Motivation

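Traditional DPO optimizes a policy $\pi_\theta$ to increase the likelihood of preferred completions relative to dispreferred ones, with both measured against a frozen reference policy $\pi_{\text{ref}}$.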
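For reference, the standard DPO loss is given below for preference triples $(x, y_w, y_l)$ drawn from a dataset $\mathcal{D}$, with frozen reference policy $\pi_{\text{ref}}$ and temperature $\beta$. The per-token decomposition of $r_\theta$ shows where the uniform aggregation that TI-DPO modifies enters. The weighted variant $r^{w}_\theta$ on the last line is a generic sketch with assumed importance weights $w_t \ge 0$. It is not the exact TI-DPO objective, because this section does not specify how the weights are computed. Setting $w_t = 1$ for all $t$ recovers $r_\theta$.

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log\sigma\big(\beta\, r_\theta(x, y_w) - \beta\, r_\theta(x, y_l)\big)\right]
$$

$$
r_\theta(x, y) = \log\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} = \sum_{t=1}^{|y|} \log\frac{\pi_\theta(y_t\mid x, y_{<t})}{\pi_{\mathrm{ref}}(y_t\mid x, y_{<t})}
$$

$$
r^{w}_\theta(x, y) = \sum_{t=1}^{|y|} w_t \log\frac{\pi_\theta(y_t\mid x, y_{<t})}{\pi_{\mathrm{ref}}(y_t\mid x, y_{<t})}
$$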
