Kahneman–Tversky Optimization (KTO)
- Kahneman–Tversky Optimization (KTO) is a method that integrates Prospect Theory into LLM fine-tuning, emphasizing nonlinear risk sensitivity and loss aversion.
- It generalizes Direct Preference Optimization by replacing logistic loss with a prospect-theoretic utility-based objective, enhancing performance under data scarcity and imbalance.
- KTO has practical applications in safety alignment, multi-agent coordination, federated learning, and tool use, driving improvements in sample efficiency and overall model robustness.
Kahneman–Tversky Optimization (KTO) is a preference-based fine-tuning method for LLMs that grounds its objective in Prospect Theory, bringing risk sensitivity, loss aversion, and reference dependence into model alignment. Originating with Ethayarajh et al. (ICML 2024), and now widely applied in LLM safety alignment, tool use, multi-agent coordination, federated learning, and beyond, KTO generalizes Direct Preference Optimization (DPO) by replacing the standard pairwise logistic loss with a prospect-theoretic utility-based objective. According to Prospect Theory, agents do not value outcomes in absolute terms but perceive utility as nonlinear, with losses weighed more heavily than gains and evaluated relative to a status-quo reference. KTO injects these human-like asymmetries into LLM optimization, enabling efficient alignment from binary preference signals and robust model adaptation under data scarcity, annotation heterogeneity, and class imbalance.
1. Theoretical Foundations and Prospect-Theoretic Objective
KTO draws deeply from Prospect Theory, as developed by Kahneman and Tversky, introducing two critical nonlinearities absent from classical expected utility:
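- Value Function: A piecewise curve $v(z)$, concave for gains ($z \ge 0$) and convex (and typically steeper) for losses ($z < 0$), with a loss aversion coefficient $\lambda > 1$. Canonical form (see (Ethayarajh et al., 2024)):
  $$v(z) = \begin{cases} z^{\alpha} & \text{if } z \ge 0 \\ -\lambda(-z)^{\alpha} & \text{if } z < 0 \end{cases}$$
  with $\alpha \in (0, 1]$ controlling curvature (diminishing sensitivity).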
- Probability Weighting: A function $w(p)$ that overweights small probabilities and underweights large ones, with a common instantiation (Prelec's law):
  $$w(p) = \exp\big(-(-\ln p)^{\gamma}\big), \qquad 0 < \gamma \le 1.$$
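Both curves are simple to compute directly. The sketch below takes $\alpha = 0.88$ and $\lambda = 2.25$ as defaults, the median Tversky–Kahneman estimates. The Prelec exponent $\gamma = 0.65$ is purely illustrative and does not come from the KTO papers.

```python
import math

def pt_value(z: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Prospect-theoretic value: concave for gains, convex and steeper for losses."""
    if z >= 0:
        return z ** alpha
    return -lam * (-z) ** alpha

def prelec_weight(p: float, gamma: float = 0.65) -> float:
    """Prelec probability weighting w(p) = exp(-(-ln p)^gamma) for p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.exp(-((-math.log(p)) ** gamma))

# Loss aversion: a loss of 1 weighs 2.25 times as much as a gain of 1.
print(pt_value(1.0), pt_value(-1.0))  # 1.0 -2.25
# Small probabilities are overweighted, large ones underweighted.
print(prelec_weight(0.01) > 0.01, prelec_weight(0.99) < 0.99)  # True True
```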
In the KTO context, "outcomes" are model completions $y$ for a prompt $x$, valued by their log-probability under the current policy $\pi_\theta$ relative to a reference policy $\pi_{\text{ref}}$, and "events" are preference or safety labels. The expected prospect-theoretic utility is then computed over these sample-label pairs, aligning model outputs with human-like perception of gains and losses.
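In practice, the relative log-probability of a completion comes from summing its per-token log-probabilities under each model. The minimal sketch below assumes those per-token values have already been extracted; the numbers are invented for illustration. A positive result means the policy favors $y$ more than the reference does, which counts as a "gain".

```python
def sequence_log_ratio(policy_token_logps, ref_token_logps):
    """log pi_theta(y|x) - log pi_ref(y|x), computed from per-token log-probabilities."""
    if len(policy_token_logps) != len(ref_token_logps):
        raise ValueError("token sequences must have equal length")
    return sum(policy_token_logps) - sum(ref_token_logps)

# Hypothetical per-token log-probs for a 3-token completion y.
policy = [-0.5, -1.0, -0.25]
ref = [-0.75, -1.25, -0.5]
print(sequence_log_ratio(policy, ref))  # 0.75
```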
2. Formal Definition, Loss Functions, and Optimization
KTO is formulated for both pairwise and unary (single-label) preference settings.
Pairwise (Preference) Formulation
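Given tuples $(x, y_w, y_l)$ ("preferred"/"rejected"), the central margin is
$$m = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}.$$
The basic KTO loss (exemplified by (Shuieh et al., 9 May 2025, Viswanadha et al., 23 Jun 2025, Liu et al., 2024, Zhai, 2024)) is:
$$\mathcal{L}_{\text{KTO}} = -\mathbb{E}_{(x, y_w, y_l)}\Big[\log w\big(\sigma(\beta\, v(m))\big)\Big]$$
where $\sigma$ is the sigmoid, $v$ the value function, $w$ the probability weighting, and $\beta$ a temperature scale. For $v$ and $w$, typical parameterizations are:
- $v(z) = z^{\alpha}$ for $z \ge 0$ and $v(z) = -\lambda(-z)^{\alpha}$ for $z < 0$, with $0 < \alpha \le 1$ and $\lambda \ge 1$;
- $w(p) = \exp\big(-(-\ln p)^{\gamma}\big)$ for $0 < \gamma \le 1$.
If $\alpha = \lambda = \gamma = 1$, so that $v$ is the identity and $w(p) = p$, KTO reduces to DPO.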
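The reduction to DPO can be checked numerically. This sketch assumes the per-example pairwise loss takes the form $-\log w(\sigma(\beta\,v(m)))$. It compares that loss with DPO's $-\log\sigma(\beta m)$ when $\alpha = \lambda = \gamma = 1$. Setting $\beta = 0.1$ is an arbitrary choice.

```python
import math

def pt_value(z, alpha=1.0, lam=1.0):
    # Prospect-theoretic value function; alpha = lam = 1 makes it the identity.
    return z ** alpha if z >= 0 else -lam * (-z) ** alpha

def prelec_weight(p, gamma=1.0):
    # Prelec weighting; gamma = 1 makes it the identity w(p) = p.
    return math.exp(-((-math.log(p)) ** gamma))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def pairwise_kto_loss(margin, beta=0.1, alpha=1.0, lam=1.0, gamma=1.0):
    """Assumed pairwise form: -log w(sigmoid(beta * v(margin)))."""
    p = sigmoid(beta * pt_value(margin, alpha, lam))
    return -math.log(prelec_weight(p, gamma))

def dpo_loss(margin, beta=0.1):
    return -math.log(sigmoid(beta * margin))

for m in (-2.0, 0.0, 3.0):
    # With alpha = lam = gamma = 1 the two losses coincide.
    assert math.isclose(pairwise_kto_loss(m), dpo_loss(m))
# Loss aversion (lam > 1) penalizes a negative margin more than DPO does.
print(pairwise_kto_loss(-2.0, lam=2.25) > dpo_loss(-2.0))  # True
```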
Unary/Binary-Label (Prospect-RLHF/Single-Response) Formulation
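In the broader single-label setting (see (Ethayarajh et al., 2024, Spadea et al., 20 Feb 2025, Zhai, 2024, Ye et al., 24 Jan 2025, Fang et al., 2 Dec 2025)), KTO operates on triples $(x, y, b)$ where $b \in \{\text{desirable}, \text{undesirable}\}$. It defines the per-example log-ratio $r_\theta(x, y) = \log\frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$ and a reference point $z_0$, the mean KL divergence over a batch or dataset:
$$z_0 = \mathrm{KL}\big(\pi_\theta(y' \mid x)\,\|\,\pi_{\text{ref}}(y' \mid x)\big),$$
$$v(x, y) = \begin{cases} \lambda_D\,\sigma\big(\beta(r_\theta(x, y) - z_0)\big) & \text{if } y \text{ is desirable} \\ \lambda_U\,\sigma\big(\beta(z_0 - r_\theta(x, y))\big) & \text{if } y \text{ is undesirable} \end{cases}$$
$$\mathcal{L}_{\text{KTO}} = \mathbb{E}_{(x, y)}\big[\lambda_y - v(x, y)\big].$$
The weights $\lambda_D$ and $\lambda_U$ encode loss aversion. Ethayarajh et al. (2024) recommend choosing them so that $\lambda_D n_D / (\lambda_U n_U) \in [1, 4/3]$, where $n_D$ and $n_U$ count desirable and undesirable examples.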
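With imbalanced data, the class weights can be chosen so that $\lambda_D n_D / (\lambda_U n_U)$ stays in $[1, 4/3]$, the range Ethayarajh et al. (2024) recommend. The helper below is a minimal sketch: it fixes $\lambda_U$ and solves for $\lambda_D$. The function name and the midpoint target of $7/6$ are illustrative assumptions.

```python
def balance_lambda_d(n_desirable: int, n_undesirable: int,
                     lambda_u: float = 1.0, target_ratio: float = 7 / 6) -> float:
    """Choose lambda_D so that lambda_D * n_D / (lambda_U * n_U) == target_ratio.

    target_ratio defaults to 7/6, the midpoint of the recommended [1, 4/3] range.
    """
    if n_desirable <= 0 or n_undesirable <= 0:
        raise ValueError("both classes need at least one example")
    if not 1.0 <= target_ratio <= 4 / 3:
        raise ValueError("target_ratio should lie in [1, 4/3]")
    return target_ratio * lambda_u * n_undesirable / n_desirable

# 1,000 desirable vs 9,000 undesirable examples: desirable ones are up-weighted.
print(round(balance_lambda_d(1000, 9000), 4))  # 10.5
```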
Algorithmic Pseudocode
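A typical KTO training loop (Ye et al., 24 Jan 2025, Viswanadha et al., 23 Jun 2025, Garg et al., 2024) is sketched below in PyTorch. `model_step` is a hypothetical helper. It runs the policy and reference models on a batch and returns summed sequence log-probabilities, including those of mismatched completions $y'$ used to estimate $z_0$:

```python
import torch

def kto_loss(policy_logps, ref_logps, desirable, kl_policy_logps, kl_ref_logps,
             beta=0.1, lambda_D=1.0, lambda_U=1.0):
    # r = log pi_theta(y|x) - log pi_ref(y|x) per example; desirable is a bool tensor
    r = policy_logps - ref_logps
    # z0: batch KL estimate from mismatched (x, y') pairs, clamped at 0, no gradient
    z0 = (kl_policy_logps - kl_ref_logps).mean().clamp(min=0).detach()
    v = torch.where(desirable, lambda_D * torch.sigmoid(beta * (r - z0)),
                    lambda_U * torch.sigmoid(beta * (z0 - r)))
    lambda_y = torch.where(desirable, torch.full_like(r, lambda_D), torch.full_like(r, lambda_U))
    return (lambda_y - v).mean()  # batch average of lambda_y - v(x, y)

def train(model_step, loader, optimizer, num_epochs):
    # model_step (hypothetical) maps a batch to the keyword arguments of kto_loss
    for _ in range(num_epochs):
        for batch in loader:
            loss = kto_loss(**model_step(batch))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```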
If the task calls for risk-adaptive tuning (e.g., category- or difficulty-specific margin scaling), $\beta$ is set per example, as in DynamicKTO (Wang et al., 25 Jul 2025).
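Per-example $\beta$ can be sketched as a lookup keyed by example category. The category names follow the geospatial setting attributed to DynamicKTO in Section 6. The $\beta$ values and the lookup mechanism are hypothetical placeholders, not DynamicKTO's actual schedule.

```python
import math

# Hypothetical per-category risk scales; not DynamicKTO's published values.
BETA_BY_CATEGORY = {"entity": 0.1, "relation": 0.2, "attribute": 0.3}
DEFAULT_BETA = 0.1

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def kto_value(r: float, z0: float, desirable: bool, category: str,
              lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Unary KTO value with beta looked up per example instead of a global constant."""
    beta = BETA_BY_CATEGORY.get(category, DEFAULT_BETA)
    if desirable:
        return lambda_d * sigmoid(beta * (r - z0))
    return lambda_u * sigmoid(beta * (z0 - r))

def kto_example_loss(r, z0, desirable, category, lambda_d=1.0, lambda_u=1.0):
    lambda_y = lambda_d if desirable else lambda_u
    return lambda_y - kto_value(r, z0, desirable, category, lambda_d, lambda_u)

# The same log-ratio shortfall is penalized more sharply in a higher-beta category.
print(kto_example_loss(-1.0, 0.5, True, "attribute") > kto_example_loss(-1.0, 0.5, True, "entity"))  # True
```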
3. Comparison with DPO, SFT, and Other Preference-Based Methods
KTO generalizes DPO by explicitly modeling nonlinear risk sensitivity and reference dependence, and accommodating single-label data:
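- DPO (Shuieh et al., 9 May 2025, Ethayarajh et al., 2024, Viswanadha et al., 23 Jun 2025): operates on paired preferences and optimizes the logistic surrogate $-\log\sigma(\beta m)$ on the margin $m$ between the preferred and rejected log-ratios. It is the special case of KTO with linear $v$ and $w(p) = p$.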
- Supervised Fine-Tuning (SFT): lacks direct use of negative preferences and cannot encode risk asymmetry.
- Other variants: IPO, SMO–Aug, ORPO, and Step-DPO all require explicitly paired preference data. They may be less robust under data sparsity, non-IID distributions, or streaming feedback (Garg et al., 2024, Ye et al., 24 Jan 2025, Spadea et al., 20 Feb 2025).
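KTO needs only a per-example label, so any paired dataset can be flattened into unary examples. Each preferred completion becomes desirable and each rejected one undesirable. The reverse direction generally requires both kinds of completion for the same prompt. A minimal sketch follows; the record field names are hypothetical.

```python
from typing import NamedTuple

class PairedExample(NamedTuple):
    prompt: str
    chosen: str
    rejected: str

class UnaryExample(NamedTuple):
    prompt: str
    completion: str
    desirable: bool

def pairs_to_unary(pairs):
    """Flatten (x, y_w, y_l) preference pairs into KTO-style unary examples."""
    unary = []
    for p in pairs:
        unary.append(UnaryExample(p.prompt, p.chosen, True))
        unary.append(UnaryExample(p.prompt, p.rejected, False))
    return unary

data = [PairedExample("2+2?", "4", "5")]
print(pairs_to_unary(data))
# [UnaryExample(prompt='2+2?', completion='4', desirable=True), UnaryExample(prompt='2+2?', completion='5', desirable=False)]
```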
KTO provides strong sample efficiency, notably for federated learning (Spadea et al., 20 Feb 2025, Spadea et al., 14 Oct 2025), guardrail enforcement (Garg et al., 2024), and safety-critical domains (Lim et al., 18 Feb 2025, Nghiem et al., 3 Dec 2025), and outperforms DPO in low-resource, unpaired, or highly imbalanced settings.
4. Practical Applications and Empirical Results
Domain Summaries
- Tool Learning & Error Correction: HiTEC-KTO incorporates KTO into error-aware tool-call optimization, improving F1-Name and F1-Param on diverse benchmarks (Cui et al., 28 May 2025).
- Logical Form Translation: KTO reduces syntax errors and increases logical accuracy when translating natural language (NL) into first-order logic (FOL), outperforming DPO and SFT in F1 (Viswanadha et al., 23 Jun 2025).
- E-Commerce and User Behavior: In ADORE, KTO alignment reduces false negatives in relevance judgments, boosting online CTR and ad revenue (Fang et al., 2 Dec 2025).
- Federated Personalization: KTO enables communication-efficient, robust, privacy-preserving preference learning, maintaining gains even in non-IID or redistributed settings (Spadea et al., 20 Feb 2025, Spadea et al., 14 Oct 2025).
- Safety Alignment: SFT+KTO achieves a 99% toxicity reduction in low-resource (Singlish) safety alignment, with far lower false positive rates than SFT+DPO (Lim et al., 18 Feb 2025).
- Video Multi-Task Reasoning: KTO bridges SFT and RL, sharply improving accuracy and resource efficiency in video agents (Zhang et al., 24 Mar 2026).
- Multi-Agent Language Games: MaKTO delivers win-rate improvements of up to 23% over baseline RL/LLM policies (Ye et al., 24 Jan 2025).
- Geospatial Hallucination Mitigation: DynamicKTO surpasses static KTO variants with +29.6% macro-average score in factuality (Wang et al., 25 Jul 2025).
- Continuous Post-Training: Two-stage SFT→KTO pipelines yield additive gains for small LMs and code LMs, especially when KTO is applied with strong negative sampling (Zhai, 2024, Liu et al., 2024).
Selected quantitative outcomes are summarized below:
| Scenario | KTO Gain over Baseline | Reference |
|---|---|---|
| CTR in E-commerce | +0.98% vs +0.48% (w/ KTO) | (Fang et al., 2 Dec 2025) |
| Safety: Singlish Toxicity | 99% reduction, FPR 1% | (Lim et al., 18 Feb 2025) |
| Logic translation (FOL) | +2–5 pp “correct,” –5–15 syntax err | (Viswanadha et al., 23 Jun 2025) |
| Tool-calling F1-Param | 60.66 → 69.31 (Qwen2.5-1.5B) | (Cui et al., 28 May 2025) |
| Geospatial accuracy | 0.3748 → 0.4858 (DynamicKTO) | (Wang et al., 25 Jul 2025) |
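The absolute and relative changes behind the two numeric before/after rows can be recomputed from the table. For the geospatial row, this reproduces the +29.6% figure quoted for DynamicKTO.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent)."""
    return after - before, 100.0 * (after - before) / before

for name, before, after in [("Tool-calling F1-Param", 60.66, 69.31),
                            ("Geospatial accuracy", 0.3748, 0.4858)]:
    abs_gain, rel_gain = gains(before, after)
    print(f"{name}: +{abs_gain:.4g} absolute, +{rel_gain:.1f}% relative")
# Tool-calling F1-Param: +8.65 absolute, +14.3% relative
# Geospatial accuracy: +0.111 absolute, +29.6% relative
```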
5. Implementation Considerations and Limitations
- Hyperparameters: Loss aversion ($\lambda$), curvature ($\alpha$), and probability weighting ($\gamma$) are seldom fully documented in downstream KTO applications. Defaults follow Prospect Theory (e.g., $\lambda = 2.25$ and $\alpha = 0.88$, the median estimates cited by Ethayarajh et al., 2024).
- Reference Policy: Chosen as SFT checkpoint, pretrained LLM, or mixed policy, with KL-divergence typically estimated on-batch.
- Batch Sizes and Stability: Small batches can introduce high baseline noise; larger batches or running averages are advised (Garg et al., 2024).
- Learning Rates: KTO requires significantly lower learning rates for stability, owing to the increased sensitivity of logistic/prospect-based losses (Garg et al., 2024, Zhai, 2024).
- Negative Sampling: In code and agentic settings, KTO is most effective when preferences are strong, i.e., when there is a large performance gap between positive and negative examples (Liu et al., 2024, Ye et al., 24 Jan 2025).
- Labeling Flexibility: KTO excels when unary (single-output) judgments are easier, or when preference data are highly imbalanced or non-IID (Spadea et al., 20 Feb 2025, Ye et al., 24 Jan 2025).
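To reduce the baseline noise that small batches introduce, the per-batch estimate of $z_0$ can be smoothed with an exponential moving average. This is a sketch under stated assumptions. The decay of 0.9 is an illustrative choice. Clamping at zero mirrors the $\max(0, \cdot)$ that Ethayarajh et al. (2024) apply to their batch KL estimate.

```python
class RunningKLBaseline:
    """Exponential moving average of per-batch KL estimates, used as the KTO reference point z0."""

    def __init__(self, decay: float = 0.9):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.value = None

    def update(self, batch_log_ratios):
        # Batch KL estimate from log-ratios on (mismatched) completions,
        # clamped at zero because a KL divergence cannot be negative.
        batch_kl = max(0.0, sum(batch_log_ratios) / len(batch_log_ratios))
        if self.value is None:
            self.value = batch_kl
        else:
            self.value = self.decay * self.value + (1.0 - self.decay) * batch_kl
        return self.value

baseline = RunningKLBaseline(decay=0.9)
for batch in ([0.4, 0.6], [2.0, 0.0], [-0.5, -0.1]):
    z0 = baseline.update(batch)
print(round(z0, 4))  # 0.495
```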
Reported limitations include instability on code tasks without curated negative mining (Liu et al., 2024), dependence on batch-estimated KL baselines, and moderate sensitivity to hyperparameter mis-tuning or class imbalance. In federated and guardrail alignment, KTO is robust to data redistribution and supports incremental, privacy-aware adaptation (Spadea et al., 20 Feb 2025, Garg et al., 2024).
6. Extensions, Variants, and Recent Innovations
- DynamicKTO: Uses a category-conditional or sample-adaptive risk scale $\beta$ to reflect domain complexity (e.g., entity, relation, and attribute categories in geospatial hallucination tasks). It outperforms a hand-tuned static $\beta$ (Wang et al., 25 Jul 2025).
- KTO-S: Augments the loss with signed KL regularization, directly enforcing contraction between trained and reference policies, improving training stability and convergence (Lim et al., 18 Feb 2025).
- Federated KTO: Implements KTO in federated settings (FedFLARKO, KTOO/KTOR), supporting heterogeneous client data and communication efficiency (Spadea et al., 14 Oct 2025, Spadea et al., 20 Feb 2025).
- Multi-Agent KTO (MaKTO): Iterates KTO with in-context exploration and fine-grained, stepwise preference extraction over multi-agent dialogues (Ye et al., 24 Jan 2025).
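The source does not specify how federated KTO aggregates client updates. As an assumption only, the sketch below shows a standard FedAvg-style server step: client parameters are averaged with weights proportional to each client's number of local examples. This is not a confirmed detail of FedFLARKO or KTOO/KTOR.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its number of local examples."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients that ran local KTO steps on different amounts of unary feedback.
print(fedavg([[1.0, 0.0], [0.0, 2.0]], [3, 1]))  # [0.75, 0.5]
```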
A recurrent theme is the ease of composing KTO with SFT and DPO in multi-stage pipelines (e.g., SFT→KTO→DPO or SFT→KTO→RL), leveraging the strengths of distributional pre-alignment and prospect-aware adaptation.
7. Empirical Robustness, Sample Efficiency, and Generalization
KTO consolidates its value as a general, sample-efficient alignment method. It maintains or exceeds DPO performance even under:
- Severe class imbalance or data thinning (e.g., with up to 90% of desirable examples discarded; Ethayarajh et al., 2024)
- Absence of paired preferences, or when negatives are unpaired/unbatched (Spadea et al., 20 Feb 2025)
- Spurious correlation regimes (mathematical reasoning, distributional narrowness, (Shuieh et al., 9 May 2025))
- Heterogeneous, privacy-restricted FL scenarios (Spadea et al., 14 Oct 2025)
- Low-resource safety and multi-lingual settings (Lim et al., 18 Feb 2025)
In aggregate, KTO matches or outperforms conventional RLHF surrogates in open-domain, safety-critical, multi-agent, and low-resource tasks, especially when feedback is naturally unary or preference annotation is costly.
References: For foundational formalism and comparative experiments across model scales, see Ethayarajh et al., "Model Alignment as Prospect Theoretic Optimization" (Ethayarajh et al., 2024). For major downstream variants and benchmark applications, see (Cui et al., 28 May 2025, Fang et al., 2 Dec 2025, Viswanadha et al., 23 Jun 2025, Zhang et al., 24 Mar 2026, Ye et al., 24 Jan 2025, Wang et al., 25 Jul 2025, Spadea et al., 14 Oct 2025, Lim et al., 18 Feb 2025), among others.