CoT-RAFT Fine-Tuning Objective

Updated 7 November 2025
  • The paper introduces a composite loss that integrates regression-aware calibration with chain-of-thought cross-entropy to address numeric prediction and reasoning alignment.
  • It employs a two-stage training process: initial fine-tuning with external CoTs followed by alignment using self-generated CoTs to mitigate distribution mismatch.
  • Empirical validation shows improved Pearson correlation over standard cross-entropy and regression-only baselines, with gains that are robust to the choice of weighting coefficient and to limited CoT sampling budgets.

The CoT-RAFT fine-tuning objective denotes a class of training paradigms that synthesize Chain-of-Thought (CoT) reasoning supervision with regression-aware or ranking-based learning signals, with particular relevance for LLM-as-a-judge and robust domain adaptation tasks. Originally motivated by the misalignment between standard cross-entropy (CE) loss and the requirements of numeric prediction or faithful stepwise reasoning, CoT-RAFT objectives combine CE supervision over rationales and scores with regression-calibrated loss terms and staged or contrastive training strategies, validated in empirical and ablation studies (Chiang et al., 6 Mar 2025).

1. Foundations of CoT-RAFT Objective

CoT-RAFT arises as a solution to two distinct but related gaps in LLM fine-tuning for judging, scoring, or complex reasoning tasks:

  • Standard Practice Limitation: Most supervised fine-tuning employs cross-entropy loss to maximize the likelihood of annotated outputs—typically a concatenated CoT rationale and its associated score (e.g., a 1–5 rating). This penalizes all categorical mispredictions equally, disregarding numerical structure (i.e., the cost of predicting 5 when 1 is correct equals that of predicting 2 when 1 is correct).
  • Numeric Calibration Deficiency: Since LLM outputs form distributions over string-encoded scores, ignoring inter-label distances produces poorly calibrated predictions.
  • Reasoning Faithfulness: Existing regression-aware training often omits CoT supervision, undermining the traceability and calibration of model judgments.

The CoT-RAFT objective directly incorporates both the regression-aware and reasoning-aware requirements into a composite loss function, with structured two-stage training to mitigate distributional mismatch between annotator and self-generated reasoning.

2. Mathematical Formulation

Let $x$ be the input, $s$ the CoT reasoning trace, $y \in \mathcal{Y} \subset \mathbb{R}$ the score, and $p(\cdot \mid x)$ the LLM output distribution.

Regression-Aware Loss (RAFT)

The regression-aware loss for numeric prediction utilizes the RAIL predictor:

$$\hat{y}_{\rm RAIL}(x) = \sum_{y \in \mathcal{Y}} p(\text{str}(y) \mid x) \cdot y$$

$$\ell_{\rm RAFT}(y^*, p) = \left(\hat{y}_{\rm RAIL}(x) - y^*\right)^2$$
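A minimal sketch of the RAIL predictor and the regression-aware loss, assuming the probabilities of the string-encoded scores have already been read off the model's output distribution (the tensor names, the 1–5 label set, and the batching convention are illustrative, not from the paper):

```python
import torch

def rail_predict(score_probs: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Regression-aware inference: expected score under p(str(y) | x).

    score_probs: (batch, |Y|) probabilities assigned to each string-encoded score.
    scores:      (|Y|,) numeric values of the label set Y, e.g. [1, ..., 5].
    """
    # Renormalize in case the score tokens do not absorb all of the probability mass.
    probs = score_probs / score_probs.sum(dim=-1, keepdim=True)
    return probs @ scores  # (batch,) expected scores y_hat_RAIL(x)

def raft_loss(score_probs: torch.Tensor, scores: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """Squared error between the RAIL prediction and the gold score y*."""
    return ((rail_predict(score_probs, scores) - y_true) ** 2).mean()

# Illustrative usage on a 1-5 rating scale.
scores = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
score_probs = torch.tensor([[0.05, 0.10, 0.20, 0.40, 0.25]])
print(raft_loss(score_probs, scores, torch.tensor([4.0])))  # small squared error
```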

Chain-of-Thought Cross-Entropy Loss

CoT supervision relies on the categorical likelihood of the annotated rationale-plus-score:

$$\ell_{\rm CoT}(y^*, s^*, p) = -\log p([s^*, y^*] \mid x)$$
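In token-level terms, the CoT cross-entropy is the summed negative log-likelihood of the concatenated rationale-and-score sequence given the prompt. A sketch of that computation for a single example, assuming causal-LM logits over the full sequence [x, s*, y*] are available (the shift-and-mask convention shown is the standard one, not paper-specific code):

```python
import torch
import torch.nn.functional as F

def cot_ce_loss(logits: torch.Tensor, token_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """-log p([s*, y*] | x) for one example.

    logits:     (seq_len, vocab) next-token logits over the sequence [x, s*, y*].
    token_ids:  (seq_len,) token ids of the same sequence.
    prompt_len: number of tokens belonging to the input x (excluded from the loss).
    """
    # Position t predicts token t+1, so drop the last logit and the prompt targets.
    pred_logits = logits[prompt_len - 1:-1]   # predictions for the rationale + score tokens
    targets = token_ids[prompt_len:]          # the rationale + score tokens themselves
    # Summed (not mean) NLL corresponds to -log p([s*, y*] | x).
    return F.cross_entropy(pred_logits, targets, reduction="sum")
```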

Composite CoT-RAFT Loss

The core innovation combines both elements into a unified training objective:

$$\ell_{\rm CoT\text{-}RAFT}^{\lambda}(y^*, p_t, p) = \lambda \left( \sum_{y \in \mathcal{Y}} p(\text{str}(y) \mid [x, \hat{s}]) \cdot y - y^* \right)^2 - \log p([\hat{s}, y^*] \mid x), \qquad \hat{s} \sim p_t(\cdot \mid x)$$

where $\lambda$ is a weighting coefficient, and the conditioning on $[x, \hat{s}]$ requires explicit access to the reasoning trace at each training step.
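Combining the two terms gives the composite objective for a single training example. The sketch below reuses cot_ce_loss and the RAIL computation from the sketches above; score_token_probs is assumed to hold p(str(y) | [x, s_hat]) for a rationale s_hat sampled from the frozen generator p_t, and lam plays the role of λ (all names are illustrative):

```python
import torch

def cot_raft_loss(
    score_token_probs: torch.Tensor,  # (|Y|,) p(str(y) | [x, s_hat]) given the sampled CoT
    scores: torch.Tensor,             # (|Y|,) numeric label set Y
    y_true: float,                    # gold score y*
    logits: torch.Tensor,             # (seq_len, vocab) logits over [x, s_hat, y*]
    token_ids: torch.Tensor,          # (seq_len,) token ids of [x, s_hat, y*]
    prompt_len: int,                  # number of tokens in the input x
    lam: float = 1.0,                 # weighting coefficient lambda
) -> torch.Tensor:
    # Regression-aware term: squared error of the expected score given the sampled CoT.
    probs = score_token_probs / score_token_probs.sum()
    reg_term = (probs @ scores - y_true) ** 2
    # Reasoning term: negative log-likelihood of the sampled rationale followed by the gold score.
    ce_term = cot_ce_loss(logits, token_ids, prompt_len)
    return lam * reg_term + ce_term
```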

3. Two-Stage Training Dynamics

The CoT-RAFT framework employs a two-stage procedure:

  • Stage 1 (Annotator CoT Training): Initialize from the base LLM ($p_0$) and fine-tune on externally annotated CoTs (e.g., GPT-4-generated), optimizing the composite loss.
  • Stage 2 (Self-Generated CoT Alignment): Revert to $p_0$, generate CoTs on all $x$ with the Stage 1 model, constructing a dataset where the reasoning aligns with the model's own generation at inference. Fine-tune again with the CoT-RAFT loss, now conditioning on self-generated rationales.

This sequence is motivated by the observation that direct fine-tuning on external CoTs degrades generalization, whereas retraining on self-generated CoTs aligns training/inference distributions, sharply improving calibration (Chiang et al., 6 Mar 2025). Crucially, Stage 2 starts from the seed model, not the Stage 1 checkpoint; ablation studies show performance collapse otherwise.
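A high-level sketch of this schedule; finetune and generate_cots are placeholder helpers standing in for a standard SFT loop with the composite loss and for batched CoT decoding, respectively (they are not APIs from the paper). The two details it encodes are that self-CoTs are generated once, offline, and that Stage 2 restarts from the seed model rather than the Stage 1 checkpoint:

```python
def train_cot_raft(p0, dataset, annotator_cots, lam=1.0):
    """Two-stage CoT-RAFT training sketch (placeholder helpers, not paper code).

    p0:             the seed (base) LLM
    dataset:        list of (x, y_star) input / gold-score pairs
    annotator_cots: external rationales (e.g., GPT-4-generated), one per input x
    """
    # Stage 1: fine-tune the seed model on annotator CoTs with the composite loss.
    p1 = finetune(p0, dataset, cots=annotator_cots, loss="cot-raft", lam=lam)

    # Generate self-CoTs for every x with the Stage 1 model (done once, offline).
    self_cots = generate_cots(p1, [x for x, _ in dataset])

    # Stage 2: restart from p0 (not from p1) and fine-tune on self-generated CoTs,
    # so training-time rationales match those the model produces at inference.
    return finetune(p0, dataset, cots=self_cots, loss="cot-raft", lam=lam)
```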

4. Empirical Validation and Performance Analysis

In LLM-as-a-judge benchmarks, TRACT—implementing the CoT-RAFT objective—delivers state-of-the-art performance:

  • Pearson correlation: On Mistral-7B, TRACT achieves an average $r = 0.650$, surpassing standard CoT decoding (0.557), RAFT (0.557), and strong baselines (Prometheus-2-7B at 0.591).
  • Robustness: Gains persist under limited compute (few CoT samples) and across weightings ($\lambda$), evidencing stable optimization.
  • Ablations:
    • Self-CoT fine-tuning is critical: Training only on annotator CoTs reduces Pearson's $r$ by 0.094.
    • Removing the regression-aware loss decreases $r$ by 0.033.
    • Training on self-CoTs with only CE loss performs worse than external CoTs (0.521 vs. 0.557), highlighting the necessity of simultaneous regression-aware and CoT supervision.

Models generalized equally well to training and test CoTs when trained via the full two-stage process.

5. Relation to Alternative Objectives

CoT-RAFT formalizes the joint supervision of reasoning and numeric calibration in LLM scoring, distinguishing itself from:

  • Pure regression-aware fine-tuning (RAFT): Penalizes numeric error, but omits explanatory traces.
  • Vanilla CoT supervised fine-tuning: Lacks sensitivity to numeric distance in scoring.
  • Contrastive, multi-objective, or iterative reward-ranked RAFT extensions: May add further structure, but TRACT uniquely demonstrates large improvements for CoT+score tasks under task-relevant rubrics (Chiang et al., 6 Mar 2025).

In broader context, similar composite objectives are being explored for translation (Hu et al., 3 Oct 2024), multi-domain adaptation, and code repair.

6. Implementation Considerations

  • Conditioning on Reasoning: At inference, scoring is carried out by sampling a model-generated CoT and applying regression-aware inference (CoT-RAIL); a minimal sketch follows this list.
  • Choice of $\lambda$: The regression-aware and reasoning losses can be balanced; the paper finds performance stable for $\lambda$ near 1.
  • Data Distribution: Self-generated CoTs must be re-aligned each time the model architecture or training corpus shifts, as distribution mismatch degrades calibration.
  • Resource Requirements: TRACT has practical compute requirements, as self-CoT generation is conducted only once for the offline training corpus.
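A sketch of CoT-RAIL scoring at inference time under the same placeholder assumptions: sample one or a few rationales from the fine-tuned judge, read off the probabilities of the string-encoded scores given each rationale, and return the expected score (the two callables passed in are hypothetical hooks into the model, not paper code):

```python
import torch

def cot_rail_score(sample_cot_fn, score_probs_fn, x: str, scores: torch.Tensor, n_samples: int = 1) -> float:
    """CoT-RAIL inference sketch.

    sample_cot_fn(x)         -> a rationale s_hat sampled from the fine-tuned model
    score_probs_fn(x, s_hat) -> (|Y|,) tensor of p(str(y) | [x, s_hat])
    scores                   -> (|Y|,) numeric label set, e.g. [1, ..., 5]
    """
    preds = []
    for _ in range(n_samples):
        s_hat = sample_cot_fn(x)                 # sample a chain of thought
        probs = score_probs_fn(x, s_hat)
        probs = probs / probs.sum()              # renormalize over the score set
        preds.append(float(probs @ scores))      # expected score given this CoT
    return sum(preds) / len(preds)               # average over sampled rationales
```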

7. Summary Table: TRACT Loss Components

Component               | Loss Term                                     | Role
Regression-aware (RAFT) | Squared error on the score, conditioned on CoT | Numeric calibration
CoT Cross-Entropy       | Negative log-likelihood of rationale + score  | Reasoning alignment

Fine-tuning objective:

$$\ell_{\rm CoT\text{-}RAFT}^{\lambda}(y^*, p_t, p) = \lambda \left( \sum_{y \in \mathcal{Y}} p(\text{str}(y) \mid [x, \hat{s}]) \cdot y - y^* \right)^2 - \log p([\hat{s}, y^*] \mid x)$$

8. Concluding Perspective

The CoT-RAFT fine-tuning objective, as implemented in TRACT (Chiang et al., 6 Mar 2025), integrates regression-aware supervision and stepwise reasoning in a principled, empirically validated framework for LLM-based automated judgment. The two-stage process, balancing numeric calibration and reasoning faithfulness, reduces miscalibration, mitigates distributional mismatch between annotator and self-generated reasoning, and achieves superior performance across scoring and explanation metrics. Ablations confirm each component's necessity; TRACT stands as a reference method for applications demanding interpretable, accurately scored model judgments.
