
LEAD: Legal Agentic CoT Distillation

Updated 7 February 2026
  • The paper presents LEAD as a supervised fine-tuning method that extracts multi-hop legal reasoning from authentic judicial records using an agentic workflow.
  • It decomposes complex legal tasks into distinct, structured sub-processes, ensuring factual grounding and alignment with real courtroom protocols.
  • LEAD enhances legal LLM performance by reducing citation hallucinations and enforcing logical, multi-step legal inference based on real-world standards.

Legal Agentic CoT Distillation (LEAD) is a supervised fine-tuning methodology specifically developed within the LegalOne project to reliably transfer rigorous, multi-step legal reasoning into LLMs. LEAD augments traditional chain-of-thought (CoT) distillation with an agentic workflow that decomposes complex judicial decision-making into structured, factually grounded reasoning trajectories. By harvesting and refining reasoning fragments from authentic Chinese legal documents, then enforcing logical structure and factual alignment through a multi-stage process, LEAD addresses persistent issues in legal LLMs such as hallucinated citations, omitted intermediate inferences, and failure to mirror real-world judicial protocols (Li et al., 31 Jan 2026).

1. Objectives and Motivation

Legal reasoning tasks are distinguished by their knowledge intensity—demanding precise authority grounding in statutes, case law, and commentary—and their structure intensity, which requires explicit, procedural, multi-hop inference (e.g., issue identification, rule application, syllogistic logic). General-purpose LLMs routinely generate legally plausible text yet struggle with domain-specific rigor; they often hallucinate sources, skip legally salient steps, or misrepresent the structure of judicial reasoning.

LEAD was introduced to resolve these deficits by shifting from naive teacher-student distillation to a process that (1) harvests reasoning from authentic judicial records rather than teacher generations, (2) decomposes legal tasks into agentic sub-processes aligned with real courtroom SOPs, (3) reconstructs and merges these into highly-structured CoT trajectories, and (4) imposes a rigorous, two-phase quality control mechanism using both heuristics and an LLM-as-judge. The resulting high-consistency SFT corpus explicitly grounds legal reasoning, internalizes citations, and minimizes inherited teacher errors (Li et al., 31 Jan 2026).

2. Agentic Workflow and System Overview

LEAD operationalizes legal reasoning as a network of “Agentic Processes,” each simulating a distinct judicial sub-procedure. The agentic pipeline comprises four sequential phases:

  1. Prompt Collection: Extraction of diverse query forms from over 100,000 recent Chinese judicial decisions, including segmentation into Fact, Reasoning, and Decision components, citation linkage, and multi-perspective simulation of queries.
  2. Agentic CoT Synthesis: A strong LLM (e.g., Qwen3-235B-A22B-Thinking) is tasked with executing, in sequence, sub-tasks such as Fact Finding, Issue Identification, Rule Retrieval, Rule Deduction, and Conclusion Derivation. Each node exposes the LLM to expert protocols, logical templates (“Major premise → Minor premise → Conclusion”), and legal KB retrieval as scaffolding, ensuring factual grounding and conformity with legal standards.
  3. Trajectory Refinement: Sub-traces from each node undergo LLM-driven internalization (rewriting external references as model-held knowledge) and merging into a globally coherent reasoning trajectory.
  4. Quality Control: Dual-stage filtering removes malformed or inconsistent samples and uses an LLM as judge to score outputs on dimensions such as reasoning quality and answer-reasoning alignment.

Through this workflow, LEAD produces prompt–CoT–answer triples that directly reflect the workflow of legal professionals, promoting robust internalization of legal knowledge.
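The four phases above can be sketched as one pipeline. This is a minimal, hypothetical skeleton, not the paper's implementation: the `synthesize`, `internalize`, and `judge` callables stand in for the LLM agents, and the 7-point cutoff mirrors the quality-control threshold described later.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class SFTExample:
    prompt: str
    cot: str      # merged chain-of-thought trajectory
    answer: str

def lead_pipeline(prompts_with_answers: Iterable[tuple[str, str]],
                  synthesize: Callable[[str], list[str]],
                  internalize: Callable[[str], str],
                  judge: Callable[[SFTExample], int],
                  min_score: int = 7) -> list[SFTExample]:
    """Chain the four LEAD phases; callables stand in for the LLM agents."""
    kept = []
    for prompt, answer in prompts_with_answers:              # Phase 1 output
        sub_traces = synthesize(prompt)                      # Phase 2: agentic CoT
        cot = "\n".join(internalize(t) for t in sub_traces)  # Phase 3: refine + merge
        example = SFTExample(prompt, cot, answer)
        if judge(example) >= min_score:                      # Phase 4: LLM-as-judge
            kept.append(example)
    return kept
```

Each phase is expanded in Section 3; in practice the callables wrap prompted LLM calls rather than pure functions.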

3. Detailed Methodological Pipeline

3.1 Prompt Collection

  • Corpus Construction: >100,000 court decisions are parsed and segmented using regex heuristics. Each is rated for factual completeness/complexity by a strong LLM and subjected to stratified sampling to balance case types. Statutory citations are linked to external knowledge bases.
  • Question Generation: Prompts are generated by mapping document structure (e.g., Fact→Reasoning, Fact→Decision) and simulating queries from multiple legal perspectives (litigant, attorney, judge). Real consultation queries (~20k) are appended to improve alignment.
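A regex-based segmenter of the kind described could look like the sketch below. The section markers are common boilerplate phrases in Chinese judgments, used here as assumptions; the paper does not publish its exact heuristics.

```python
import re

# Illustrative section markers seen in Chinese judgments; the paper's exact
# regex heuristics are not specified, so treat these as assumptions.
SECTIONS = [
    ("fact", r"经审理查明"),      # "upon trial, the facts found are ..."
    ("reasoning", r"本院认为"),   # "this court holds ..."
    ("decision", r"判决如下"),    # "the judgment is as follows ..."
]

def segment_decision(text: str) -> dict[str, str]:
    """Split a judgment into Fact / Reasoning / Decision spans by marker position."""
    cuts = []
    for name, marker in SECTIONS:
        m = re.search(marker, text)
        if m:
            cuts.append((m.start(), name))
    cuts.sort()
    parts = {}
    for i, (start, name) in enumerate(cuts):
        end = cuts[i + 1][0] if i + 1 < len(cuts) else len(text)
        parts[name] = text[start:end].strip()
    return parts
```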

3.2 Agentic CoT Synthesis

A workflow graph G = (V, E) links the main sub-processes: Fact Finding, Issue Identification, Rule Retrieval, Rule Deduction, and Conclusion Derivation. For each prompt, the LLM traverses the graph:

  • Loads sub-task scaffolding;
  • Generates the sub-trace τ_v;
  • Retrieves relevant statutes (if required) via legal KB queries;
  • Iteratively appends each result to context.
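The traversal above can be sketched as a simple linear walk over the node sequence (a linear chain is the simplest instance of the workflow graph; node names, scaffold handling, and the KB hook are illustrative assumptions):

```python
# Hypothetical sketch of the agentic traversal in Section 3.2: each node loads
# its scaffold, the LLM emits a sub-trace, and the result grows the context.
NODES = ["fact_finding", "issue_identification", "rule_retrieval",
         "rule_deduction", "conclusion"]

def traverse(prompt, llm, kb, scaffolds):
    context, traces = prompt, {}
    for node in NODES:
        scaffold = scaffolds[node]                              # sub-task protocol/template
        statutes = kb(context) if node == "rule_retrieval" else ""  # KB lookup when needed
        traces[node] = llm(f"{scaffold}\n{statutes}\n{context}")
        context += "\n" + traces[node]                          # iteratively append result
    return traces
```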

3.3 Trajectory Refinement

  • Knowledge Internalization: LLM rewrites τ_v statements to eliminate external citations, transforming them into assertions of internal knowledge while preserving logical intent.
  • Reasoning Convergence: The individual sub-traces are consolidated into a globally consistent CoT σ, preserving inferential steps and enforcing logical order.
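A minimal sketch of the refinement step, assuming a prompted rewriting LLM; the prompt wording and node ordering are illustrative, not the paper's:

```python
# Assumed internalization prompt; the paper's exact wording is not published.
INTERNALIZE_PROMPT = (
    "Rewrite the following legal reasoning fragment so that any external "
    "citation (e.g. 'the retrieved statute says ...') is stated as the "
    "model's own knowledge ('Article 303(2) provides ...'), preserving the "
    "logical content exactly:\n\n{sub_trace}"
)

def refine_trajectory(sub_traces: dict[str, str], llm) -> str:
    """Internalize each sub-trace, then merge them in judicial order."""
    order = ["fact_finding", "issue_identification", "rule_retrieval",
             "rule_deduction", "conclusion"]
    rewritten = [llm(INTERNALIZE_PROMPT.format(sub_trace=sub_traces[n]))
                 for n in order if n in sub_traces]
    return "\n\n".join(rewritten)   # globally coherent CoT sigma
```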

3.4 Quality Control

  • Stage I—Heuristic Filtering: Removes truncated or malformed examples, deduplicates near-identical ones, and applies code-mix ratio thresholds.
  • Stage II—LLM-as-Judge: Each example is scored (1–10) for reasoning quality, consistency, answer-reasoning alignment, conciseness, fluency, and pedagogical value; examples with ratings below 7 in any category are discarded.
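The Stage II acceptance rule reduces to a conjunction over the six dimensions; the field names below are assumptions, but the ≥ 7 cutoff follows the description above:

```python
# Stage II filtering: an example survives only if every judge dimension
# scores at least 7 on the 1-10 scale (field names are assumptions).
DIMENSIONS = ["reasoning_quality", "consistency", "answer_alignment",
              "conciseness", "fluency", "pedagogical_value"]

def passes_judge(scores: dict[str, int], threshold: int = 7) -> bool:
    """Reject if any dimension is missing or scored below the threshold."""
    return all(scores.get(d, 0) >= threshold for d in DIMENSIONS)
```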

After quality control, the resulting ~500K SFT examples are merged with ~50K high-quality open-source instructions, forming the final training dataset for supervised fine-tuning.

4. Training Objective and Implementation

The final SFT dataset consists of triples (q_i, σ_i, a_i), with chain-of-thought σ_i = (y_{i,1}, ..., y_{i,T_i}) and answer a_i. The model p_θ is optimized with the following loss:

\mathcal{L}_{\rm SFT}(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log p_\theta\bigl(y_{i,t} \mid q_i,\, y_{i,<t}\bigr) \;-\; \sum_{i=1}^{N} \log p_\theta\bigl(a_i \mid q_i,\, \sigma_i\bigr)
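As a sanity check on the objective, the sketch below evaluates the two negative log-likelihood terms for toy data, with an abstract `logprob(token, context)` standing in for log p_θ:

```python
import math

def sft_loss(examples, logprob):
    """Toy evaluation of the SFT objective: token-level NLL over the CoT,
    plus NLL of the answer given the prompt and the full CoT."""
    loss = 0.0
    for q, cot_tokens, answer in examples:
        for t, y in enumerate(cot_tokens):
            loss -= logprob(y, (q, tuple(cot_tokens[:t])))  # -log p(y_t | q, y_<t)
        loss -= logprob(answer, (q, tuple(cot_tokens)))     # -log p(a | q, sigma)
    return loss
```

With a uniform per-token probability of 0.5 and a two-token CoT, the loss is 3 log 2, one term per CoT token plus one for the answer.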

Pseudocode for the overall LEAD process is structured as a pipeline: document parsing and filtering, multi-perspective prompt generation, agentic synthesis of sub-traces, internalization and merging, heuristic and LLM-based filtering, and final dataset composition (see (Li et al., 31 Jan 2026) for the detailed code block).

5. Worked Example: Judicial Reasoning Distillation

Consider a criminal case where Defendant A acts as a general agent, recruiting gamblers and collecting rebates:

  • Sub-Traces:
    • τ_fact: “Since June 2017, A …”
    • τ_issue: “Legal issue: Does A’s role satisfy ‘opening a casino’?”
    • τ_retrieval: queries Article 303(2) CRPC.
    • τ_deduction: applies multi-factor legal test (objective element: profit, subjective element: intent).
    • τ_conclusion: “A is guilty under Art. 303(2), sentenced to 5 years + fine.”

After internalization and merging, the distilled chain-of-thought σ is organized as follows:

  1. Facts: Since June 2017 …
  2. Issue: Whether serving as general agent constitutes “opening a casino.”
  3. Rule: Art. 303(2) CRPC defines “opening a casino” as accepting bets for profit.
  4. Application: Actions and intent are substantiated.
  5. Conclusion: Guilty under Art. 303(2), 5 years’ imprisonment plus fine.

Paired as (q, σ, a) for SFT, this example illustrates how LEAD transforms raw court data into structured, explicable supervision for LLMs in legal reasoning (Li et al., 31 Jan 2026).
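Serialized for training, the worked example above becomes a single record; the field names and JSON-lines layout below are illustrative assumptions, not the paper's published schema:

```python
import json

# The worked example as one (q, sigma, a) training record (illustrative schema).
record = {
    "prompt": "Does Defendant A's conduct as a general agent, recruiting "
              "gamblers and collecting rebates, constitute 'opening a casino'?",
    "cot": [
        "Facts: Since June 2017, A acted as a general agent ...",
        "Issue: Whether serving as general agent constitutes 'opening a casino'.",
        "Rule: Art. 303(2) defines 'opening a casino' as accepting bets for profit.",
        "Application: A's actions and intent satisfy both elements.",
        "Conclusion: Guilty under Art. 303(2).",
    ],
    "answer": "Guilty under Art. 303(2); five years' imprisonment plus a fine.",
}
line = json.dumps(record, ensure_ascii=False)  # one JSON-lines training row
```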

6. Empirical Impact

Ablation studies at the 4B parameter scale quantify the incremental effect of LEAD, mid-training, and reinforcement phases on legal benchmarks:

| Method                  | JEC-QA | LexEval |
|-------------------------|--------|---------|
| Baseline (no SFT)       | 39.63  | 56.71   |
| + SFT (LEAD only)       | 46.02  | 59.69   |
| + Mid-train + SFT       | 53.83  | 66.11   |
| + Mid-train + SFT + RL  | 55.68  | 67.59   |

LEAD-based SFT alone improves JEC-QA by +6.4 points and LexEval by +3.0 points. Preceding LEAD with domain-tuned mid-training further boosts performance. The full three-stage LegalOne pipeline delivers the highest metrics, demonstrating LEAD’s critical contribution to state-of-the-art legal LLMs (Li et al., 31 Jan 2026).

7. Strengths, Limitations, and Prospective Directions

Strengths

  • Professional alignment: Structured chains-of-thought reflect established courtroom SOPs.
  • Error mitigation: Two-stage QC substantially filters out teacher and dataset-derived errors.
  • Citation internalization: Legal references become part of model knowledge, reducing citation hallucinations.
  • Scalability: The modular agentic workflow is amenable to adaptation and extension.

Limitations

  • High resource demands: Agentic synthesis and LLM-as-judge evaluation are compute-intensive.
  • Vendor dependence: Current reliance on a single strong LLM teacher risks propagation of model-specific biases.
  • Domain constraints: Methods and templates are tailored to the Chinese legal system, necessitating significant engineering for other jurisdictions.
  • Residual QC issues: LLM-based scoring may occasionally misjudge subtle legal reasoning.

Potential Extensions

  • Human-in-the-loop refinement, enabling final quality control by legal experts.
  • Cross-jurisdiction adaptation, expanding agentic scaffolds to different legal systems (e.g., US/EU).
  • Dynamic retrieval and end-to-end retrieval-augmented generation.
  • Continual updating of the LEAD dataset as statutes and precedents change.

LEAD represents a systematic methodology for extracting, structuring, and internalizing complex legal reasoning into high-fidelity supervision, demonstrably enhancing the performance and reliability of legal LLMs in high-stakes judicial settings (Li et al., 31 Jan 2026).
