PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs

Published 19 Apr 2026 in cs.CL | (2604.17543v1)

Abstract: LLMs have achieved remarkable success in general-domain tasks, yet their direct application to the legal domain remains challenging due to hallucinated legal citations, incomplete knowledge coverage, and weak structured reasoning. To address these issues, we propose PoliLegalLM, a domain-specific LLM tailored for political and legal applications. Our approach adopts a unified training framework that integrates continued pretraining, progressive supervised fine-tuning, and preference-based reinforcement learning to jointly enhance legal knowledge grounding, task alignment, and reasoning capability. We construct a large-scale, high-quality legal corpus and design a structured post-training pipeline, enabling the model to effectively learn domain-specific knowledge and adapt to diverse legal tasks. We evaluate PoliLegalLM on three representative benchmarks, including LawBench, LexEval, and a real-world dataset, PoliLegal. Experimental results demonstrate that PoliLegalLM achieves strong and consistent performance, outperforming competitive models of similar scale and remaining highly competitive with significantly larger models, while achieving the best results on real-world legal scenarios. These results highlight the effectiveness of our training paradigm and the practical value of domain-specific LLMs for real-world legal applications.


Authors (10)

Summary

The paper introduces a multi-stage training framework (CPT, PSFT, HIPO) to boost legal reasoning and political task performance.
It leverages 140B tokens and 1.8M annotated samples to address challenges like hallucinated citations and outdated statutory data.
Results show PoliLegalLM-30B-A3B outperforms similar-scale models in structured legal inference and real-world evaluation benchmarks.

PoliLegalLM: A Domain-Specific LLM for Political and Legal Affairs

Motivation and Domain-Specific Challenges

While foundation LLMs perform well across general NLP tasks, applying them to the legal domain raises persistent problems: hallucinated legal citations, incomplete or outdated statutory knowledge, and fragile structured reasoning all degrade reliability in practical judicial and governance scenarios. Generic fine-tuning on modest legal datasets with standard SFT does not close these gaps, especially with respect to statutory coverage, interdependent legal concepts, and robustness to complex real-world cases. The paper "PoliLegalLM: A Technical Report on a Large Language Model for Political and Legal Affairs" (2604.17543) proposes a unified, multi-stage domain adaptation scheme designed to meet the formal, factual, and logical requirements of legal and political tasks.

Corpus Construction and Data Engineering

PoliLegalLM is grounded in a large, carefully engineered corpus. The continued pretraining (CPT) corpus contains 140B tokens, filtered and scored using a hybrid of LLM-based semantic quality solvers and lightweight scorers bootstrapped from Qwen3-0.6B. Legal-domain data is emphasized using multi-dimensional legal knowledge frameworks and semi-automatic annotation pipelines, and incorporates real-world judgments, statutes, academic papers, and governance logs. Statutory texts are enriched via structured annotation that captures normative rules, legal elements, conceptual boundaries, and inter-article relations. The instruction dataset for post-training comprises over 1.8M samples, of which roughly 30% are specifically legal or governance-oriented, with domain experts substantially involved in both annotation and format standardization.
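The summary above does not say how the LLM-based and lightweight scorers are combined. One plausible arrangement, shown here purely as an illustration, is a cascade: a cheap scorer handles every document, and only borderline cases are sent to the more expensive LLM-based judge. The names `Document`, `filter_corpus`, `cheap_score`, `llm_score` and the thresholds `low` and `high` are hypothetical and do not come from the report.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    text: str


def filter_corpus(
    docs: Iterable[Document],
    cheap_score: Callable[[str], float],  # hypothetical lightweight scorer in [0, 1]
    llm_score: Callable[[str], float],    # hypothetical LLM-based judge in [0, 1]
    low: float = 0.3,                     # assumed threshold: below this, drop outright
    high: float = 0.7,                    # assumed threshold: at or above this, keep
) -> List[Document]:
    """Keep documents in a two-stage cascade; only borderline ones reach the LLM judge."""
    kept: List[Document] = []
    for doc in docs:
        score = cheap_score(doc.text)
        if score >= high:
            kept.append(doc)  # confidently good according to the cheap scorer
        elif score > low and llm_score(doc.text) >= high:
            kept.append(doc)  # borderline case rescued by the LLM judge
    return kept
```

The cascade keeps the expensive judge off the bulk of the corpus, which matters at the 140B-token scale; whether the authors use a cascade, an ensemble, or something else is not stated here.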

Unified Multi-Stage Training Framework

PoliLegalLM employs a hybrid pipeline of CPT, Progressive Supervised Fine-Tuning (PSFT), and Hard Sample-aware Iterative Direct Preference Optimization (HIPO):

CPT is executed in two phases: first at an 8K-token context length for efficient large-scale knowledge injection, then long-context adaptation at 16K tokens, targeting lengthy judgments and statutes.
PSFT applies curriculum-based SFT. It first trains on legal judgment prediction (charge, statute, and penalty prediction), producing a core model $M_{\text{core}}$. The model is then progressively adapted to downstream reasoning, QA, and generation tasks, with a batch-mixing anti-forgetting mechanism that keeps 20% core samples in each batch to mitigate catastrophic forgetting of the core legal mappings.
HIPO implements iterative DPO (Direct Preference Optimization) on batches of high-difficulty queries. At each iteration, only unresolved or underperforming samples are retained. Preference pairs are formed using human- or LLM-annotated gold responses versus model outputs, augmented with an explicit auxiliary NLL target for stability and factual retention, enhancing factual consistency and structured inference in hard scenarios.
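The 20% core-sample mixing described for PSFT can be sketched as a batch sampler. This is a minimal illustration, assuming sampling with replacement and rounding to the nearest whole sample; the function name `mixed_batch` and its parameters are hypothetical, and the report's actual sampler may differ.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(
    core: Sequence[T],
    task: Sequence[T],
    batch_size: int,
    core_ratio: float = 0.2,  # 20% core samples, as stated for PSFT
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Build one training batch that replays a fixed share of core-task samples."""
    rng = rng or random.Random()
    n_core = round(batch_size * core_ratio)
    n_task = batch_size - n_core
    # Sample with replacement so small pools never run out (an assumption).
    batch = rng.choices(core, k=n_core) + rng.choices(task, k=n_task)
    rng.shuffle(batch)  # avoid a fixed core/task ordering inside the batch
    return batch
```

With `batch_size=10`, each batch holds 2 judgment-prediction samples and 8 samples from the downstream task currently being learned.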
Figure 1: Multi-stage training pipeline of PoliLegalLM: continued pretraining, curriculum-style supervised fine-tuning, and hard-sample-aware preference optimization enable robust domain-specific adaptation.
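The HIPO stage combines a DPO preference term with an auxiliary NLL term and retains only hard samples between iterations. The sketch below uses the standard DPO loss on sequence log-probabilities plus a length-normalized NLL on the gold (chosen) response. The weights `beta` and `alpha`, the length normalization, and the score-threshold rule in `retain_hard` are assumptions for illustration, not values or rules taken from the report.

```python
import math
from typing import Callable, List, TypeVar

S = TypeVar("S")


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_nll_loss(
    policy_chosen_logp: float,    # log p_theta(gold response | prompt)
    policy_rejected_logp: float,  # log p_theta(model output | prompt)
    ref_chosen_logp: float,       # same quantities under the frozen reference model
    ref_rejected_logp: float,
    chosen_len: int,              # token count of the gold response
    beta: float = 0.1,            # assumed DPO temperature
    alpha: float = 1.0,           # assumed weight of the auxiliary NLL term
) -> float:
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    dpo = _softplus(-margin)                # equals -log(sigmoid(margin))
    nll = -policy_chosen_logp / chosen_len  # per-token NLL on the gold response
    return dpo + alpha * nll


def retain_hard(samples: List[S], score_fn: Callable[[S], float], threshold: float) -> List[S]:
    """Keep only samples the current model still scores below the threshold."""
    return [s for s in samples if score_fn(s) < threshold]
```

In an iterative loop, each round would train on preference pairs built from the current pool, re-score the pool with the updated model, and pass it through `retain_hard` so the next round concentrates on unresolved cases.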

Evaluation Methodology and Benchmarks

Three complementary benchmarks are used: LawBench (20 tasks for legal cognition, recall, reasoning), LexEval (23 tasks for multi-level legal cognition and ethics in Chinese), and PoliLegal (2000 real-world system queries). Tasks cover knowledge, application, logical inference, information extraction, and scenario-based risk assessment. Evaluation metrics include accuracy, F1, NLD, ROUGE-L, and soft-F1 per benchmark specifications. Baselines span both open-source and proprietary models across scales up to ultra-large mixture-of-experts systems (e.g., Qwen3-235B-A22B, GLM-5, GPT-5).
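Of the metrics listed, ROUGE-L is the easiest to show compactly: it scores a generated answer by its longest common subsequence (LCS) with the reference. The sketch below computes the balanced F-measure over pre-tokenized sequences. How each benchmark tokenizes text (for Chinese, often character-level) and which F-beta weighting it uses are not given here, so treat both as assumptions.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L with equal weight on precision and recall (beta = 1, an assumption)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For character-level scoring of Chinese text, a string can be passed as `list(text)`, since each character then acts as one token.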

Results and Comparative Analysis

PoliLegalLM-30B-A3B achieves strong and consistent scores across LawBench, LexEval, and the real-world PoliLegal tasks, surpassing models of similar scale and remaining competitive with much larger ones. It generalizes well on key legal reasoning benchmarks, and its lead is most pronounced on the real-world tasks.

On LawBench, PoliLegalLM-30B-A3B attains an average score of 60.95, outperforming all similarly-sized baselines (e.g., ChatLaw-33B, Qwen3-30B-A3B, Qwen3.5-35B).
On LexEval, it reaches 62.15, remaining competitive with much larger models (GLM-5 scores 65.14) and excelling in logical inference and discrimination metrics.
In the real-world PoliLegal assessment, the model achieves 87.35, with the next best at 81.25 (a margin of 6.10 points).

The per-capability breakdown shows the model leading on Understanding, Application, and Logical Inference tasks, which suggests competence in both factual and reasoning-intensive tasks rather than mere memorization or surface pattern matching.

Figure 2: (a) PoliLegalLM-30B-A3B significantly outperforms a spectrum of open-source models in political and legal evaluations; (b) Among peer models, PoliLegalLM achieves leading scores in Understanding, Application, and Logical Inference capabilities.

Ablation: Contribution of Each Pipeline Stage

Ablation studies show monotonic gains as the SFT, CPT, and RL stages are added. SFT brings immediate improvements through explicit legal knowledge alignment; CPT further strengthens reasoning-heavy tasks and boosts real-world transfer; RL/HIPO provides the largest additional increment in the reasoning, discrimination, and ethical compliance dimensions, most visibly on LexEval and PoliLegal. Each stage contributes to the final results.

Implications and Future Research Directions

The results indicate that domain-specific LLMs, when grounded in high-coverage, structured legal corpora and trained with progressive, anti-forgetting fine-tuning followed by hard-sample preference optimization, can rival or outperform models several times their parameter count on complex, real-world legal scenarios. This suggests that scale alone is not sufficient for robust domain reasoning; the combination of targeted data curation with curriculum and reinforcement-style optimization is critical.

Despite these achievements, key limitations remain: persistent factuality and robustness challenges in rare or adversarial legal instances, the need for improved output verifiability and interpretability for high-stakes deployment, and difficulties with dynamic statutory updates and multi-agent legal reasoning. Extensions of the framework to algorithmic verification, retrieval-augmented reasoning, and agentic frameworks are promising directions for further research.

Conclusion

PoliLegalLM represents a substantive advance in domain-specific large language modeling for legal and governance tasks. Through integrated corpus engineering, curriculum-aligned SFT, and hard-sample preference learning, it delivers strong and consistent performance across legal knowledge, structured reasoning, and real-world deployment dimensions. The approach supports the viability of mid-size, carefully adapted LLMs for critical legal-AI workflows and sets a precedent for subsequent work on both methodology and evaluation for high-assurance domain-specific LLMs.
