An Analytical Overview of SauLLM-54B and SauLLM-141B: Scaling Up Domain Adaptation for the Legal Domain
This essay presents an in-depth summary of the paper titled "SauLLM-54B and SauLLM-141B: Scaling Up Domain Adaptation for the Legal Domain." The paper introduces SauLLM-54B and SauLLM-141B, two LLMs specifically tailored for legal applications. These models, based on the Mixtral architecture, are configured with 54 billion and 141 billion parameters respectively. The core contribution is large-scale domain adaptation that enhances the capacity of LLMs to handle complex legal texts, achieving state-of-the-art performance on the LegalBench-Instruct benchmark.
Overview of SauLLM-54B and SauLLM-141B
Model Architecture and Training Strategies
- Model Architecture: SauLLM-54B and SauLLM-141B utilize the Mixtral architecture, a Transformer optimized with a Mixture of Experts (MoE) framework in which only a subset of expert feed-forward networks is activated per token, improving computational efficiency while maintaining high performance. For SauLLM-54B, the architecture comprises 32 layers with a model dimension of 4096 and a hidden dimension of 14,336; SauLLM-141B expands these to 56 layers, a model dimension of 6144, and a hidden dimension of 16,384.
- Domain Adaptation Techniques: The domain adaptation process is divided into:
- Continued Pretraining: Leveraging an extensive legal corpus of over 540 billion tokens.
- Specialized Legal Instruction-Following Protocol: Enhances the models' instruction-following capabilities specifically for legal contexts.
- Alignment with Human Preferences: Fine-tunes model outputs based on synthetic data reflecting actual legal interpretations.
- Pretraining Corpora and Data Processing: The base corpus amalgamates legal texts from diverse jurisdictions, augmented with publicly available datasets such as the FreeLaw subset and MultiLegal Pile. The dataset undergoes rigorous preprocessing, including text normalization, rule-based and perplexity filtering, and deduplication, to ensure high-quality training data.
- Instruction Fine-Tuning and Preference Data: The models are fine-tuned on a mix of general and legal-specific instructions. Preference data, crucial for aligning model outputs with human judgments, was synthesized to closely mimic legal reasoning processes and evaluated with models adapted to check factual accuracy and logical coherence.
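The Mixture-of-Experts routing described above can be sketched in miniature. The following is an illustrative toy, not the Mixtral implementation: the dimensions and expert count are scaled-down assumptions (Mixtral routes each token through 2 of 8 experts; SauLLM-54B's real dimensions are d_model=4096 and d_hidden=14,336).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class MoELayer:
    """Toy Mixture-of-Experts feed-forward layer with top-2 routing.

    All sizes here are made up for demonstration and are far smaller
    than the real Mixtral configuration.
    """
    def __init__(self, d_model=8, d_hidden=16, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.router = rng.standard_normal((d_model, n_experts)) * 0.1
        # Each expert is a two-layer MLP: d_model -> d_hidden -> d_model.
        self.w1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.1
        self.w2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.1

    def forward(self, x):
        # Score experts for this token, keep only the top-k, and mix
        # their outputs with renormalized gate weights -- only k of the
        # n_experts MLPs run, which is the source of MoE's efficiency.
        logits = x @ self.router
        top = np.argsort(logits)[-self.top_k:]
        weights = softmax(logits[top])
        out = np.zeros_like(x)
        for w, e in zip(weights, top):
            hidden = np.maximum(x @ self.w1[e], 0.0)  # ReLU expert MLP
            out += w * (hidden @ self.w2[e])
        return out

layer = MoELayer()
y = layer.forward(np.ones(8))
print(y.shape)  # (8,)
```

Because only two of the four expert MLPs run per token, the per-token compute scales with the active experts rather than with the total parameter count, which is how the 54B and 141B models stay tractable at inference time.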
Experimental Results and Evaluation
The empirical results demonstrate the effectiveness of the domain adaptation strategies:
- Global Performance: Both SauLLM-54B (SauLLM-medium) and SauLLM-141B (SauLLM-large) outperform previous models on the LegalBench-Instruct benchmark, surpassing competitive models including GPT-4 and Llama3-70B on tasks that require advanced legal reasoning and processing.
- Impact of Continued Pretraining: Continued pretraining significantly improved model performance by approximately 7%, a consistent boost observed across all evaluated categories: conclusion, interpretation, rhetoric, rules, and issue spotting.
- Preference Alignment: The preference alignment stage, implemented with Direct Preference Optimization (DPO), notably enhanced task-specific performance. While most task categories benefited, some regressed, primarily due to increased verbosity in legal interpretation tasks, highlighting the continuing challenge of accurately evaluating nuanced model outputs.
- Scalability: Scaling up the model further enhanced overall performance, although inverse scaling was observed in a minority of tasks. This points to the complex dynamics of scaling LLMs across specialized domains.
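For a single preference pair, the DPO objective used in the alignment stage reduces to a logistic loss on the policy-versus-reference log-probability margin between the chosen and rejected responses. The sketch below uses made-up log-probability values purely for illustration; `beta` is DPO's temperature-like hyperparameter.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): small when the policy already prefers the
    # chosen response more strongly than the reference does.
    return np.log1p(np.exp(-margin))

# A policy that prefers the chosen answer (illustrative numbers)
# incurs a lower loss than one that is indifferent.
aligned = dpo_loss(-10.0, -20.0, -15.0, -15.0)
indifferent = dpo_loss(-15.0, -15.0, -15.0, -15.0)
print(aligned < indifferent)  # True
```

Minimizing this loss pushes the policy to assign relatively more probability to preferred responses without a separate reward model, which is what allows the paper's synthetic preference data to drive alignment directly.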
Practical and Theoretical Implications
The release of SauLLM-54B and SauLLM-141B under the MIT License provides a robust platform for legal NLP research, allowing for extensive reuse and collaboration. The models establish a new benchmark for legal LLMs, expanding the ability of LLMs to perform detailed legal analysis and reasoning at an unprecedented scale. Future implementations can potentially enhance performance by further refining the domain-adaptation pipeline and integrating larger base models like Llama3-70B.
Conclusion and Future Directions
This paper represents a significant advancement in legal domain adaptation for LLMs. The methods and findings offer valuable insights into scaling domain-specific LLMs and optimizing performance through continued pretraining and preference alignment. Future work will focus on addressing limitations such as model verbosity and expanding the alignment procedures to encompass the broader NLP community, ultimately refining the models' accuracy in legal contexts. This ongoing research has the potential to substantially support the legal profession and judicial systems globally.