PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? (2403.13681v2)
Abstract: In this paper, we present Paramanu-Ayn, a collection of legal LLMs trained exclusively on Indian legal case documents. This 97-million-parameter auto-regressive (AR) decoder-only model was pretrained from scratch with a context size of 8192 on a single GPU in just 185 hours, achieving a model FLOPs utilization (MFU) of 41.35%. We also developed a BPE tokenizer specialized for the legal domain. We evaluated our model using perplexity and two zero-shot tasks: case judgment prediction with explanation and abstractive case summarization. On the case judgment prediction with explanation task, Paramanu-Ayn outperformed Llama-2 7B and Gemini-Pro in test accuracy by nearly 2 percentage points, despite being 72 times smaller. In zero-shot abstractive summarization, it surpassed decoder-only LLMs generating fixed-length summaries (5000 tokens) by over 10 percentage points in BLEU and METEOR, and by nearly 4 percentage points in BERTScore. Further evaluations on zero-shot commonsense and mathematical benchmarks showed that Paramanu-Ayn excelled despite being trained exclusively on legal documents, outperforming Llama-1, Llama-2, and Falcon on AGIEval AQuA-RAT and AGIEval SAT-Math. We also instruction-tuned our model on 10,763 diverse legal tasks, including legal clause generation, legal drafting, and case summarization. The Paramanu-Ayn-instruct model was rated above 8 out of 10 by GPT-3.5-Turbo on clarity, relevance, completeness, and legal reasoning. We found that our models were able to learn drafting knowledge and generalize to drafting legal contracts and legal clauses with limited instruction tuning. Hence, we conclude that for a strong domain-specialized generative LLM (such as a legal one), pretraining from scratch on domain data is more cost-effective and environmentally friendly, and remains competitive with, or even outperforms, adapting larger general-purpose LLMs to legal domain tasks.
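For context on the reported MFU of 41.35%: model FLOPs utilization is conventionally the ratio of achieved training throughput to the hardware's peak throughput, with training FLOPs for a decoder-only model of N parameters trained on D tokens usually approximated as 6ND. A hedged restatement of that convention (not necessarily the exact accounting used in the paper):

$$\mathrm{MFU} \;\approx\; \frac{6\,N\,D \,/\, t_{\mathrm{wall}}}{\text{peak FLOP/s of the GPU}}$$

The abstract also mentions a legal-domain-specialized BPE tokenizer. Below is a minimal, hypothetical sketch of how such a tokenizer could be trained with the Hugging Face `tokenizers` library; the vocabulary size, special tokens, and corpus path are illustrative assumptions, not settings reported in the paper.

```python
# Minimal sketch: training a domain-specialized byte-level BPE tokenizer on a
# legal text corpus with the Hugging Face `tokenizers` library.
# Vocabulary size, special tokens, and the corpus path are illustrative
# assumptions, not details from the paper.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                          # hypothetical vocabulary size
    special_tokens=["<bos>", "<eos>", "<pad>"],  # hypothetical special tokens
    show_progress=True,
)
tokenizer.train(files=["indian_legal_cases.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("legal_bpe_tokenizer.json")

# Legal boilerplate should segment into fewer, more meaningful subwords than a
# general-purpose tokenizer would produce.
print(tokenizer.encode("The appellant filed a special leave petition under Article 136.").tokens)
```

A tokenizer fit to the target corpus in this way typically shortens sequences of domain text, which is one reason domain-specialized pretraining from scratch can be compute-efficient.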