LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain (2404.02127v2)
Abstract: Instruction tuning is an important step in making LLMs useful for direct user interaction. However, the legal domain is underrepresented in typical instruction datasets (e.g., only 10 out of 1600+ tasks in Super-NaturalInstructions). To study whether instruction tuning on legal datasets is necessary for strong legal reasoning, we aggregate 58 annotated legal datasets and write instructions for each, creating LawInstruct. LawInstruct covers 17 global jurisdictions, 24 languages, and a total of 12M examples across diverse tasks such as legal QA, summarization of court cases, and legal argument mining. We evaluate our models on LegalBench, which measures legal reasoning across five categories in 162 challenging and realistic legal tasks, and on MMLU, to measure potential drops in general reasoning capabilities. We find that legal-specific instruction tuning on Flan-T5, which yields FLawN-T5, improves performance on LegalBench across all model sizes, with an aggregate increase of 15 points (50%) over Flan-T5 for the base size. No model size shows a performance drop on MMLU. We publish LawInstruct as a resource for further study of instruction tuning in the legal domain.