BLT: Can Large Language Models Handle Basic Legal Text? (arXiv:2311.09693v3)
Published 16 Nov 2023 in cs.CL and cs.AI
Abstract: We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs' poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs' reliability as-is for basic legal tasks.
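The abstract's example task, retrieving the exact text at a numbered line of a witness deposition, can be made concrete with a short sketch. The code below is illustrative only (it is not the paper's benchmark code): the helper name, the transcript contents, and the prompt wording are all invented here to show what a zero-shot line-lookup example and its gold answer might look like.

```python
def make_line_lookup_example(transcript_lines, line_no):
    """Build a zero-shot line-lookup prompt and its gold answer.

    transcript_lines: list of deposition lines, in order.
    line_no: 1-indexed line number, as in real transcripts.
    """
    numbered = "\n".join(
        f"{i + 1}  {text}" for i, text in enumerate(transcript_lines)
    )
    prompt = (
        "Below is a deposition transcript with line numbers.\n\n"
        f"{numbered}\n\n"
        f"What is the exact text of line {line_no}? Reply with the text only."
    )
    gold = transcript_lines[line_no - 1]
    return prompt, gold


# Example usage with an invented three-line transcript:
transcript = [
    "Q. Please state your name for the record.",
    "A. Jane Doe.",
    "Q. Where were you on the night of June 4th?",
]
prompt, gold = make_line_lookup_example(transcript, 2)
```

An LLM's reply to `prompt` would then be scored by exact (or near-exact) match against `gold`, which is the kind of mechanical check the benchmark's pass/fail evaluation implies.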
- Andrew Blair-Stanek
- Nils Holzenberger
- Benjamin Van Durme