Customizing Contextualized Language Models for Legal Document Reviews (2102.05757v1)

Published 10 Feb 2021 in cs.CL, cs.AI, and cs.LG

Abstract: Inspired by inductive transfer learning in computer vision, many efforts have been made to train contextualized LLMs that boost the performance of natural language processing tasks. These models are mostly trained on large general-domain corpora such as news, books, or Wikipedia. Although these pre-trained generic LLMs capture the semantic and syntactic essence of a language structure well, exploiting them in a real-world, domain-specific scenario still requires practical considerations such as token distribution shifts, inference time, memory, and their simultaneous proficiency in multiple tasks. In this paper, we focus on the legal domain and present how different LLMs trained on general-domain corpora can be best customized for multiple legal document reviewing tasks. We compare their efficiencies with respect to task performance and present practical considerations.

Authors (5)
  1. Shohreh Shaghaghian (5 papers)
  2. Luna (2 papers)
  3. Feng (5 papers)
  4. Borna Jafarpour (2 papers)
  5. Nicolai Pogrebnyakov (4 papers)
Citations (18)