Improving Legal Judgement Prediction in Romanian with Long Text Encoders (2402.19170v2)
Abstract: In recent years, the field of NLP has seen remarkable results, achieving near human-level performance on a variety of tasks. Legal NLP has been part of this process and has experienced impressive growth. However, general-purpose models are not readily applicable to the legal domain: its particular characteristics (e.g., specialized vocabulary, long documents) often require dedicated models and methods. In this work we investigate both specialized and general models for predicting the final ruling of a legal case, a task known as Legal Judgment Prediction (LJP). We focus in particular on methods for extending the sequence length of Transformer-based models so they can better handle the long documents common in legal corpora. Extensive experiments on four Romanian LJP datasets, originating from two sources with significantly different sizes and document lengths, show that specialized models and proper handling of long texts are critical for good performance.
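The central technical obstacle the abstract names is that standard Transformer encoders accept only around 512 tokens, far less than a typical court ruling. Below is a minimal sketch, assuming the HuggingFace `transformers` and `torch` libraries, of one common workaround related to the methods studied here: split a long document into overlapping chunks, encode each chunk with a BERT-style encoder, and mean-pool the per-chunk [CLS] vectors into a single document embedding for a classifier head. The checkpoint name, chunk length, and stride are illustrative placeholders, not the paper's actual configuration (which also covers dedicated long-context encoders such as Longformer and BigBird).

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; the paper evaluates Romanian and legal-domain models.
MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def encode_long_document(text: str, chunk_len: int = 510, stride: int = 128) -> torch.Tensor:
    """Mean-pool per-chunk [CLS] embeddings into one document vector."""
    # Tokenize without special tokens so they can be re-added per chunk.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    cls_vectors = []
    step = chunk_len - stride  # overlapping windows preserve some cross-chunk context
    for start in range(0, max(len(ids), 1), step):
        chunk = ids[start:start + chunk_len]
        # 510 content tokens + [CLS] and [SEP] = 512, the standard BERT limit.
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        with torch.no_grad():
            output = encoder(input_ids=input_ids)
        cls_vectors.append(output.last_hidden_state[:, 0])  # [CLS] per chunk
        if start + chunk_len >= len(ids):
            break
    # (num_chunks, hidden) -> (hidden,): a fixed-size input for a classifier head.
    return torch.cat(cls_vectors).mean(dim=0)
```

Chunk-and-pool keeps memory roughly linear in document length but loses attention across chunk boundaries; sparse-attention encoders such as Longformer instead modify the attention pattern itself to cover the full document in one pass.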
Authors:
- Mihai Masala
- Traian Rebedea
- Horia Velicu