
Improving Vietnamese Legal Question–Answering System based on Automatic Data Enrichment (2306.04841v1)

Published 8 Jun 2023 in cs.CL

Abstract: Question answering (QA) in law is a challenging problem because legal documents are much more complicated than ordinary texts in terms of terminology, structure, and temporal and logical relationships. Legal QA is even more difficult for low-resource languages like Vietnamese, where labeled data are scarce and pre-trained LLMs are still limited. In this paper, we try to overcome these limitations by implementing a Vietnamese article-level retrieval-based legal QA system and introducing a novel method that improves the performance of LLMs by improving data quality through weak labeling. Our hypothesis is that, in contexts where labeled data are limited, efficient data enrichment can help increase overall performance. Our experiments test multiple aspects of the approach and demonstrate the effectiveness of the proposed technique.

Authors (5)
  1. Thi-Hai-Yen Vuong (13 papers)
  2. Ha-Thanh Nguyen (33 papers)
  3. Quang-Huy Nguyen (13 papers)
  4. Le-Minh Nguyen (23 papers)
  5. Xuan-Hieu Phan (3 papers)
Citations (2)