A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese (2010.01891v1)

Published 5 Oct 2020 in cs.CL and cs.AI

Abstract: Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines, EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019), on our dataset. We compare the two baselines under key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) yields higher performance than the recent best multilingual language model XLM-R (Conneau et al., 2020).
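
For readers unfamiliar with the NPMI score mentioned in the abstract, the following is a minimal sketch of the general definition from Bouma (2009): pointwise mutual information divided by the negative log of the joint probability, which bounds the score to [-1, 1]. The abstract does not say how the counts are collected or how the score is thresholded for schema linking, so the function name `npmi`, its count-based parameters, and the idea of counting question-token and schema-item co-occurrences are illustrative assumptions, not the authors' implementation.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information from raw co-occurrence counts.

    count_xy: number of observations where x and y occur together
    count_x, count_y: number of observations containing x, y respectively
    total: total number of observations
    (All parameter names are hypothetical, chosen for this sketch.)
    """
    if count_xy == 0:
        return -1.0  # never co-occur: lower bound of NPMI
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # always co-occur; avoids dividing by -log(1) = 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# Hypothetical counts: a question token and a column name that always appear together
perfect = npmi(count_xy=10, count_x=10, count_y=10, total=100)
# Hypothetical counts: a pair that co-occurs exactly as often as chance predicts
independent = npmi(count_xy=25, count_x=50, count_y=50, total=100)
```

Under these assumptions, a pair that always co-occurs scores close to 1, a statistically independent pair scores 0, and a pair that never co-occurs scores -1, which is what makes the normalized score convenient to compare across frequent and rare schema items.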

Authors (3)
  1. Anh Tuan Nguyen (17 papers)
  2. Mai Hoang Dao (5 papers)
  3. Dat Quoc Nguyen (55 papers)
Citations (51)