
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models (2403.02715v2)

Published 5 Mar 2024 in cs.CL and cs.AI

Abstract: Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-source LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have fine-tuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 common tasks and 31 metrics. Our evaluation results reveal that the fine-tuned LLMs exhibit enhanced comprehension and generative capabilities in Vietnamese. Moreover, our analysis indicates that models with more parameters can introduce more biases and uncalibrated outputs, and that the key factor influencing LLM performance is the quality of the training or fine-tuning datasets. These insights underscore the significance of meticulous fine-tuning with high-quality datasets in enhancing LLM performance.

Authors (7)
  1. Sang T. Truong (12 papers)
  2. Duc Q. Nguyen (2 papers)
  3. Toan Nguyen (32 papers)
  4. Dong D. Le (1 paper)
  5. Nhi N. Truong (1 paper)
  6. Sanmi Koyejo (111 papers)
  7. Tho Quan (14 papers)
