
GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning (2504.16832v1)

Published 23 Apr 2025 in cs.CL

Abstract: Chain-of-Thought (CoT) is a robust approach for tackling LLM tasks that require intermediate reasoning steps prior to generating a final answer. In this paper, we present GreenMind-Medium-14B-R1, the Vietnamese reasoning model inspired by the finetuning strategy based on Group Relative Policy Optimization. We also leverage a high-quality Vietnamese synthesized reasoning dataset and design two reward functions to tackle the main limitations of this technique: (i) language mixing, where we explicitly detect the presence of biased language characters during the process of sampling tokens, and (ii) we leverage Sentence Transformer-based models to ensure that the generated reasoning content maintains factual correctness and does not distort the final output. Experimental results on the Vietnamese dataset from the VLSP 2023 Challenge demonstrate that our model outperforms prior works and enhances linguistic consistency in its responses. Furthermore, we extend our evaluation to SeaExam, a multilingual multiple-choice dataset, showing the effectiveness of our reasoning method compared to few-shot prompting techniques.

Summary

GreenMind: A Vietnamese LLM for Structured Reasoning

The paper introduces GreenMind-Medium-14B-R1 as an enhancement in Vietnamese language processing, with a focus on structured reasoning capabilities. The model is fine-tuned using Group Relative Policy Optimization (GRPO) and leverages a curated Vietnamese reasoning dataset, demonstrating superior performance compared to few-shot prompting techniques and existing Vietnamese LLMs.

Central to the paper is the adoption of Chain-of-Thought (CoT) prompting for generating intermediate reasoning steps. CoT has proven significant in improving LLM performance on tasks requiring logical and multi-step reasoning. GreenMind incorporates CoT and builds on reinforcement learning methods such as Proximal Policy Optimization (PPO), from which GRPO derives, to enhance these capabilities.
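The distinguishing feature of GRPO over PPO is that it replaces the learned value critic with a group-relative baseline: several completions are sampled per prompt, and each completion's advantage is its reward normalized against the group's own statistics. The sketch below illustrates that normalization step only; it is a minimal illustration, not the paper's implementation.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's sampled completions.

    GRPO scores each completion against the mean and standard deviation
    of its own sampling group, removing the need for a learned critic.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, scored by the reward functions.
advantages = grpo_advantages([1.0, 0.2, 0.8, 0.2])
```

Because the advantages are centered on the group mean, above-average completions are reinforced and below-average ones are suppressed, regardless of the absolute reward scale.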

The paper showcases a large Vietnamese reasoning dataset curated by the authors, focusing on various domains including mathematics, cultural knowledge, legal insight, and educational exams. This carefully constructed dataset serves to promote task diversity, linguistic complexity, reasoning depth, and answer verifiability, thereby fostering a more comprehensive reasoning model.

The implementation of GreenMind employs two reward functions to rectify common issues in multilingual models, namely language mixing and content distortion. A language-consistency reward and a semantic similarity reward guide the model toward contextually accurate Vietnamese responses while ensuring coherence between the reasoning chain and the final answer.
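The two rewards described above can be sketched as follows. The character ranges used for language-mixing detection and the reward values are illustrative assumptions (the paper's exact detector is not reproduced here), and in the paper the embeddings for the semantic reward come from a Sentence Transformer model, whereas here they are plain vectors.

```python
import math

def language_reward(text, penalty=-1.0, bonus=1.0):
    """Penalize language mixing: treat any CJK/kana character in a
    Vietnamese response as evidence of code-switching.
    (Illustrative Unicode ranges and reward values, not the paper's.)"""
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF:
            return penalty
    return bonus

def semantic_reward(emb_reasoning, emb_answer):
    """Cosine similarity between embeddings of the reasoning chain and
    the final answer, rewarding reasoning that stays consistent with
    the produced answer."""
    dot = sum(a * b for a, b in zip(emb_reasoning, emb_answer))
    norm = (math.sqrt(sum(a * a for a in emb_reasoning))
            * math.sqrt(sum(b * b for b in emb_answer)))
    return dot / norm if norm else 0.0
```

Note that Vietnamese diacritics (e.g. "chào") fall outside the penalized ranges, so a pure Vietnamese response keeps the full language bonus while mixed-script output is penalized.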

Experimental results reinforce the model's effectiveness across established benchmarks such as VLSP 2023 and SeaExam, where GreenMind outperforms models with substantially larger parameter counts. These results affirm the approach of targeted, high-quality data curation combined with strategic optimization for reasoning tasks.

The implications of this research extend to practical applications in AI-driven interfaces where accurate, culture-sensitive interaction in Vietnamese is pivotal. As such, GreenMind represents an advancement in the NLP domain, setting a precedent for constructing robust language processing systems centered on logical reasoning and question-answering tasks.

This research potentially impacts theoretical progress in AI by highlighting the importance of tailored datasets and strategic fine-tuning methodologies. Future directions could explore extending the model's reasoning capabilities across other languages or integrating additional data augmentation methods to improve model robustness further. Additionally, the deployment of GreenMind in interactive student-teacher interfaces or complex administrative systems could leverage its explanatory outputs and high factual accuracy.

GreenMind-Medium-14B-R1 showcases the potential for specialized, regional LLMs to offer precise, logical reasoning outputs, addressing both educational and applied settings within the Vietnamese linguistic landscape.
