Fine Tuning Methods for Low-resource Languages (2510.04139v1)

Published 5 Oct 2025 in cs.CL and cs.LG

Abstract: The rise of LLMs has not been inclusive of all cultures. The models are trained mostly on English text and English-language cultural content, which makes them underperform in other languages and cultural contexts. By developing a generalizable method for preparing culturally relevant datasets and post-training the Gemma 2 model, this project aimed to improve Gemma 2's performance for an underrepresented language and to show how others can do the same to unlock the power of generative AI in their own countries and preserve their cultural heritage.