Komodo: A Linguistic Expedition into Indonesia's Regional Languages (2403.09362v2)

Published 14 Mar 2024 in cs.CL

Abstract: The recent breakthroughs in LLMs have mostly focused on languages with easily available and sufficient resources, such as English. However, there remains a significant gap for languages that lack sufficient linguistic resources in the public domain. Our work introduces Komodo-7B, a family of 7-billion-parameter LLMs designed to address this gap by seamlessly operating across Indonesian, English, and 11 regional languages in Indonesia. The family consists of Komodo-7B-Base and Komodo-7B-Instruct. Komodo-7B-Instruct stands out by achieving state-of-the-art performance across various tasks and languages, outperforming the benchmarks set by OpenAI's GPT-3.5, Cohere's Aya-101, Llama-2-Chat-13B, Mixtral-8x7B-Instruct-v0.1, Gemma-7B-it, and many more. The model not only demonstrates superior performance in both language-specific and overall assessments but also excels in linguistic diversity. Our commitment to advancing LLMs extends beyond well-resourced languages, aiming to bridge the gap for those with limited linguistic assets. Additionally, Komodo-7B-Instruct's improved cross-language understanding helps address educational disparities in Indonesia by offering direct translations from English to 11 regional languages, a significant improvement over existing translation services. Komodo-7B represents a crucial step towards inclusivity and effectiveness in LLMs, catering to the linguistic needs of diverse communities.
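Since Komodo-7B is described as a standard 7B causal language model, querying it for the English-to-regional-language translation the abstract highlights would look roughly like the sketch below. This is a minimal illustration, not the authors' documented interface: the Hugging Face repository id "Yellow-AI-NLP/komodo-7b-base" and the plain-text prompt format are assumptions not confirmed by this page.

```python
# Minimal sketch: prompting Komodo-7B for English -> regional-language translation.
# Assumptions (not confirmed by this page): the checkpoint id below and the
# plain-text prompt format; substitute the real ones if they differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Yellow-AI-NLP/komodo-7b-base"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 7B model fits on one GPU
    device_map="auto",
)

prompt = (
    "Translate the following sentence from English to Javanese:\n"
    "Good morning, how are you?\n"
    "Translation:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```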

Authors (4)
  1. Louis Owen (5 papers)
  2. Vishesh Tripathi (4 papers)
  3. Abhay Kumar (28 papers)
  4. Biddwan Ahmed (5 papers)
Citations (4)
