OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs (2405.07703v5)
Published 13 May 2024 in cs.CL
Abstract: In recent years, LLMs have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.