
EXAONE 3.0 7.8B Instruction Tuned Language Model (2408.03541v2)

Published 7 Aug 2024 in cs.CL and cs.AI

Abstract: We introduce EXAONE 3.0 instruction-tuned LLM, the first open model in the family of LLMs developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly competitive real-world performance with instruction-following capability against other state-of-the-art open models of similar size. Our comparative analysis shows that EXAONE 3.0 excels particularly in Korean, while achieving compelling performance across general tasks and complex reasoning. With its strong real-world effectiveness and bilingual proficiency, we hope that EXAONE keeps contributing to advancements in Expert AI. Our EXAONE 3.0 instruction-tuned model is available at https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Citations (3)

Summary

  • The paper presents an open-access, instruction-tuned language model aimed at democratizing expert AI research and innovation.
  • The model employs a bilingual tokenization approach with a decoder-only transformer architecture and extensive pre-training on an 8-trillion-token corpus.
  • Evaluated across diverse benchmarks, EXAONE 3.0 7.8B demonstrates robust performance in reasoning, coding, and mathematics in both English and Korean.

Overview of EXAONE 3.0 7.8B Instruction-Tuned LLM

Introduction

The paper "EXAONE 3.0 7.8B Instruction Tuned Language Model" by LG AI Research presents an open-access, instruction-tuned LLM, the first publicly released model in the EXAONE series, designed to democratize advanced AI capabilities for research and innovation. The model aims to provide expert-level AI support for both general users and professionals, with a specific emphasis on bilingual proficiency in English and Korean. The release is designated for non-commercial research purposes and is accessible via the Hugging Face Model Hub.

Model Training

The development of the EXAONE 3.0 7.8B model encompasses several stages: bilingual tokenization, large-scale pre-training, and post-training refinement. The base architecture is a decoder-only transformer with 32 layers that supports a maximum context length of 4,096 tokens and uses Rotary Position Embeddings (RoPE) and Grouped Query Attention (GQA).
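
As a concrete illustration, the sketch below loads the released checkpoint with the Hugging Face transformers library, inspects its configuration, and runs a short instruction-following generation. The exact configuration attribute names depend on the model's custom configuration class and may differ from those shown; the prompt is an arbitrary example.

```python
# Minimal sketch: load the released checkpoint with Hugging Face transformers,
# inspect its architecture, and generate a short response. Config attribute
# names come from the model's custom configuration class and may differ.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"

# The checkpoint ships custom modeling code, so trust_remote_code is required.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
num_layers = getattr(config, "num_layers", getattr(config, "num_hidden_layers", None))
print(num_layers, config.max_position_embeddings)  # expected: 32 layers, 4096-token context

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's stored precision
    device_map="auto",    # spread layers over available devices
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Instruction-following generation through the bundled chat template.
messages = [{"role": "user", "content": "Explain rotary position embeddings in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```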

Tokenizer

A differentiating aspect of EXAONE 3.0 is its bilingual tokenization strategy, which uses a byte-level byte-pair encoding (BBPE) tokenizer tailored specifically to English and Korean. Accounting for the linguistic characteristics of Korean yields a lower compression ratio (fewer tokens per unit of text) for Korean, mitigating over-tokenization and improving the model's efficiency and performance.
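
As a rough illustration of this effect (not the paper's measurement protocol), one can count tokens for parallel English and Korean sentences with the released tokenizer and compare against a general-purpose baseline; the baseline tokenizer and the sample sentences below are illustrative choices, not taken from the paper.

```python
# Illustrative sketch: compare how many tokens the EXAONE tokenizer spends on a
# Korean sentence versus a general-purpose baseline tokenizer. The baseline
# model name is an arbitrary example, not one used in the paper.
from transformers import AutoTokenizer

exaone_tok = AutoTokenizer.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True
)
baseline_tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative baseline only

samples = {
    "en": "Large language models are trained on trillions of tokens.",
    "ko": "대규모 언어 모델은 수조 개의 토큰으로 학습됩니다.",
}

for lang, text in samples.items():
    n_exaone = len(exaone_tok.encode(text))
    n_base = len(baseline_tok.encode(text))
    # Fewer tokens for the same text means a better (lower) compression ratio.
    print(f"{lang}: EXAONE={n_exaone} tokens, baseline={n_base} tokens")
```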

Pre-training and Data Regime

The pre-training phase drew on a comprehensive collection of web-crawled, public, and proprietary corpora, rigorously filtered to meet high data-quality and compliance standards. Training was conducted in two rounds over 8 trillion tokens in total: an initial general-domain round of 6 trillion tokens, followed by an additional 2 trillion tokens emphasizing expert domain knowledge.
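
The paper does not release its filtering pipeline, but the following generic sketch shows the kind of rule-based quality filtering and exact-duplicate removal such a pipeline typically includes; all thresholds, helper names, and sample documents are invented for illustration.

```python
# Generic sketch of document-level quality filtering and exact deduplication.
# This is NOT the paper's pipeline; thresholds and criteria are illustrative only.
import hashlib

def passes_quality_filters(doc: str, min_chars: int = 200, max_chars: int = 100_000) -> bool:
    """Cheap rule-based checks of the kind commonly applied before pre-training."""
    if not (min_chars <= len(doc) <= max_chars):
        return False
    # Reject documents dominated by non-text characters (e.g., markup or tables).
    alpha_ratio = sum(ch.isalnum() or ch.isspace() for ch in doc) / len(doc)
    return alpha_ratio >= 0.8

def deduplicate(docs):
    """Drop exact duplicates by hashing whitespace-normalized text."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["Example web document ..." * 50, "Example web document ..." * 50, "short"]
filtered = [d for d in deduplicate(corpus) if passes_quality_filters(d)]
print(f"{len(filtered)} of {len(corpus)} documents kept")
```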

Post-training

Post-training refinement aimed at enhancing instruction-following capabilities was conducted in two stages: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). This aligns the model with user preferences and equips it to handle diverse, complex tasks in practical application scenarios.
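
The DPO stage follows the standard published formulation of the objective; the sketch below reproduces that generic loss for a batch of preference pairs (with beta controlling how strongly the policy is kept close to the reference SFT model) and is not an EXAONE-specific implementation.

```python
# Compact sketch of the standard DPO loss for a batch of preference pairs.
# The logp_* arguments are summed token log-probabilities of the chosen/rejected
# responses under the policy being trained and the frozen reference (SFT) model.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_logp_chosen: torch.Tensor,
    policy_logp_rejected: torch.Tensor,
    ref_logp_chosen: torch.Tensor,
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer chosen over rejected
    responses more strongly than the reference model does."""
    chosen_rewards = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_rewards = beta * (policy_logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(
    torch.tensor([-12.0, -15.0]), torch.tensor([-20.0, -18.0]),
    torch.tensor([-13.0, -16.0]), torch.tensor([-19.0, -17.5]),
)
print(loss.item())
```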

Evaluation

The model's evaluation spanned multiple benchmarks, covering real-world use cases, reasoning, coding, mathematics, and general language capabilities.

English Performance

  1. Real-world use cases: The model demonstrated superior performance in scenarios emulated by benchmarks like MT-Bench and AlpacaEval 2.0, showing robustness in instruction-following through high scores.
  2. Reasoning: The model scored competitively on the ARC-C and GPQA benchmarks, indicating strong logical and scientific reasoning abilities.
  3. Coding: With high scores on HumanEval and MBPP, EXAONE 3.0 7.8B demonstrates proficient coding ability, particularly in Python.
  4. Mathematics: The model's performance in GSM8K and MATH benchmarks was outstanding, reflecting advanced mathematical problem-solving capabilities.
  5. General capabilities: The model also showed competitive performance in benchmarks like IFEval, BBH, and MMLU-Pro, underscoring its general proficiency.

Korean Performance

  1. Real-world use cases: The model excelled in benchmarks such as KoMT-Bench and LogicKor, with high proficiency in instruction-following tasks in Korean.
  2. General performance: Achieving top scores on KMMLU and KoBEST, it demonstrated strong capabilities across various tasks and domains in the Korean language.

Responsible AI and Limitations

LG AI Research emphasizes responsible AI development aligned with its ethical principles. Comprehensive testing and compliance measures aim to mitigate risks such as harmful content generation, socio-cultural bias, and misuse. Nonetheless, inherent limitations remain, including the potential generation of incorrect or biased responses and the absence of up-to-date information due to the static nature of the training data.

Future Implications

The release of EXAONE 3.0 7.8B is set to foster a collaborative research environment, accelerating advancements across various AI-driven applications. Its robust bilingual capabilities particularly hold promise for domains requiring nuanced understanding across English and Korean, setting a foundation for further model releases and broader access in the future.

In conclusion, EXAONE 3.0 7.8B serves as a significant contribution to the AI research community, balancing high performance with responsible deployment. The model's open accessibility paves the way for wide-ranging innovations, reinforcing LG AI Research’s commitment to democratizing expert AI.
