EXAONE 3.5: Series of Large Language Models for Real-world Use Cases (2412.04862v2)

Published 6 Dec 2024 in cs.CL

Abstract: This technical report introduces the EXAONE 3.5 instruction-tuned LLMs, developed and released by LG AI Research. The EXAONE 3.5 LLMs are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) outstanding long-context comprehension, attaining the top performance in four benchmarks, and 3) competitive results compared to state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 LLMs are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: [email protected].

Summary

  • The paper introduces the EXAONE 3.5 series with models (32B, 7.8B, 2.4B) that excel in long-context reasoning and diverse benchmark tasks.
  • The models utilize a decoder-only Transformer architecture with dual-stage training to ensure both general domain performance and refined long-context capabilities.
  • Robust bilingual support and stringent ethical considerations highlight the models’ global applicability and responsible AI deployment.

Insightful Overview of the EXAONE 3.5 Paper

The paper presents a detailed account of the development and capabilities of the EXAONE 3.5 series of instruction-tuned LLMs, introduced by LG AI Research. The models are available in three configurations of varying parameter sizes: 32 billion, 7.8 billion, and 2.4 billion, catering to diverse computational and deployment needs across both academic and industrial sectors. Each model is evaluated on benchmarks that reflect real-world applicability and long-context reasoning, and all three demonstrate competitive performance on general-domain tasks.

Model Architecture and Training

EXAONE 3.5 is built on the decoder-only Transformer architecture and introduces a key change relative to its predecessor, EXAONE 3.0: the maximum context length is increased to 32,768 tokens through long-context fine-tuning. This enhancement enables the models to handle much longer sequences efficiently, which is critical given the surge in applications requiring long-context processing.
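
As a rough illustration of how these checkpoints can be used, the snippet below loads one of the released models from the Hugging Face organization linked in the abstract and inspects its configured context window. The repository name, the trust_remote_code requirement, and the config field name are assumptions made for the sake of the sketch rather than details taken from the report.

```python
# Hedged sketch: load an EXAONE 3.5 checkpoint and check its context window.
# The repository name below is an assumption based on the LGAI-EXAONE
# organization referenced in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # keep the checkpoint's native precision
    trust_remote_code=True,   # assumed: the repo may ship a custom model class
    device_map="auto",
)

# The report states a 32,768-token maximum context length; the field name
# below is the usual transformers convention, not confirmed by the paper.
print(model.config.max_position_embeddings)
```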

The training of these models is carried out in two stages. The initial stage uses a broad pre-training corpus targeting strong performance across general domains, while the second stage focuses on refining long-context understanding and other domain-specific capabilities. The computational efficiency of this approach is noteworthy; for instance, the 32B model is shown to require significantly less training computation than some contemporary counterparts of similar size while maintaining competitive performance.
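
A minimal sketch of the shape of such a two-stage schedule is given below; the corpus identifiers, sequence lengths, and step counts are purely illustrative placeholders and are not taken from the report.

```python
# Illustrative sketch of a two-stage training schedule, not the authors'
# actual training code. All values and names below are assumptions.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    corpus: str        # hypothetical corpus identifier
    max_seq_len: int   # tokens per training sequence
    steps: int         # illustrative step count

schedule = [
    # Stage 1: broad pre-training over general-domain data.
    Stage("general-pretrain", corpus="general_web_mix", max_seq_len=4096, steps=100_000),
    # Stage 2: long-context fine-tuning up to the 32,768-token window
    # reported in the paper, plus domain-specific refinement.
    Stage("long-context-finetune", corpus="long_documents", max_seq_len=32_768, steps=10_000),
]

def run_stage(stage: Stage) -> None:
    """Placeholder for one training stage; a real run would stream batches of
    `max_seq_len`-token sequences from `corpus` and update the model."""
    print(f"{stage.name}: {stage.steps} steps at seq_len={stage.max_seq_len}")

for stage in schedule:
    run_stage(stage)
```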

Evaluation and Results

Comprehensive evaluation demonstrates that EXAONE 3.5 models excel in real-world use cases and long-context processing. In the real-world use case category, the models achieve the highest scores across all seven benchmarks, outperforming similarly sized models such as Qwen 2.5 and C4AI Command R. Notably, the models also show strong bilingual capabilities in Korean and English, a critical feature for global applicability.
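
As a usage illustration of the bilingual claim, the sketch below prompts an (assumed) instruct checkpoint in both English and Korean via the tokenizer's chat template; the repository name and chat-template support are assumptions based on the Hugging Face link in the abstract, not verified details.

```python
# Hedged sketch: bilingual prompting of an EXAONE 3.5 instruct checkpoint.
# The repository name and chat-template support are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

prompts = [
    "Summarize the EXAONE 3.5 report in one sentence.",  # English
    "EXAONE 3.5 보고서를 한 문장으로 요약해 줘.",           # Korean
]

for prompt in prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```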

The 2.4B model displays exceptional performance relative to its size, surpassing several models with more parameters and thus addressing the need for small, efficient models suited to resource-constrained deployment environments. Moreover, the EXAONE 3.5 series maintains robust performance across general-domain benchmarks, which include solving mathematical problems and generating source code; here too, the smaller models sometimes outperform larger counterparts, underscoring the effectiveness of the training regimen.

Ethical Considerations and Future Prospects

In line with LG AI Research's commitment to responsible AI, the paper outlines extensive measures undertaken to ensure data compliance, mitigate potential biases, and safeguard the ethical use of the models. Legal, ethical, and data protection aspects are rigorously addressed throughout the model's lifecycle, from data collection to deployment.

The open availability of EXAONE 3.5 for research purposes stands to contribute to ongoing advancements in AI, aligning with increasing calls for transparency and collaborative innovation in the field. The paper expresses optimism about future developments, anticipating models that further facilitate complex human-AI interaction while maintaining ethical standards.

In conclusion, the EXAONE 3.5 series signifies a substantial contribution to the field of LLMs, offering scalable solutions tailored for various industrial and academic needs while prioritizing ethical guidelines and computational efficiency.
