EXAONE 4.0: Unified LLM for AI Research
- EXAONE 4.0 is a unified large language model that seamlessly integrates rapid instruction following with advanced multi-step reasoning via an innovative post-training pipeline.
- It features explicit agentic AI capabilities that enable autonomous tool use and support high-performance multilingual processing across various languages.
- Architectural innovations like hybrid attention and QK-Reorder LayerNorm enhance its context handling, efficiency, and benchmark performance for both large-scale and on-device applications.
EXAONE 4.0 is a unified LLM series designed to seamlessly integrate rapid instruction-following and advanced multi-step reasoning within a single architecture. Developed by LG AI Research, EXAONE 4.0 introduces agentic AI capabilities and robust multilingual support while building on and extending the architectural lineage of the EXAONE 3.5 and EXAONE Deep series. The model suite spans two principal configurations—a mid-size 32B model optimized for high performance and a compact 1.2B model targeting on-device applications. EXAONE 4.0 is available for academic and research use under a research-only license (Research et al., 15 Jul 2025).
1. Unified Model Modes and Post-training Integration
EXAONE 4.0 incorporates two operational modes within a single LLM:
- Non-reasoning mode targets rapid, high-quality instruction following and “fast thinking,” suitable for typical conversational and everyday tasks.
- Reasoning mode is designed for deep, multi-step logical progression, including mathematical problem solving, code generation and execution, and complex deductive tasks.
Rather than training dedicated models for each mode, EXAONE 4.0 achieves this dual functionality through an integrated post-training pipeline. The training process begins with large-scale supervised fine-tuning (SFT) on a composite dataset comprising both non-reasoning (general instruction following from diverse domains) and reasoning-specific data (sourced from math, code, logic, and agentic tool use domains). The dataset composition is carefully balanced at approximately a 1.5:1 token ratio between reasoning and non-reasoning data to prevent “mode bleed” (unintentional dominance of reasoning behaviors in general instruction tasks).
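A minimal sketch of how such a token-ratio target might be enforced when assembling the SFT mixture is given below; the two-way split, token counts, and weighting scheme are illustrative placeholders, not the published recipe.

```python
# Illustrative sketch: choose per-source sampling weights so that reasoning vs.
# non-reasoning tokens land at roughly a 1.5:1 ratio in the SFT mixture.
# The token counts below are placeholders, not the published data sizes.
def mixture_weights(tokens_reasoning: float, tokens_non_reasoning: float,
                    target_ratio: float = 1.5) -> tuple[float, float]:
    """Return sampling multipliers (w_reasoning, w_non_reasoning) such that
    w_r * tokens_reasoning : w_n * tokens_non_reasoning == target_ratio : 1."""
    w_n = 1.0
    w_r = target_ratio * tokens_non_reasoning / tokens_reasoning
    return w_r, w_n

# Example: 200B reasoning tokens vs. 300B non-reasoning tokens available.
w_r, w_n = mixture_weights(200, 300)
print(w_r, w_n)  # reasoning data is upsampled 2.25x relative to non-reasoning
```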
Post-SFT, EXAONE 4.0 undergoes an online reinforcement learning phase using the AGAPO algorithm (Asymmetric Sampling and Global Advantage Policy Optimization). This stage replaces the Proximal Policy Optimization (PPO) clipped objective with a standard policy gradient loss, so that low-probability but highly informative tokens (critical branching points in reasoning chains) are not suppressed during the update. Following RL, a preference learning phase analogous to Direct Preference Optimization further aligns both modes with human evaluations.
In place of PPO's clipped surrogate, the AGAPO objective applies an unclipped policy-gradient update weighted by a two-stage advantage: each sampled response is first scored relative to the other responses drawn for the same prompt, and these group-relative advantages are then normalized across the global batch (the "global advantage" of the algorithm's name), as sketched below.
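The following sketch illustrates this structure under stated assumptions: the leave-one-out group baseline, the batch-level normalization, and the sequence-level weighting are plausible instantiations of the description above, not the paper's exact formulation.

```python
# Illustrative two-stage (group -> global) advantage and an unclipped
# policy-gradient loss in the spirit of AGAPO. Details are assumptions.
import torch

def two_stage_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per sampled response."""
    n = rewards.size(1)
    # Stage 1: group-relative advantage (leave-one-out baseline within each group).
    loo_baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (n - 1)
    group_adv = rewards - loo_baseline
    # Stage 2: "global" normalization across all responses in the batch.
    flat = group_adv.flatten()
    return (group_adv - flat.mean()) / (flat.std() + eps)

def unclipped_pg_loss(logprobs: torch.Tensor, mask: torch.Tensor,
                      adv: torch.Tensor) -> torch.Tensor:
    """No PPO ratio clipping, so low-probability tokens keep their full
    gradient contribution. logprobs, mask: (batch, seq); adv: (batch,)."""
    per_seq = (logprobs * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return -(adv * per_seq).mean()

# Toy usage: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1., 0., 0., 1.], [0., 0., 1., 0.]])
print(two_stage_advantages(rewards))
```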
2. Agentic AI Features
EXAONE 4.0 incorporates explicit agentic AI capabilities through training on datasets that feature multi-turn, interactive tool use. These datasets involve the model planning and invoking tool calls, then reasoning over their outcomes across several conversational turns. This prepares EXAONE 4.0 for use cases where autonomous agentic behavior (e.g., scheduling, fetching online data, controlling simulations) is essential, allowing the model to operate not merely as a passive assistant but as an active, task-executing entity. This agentic tool use underpins the deployment of practical agent-based AI systems and environments (Research et al., 15 Jul 2025).
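The loop below is a minimal, self-contained sketch of the plan / call-tool / observe / respond cycle described above. The tool name, message schema, and scripted model step are all hypothetical stand-ins, not EXAONE's actual training format or API.

```python
# Minimal illustrative agentic loop: the model plans a tool call, the harness
# executes it, and the result is fed back for the next turn.
import json

def fake_weather_tool(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny", "high_c": 24})

TOOLS = {"get_weather": fake_weather_tool}

def run_agent_turns(model_step, user_msg: str, max_turns: int = 4) -> str:
    """model_step(messages) returns either {"tool": name, "args": {...}} or
    {"final": text}; here it stands in for a call to the LLM."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        action = model_step(messages)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "tool", "name": action["tool"], "content": result})
    return "max turns reached"

# A scripted stand-in for the model, for demonstration only.
steps = iter([
    {"tool": "get_weather", "args": {"city": "Seoul"}},
    {"final": "It will be sunny in Seoul, around 24°C."},
])
print(run_agent_turns(lambda msgs: next(steps), "What's the weather in Seoul?"))
```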
3. Multilingual Expansion
EXAONE 4.0 expands its multilingual repertoire beyond the English and Korean support of earlier versions to include Spanish, while maintaining robust performance across all supported languages. This is facilitated by a unified byte-level BPE tokenizer and vocabulary shared across languages, along with careful curation of domain-specific data for each language.
The model demonstrates strong results on multilingual evaluation benchmarks, including mathematics and knowledge-intensive tasks in Spanish and Korean. Multilingual performance is a direct design goal, reflected both in the preprocessing pipeline and evaluation methodology.
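A brief sketch of tokenizing text in the three supported languages with the shared tokenizer follows; the Hugging Face repository ID is an assumption, so check the LGAI-EXAONE hub page for the exact model name.

```python
# Sketch: inspect how the shared byte-level BPE tokenizer handles English,
# Korean, and Spanish input. The repository ID below is assumed.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-4.0-32B")  # assumed ID
for text in ["The model reasons step by step.",
             "모델이 단계적으로 추론합니다.",
             "El modelo razona paso a paso."]:
    ids = tok(text)["input_ids"]
    print(len(ids), tok.convert_ids_to_tokens(ids)[:8])
```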
4. Architectural and Technical Innovations
The architectural design of EXAONE 4.0 inherits core transformer structures from EXAONE 3.5 but introduces several notable advancements:
- Hybrid Attention Mechanism: Instead of relying solely on global attention, EXAONE 4.0 employs a fixed ratio of global and local (sliding window) attention layers, extending maximum effective context length to 128K tokens for the 32B model and 64K tokens for the 1.2B model. This improves memory and compute efficiency while preserving long-range dependency modeling (see the sketch following this list).
- Normalization Strategy: The model utilizes a QK-Reorder LayerNorm approach, which repositions and modifies normalization to mitigate variance explosion in deeper transformer networks.
- Parameterization: The 32B variant is designed for high throughput and accuracy in large-scale deployments, while the 1.2B model targets hardware-constrained or on-device contexts. Both variants share the same vocabulary and tokenizer, facilitating cross-model interoperability and model distillation.
- Training Pipeline: Supervised fine-tuning is followed by RL via AGAPO and preference learning, all conducted on large-scale compute infrastructure; the 32B model achieves context-efficient operation at scale.
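The sketch below illustrates the hybrid attention layout described in the first bullet: a fixed ratio of local (sliding-window) to global layers, with a causal sliding-window mask for the local ones. The 3:1 local-to-global ratio and the 4096-token window are placeholder values for illustration, not the model's published configuration.

```python
# Illustrative hybrid attention layout and masks; ratio and window size are
# placeholders, not the published configuration.
import torch

def layer_types(num_layers: int, local_per_global: int = 3) -> list[str]:
    # e.g. ["local", "local", "local", "global", "local", ...]
    return ["global" if (i + 1) % (local_per_global + 1) == 0 else "local"
            for i in range(num_layers)]

def attention_mask(seq_len: int, kind: str, window: int = 4096) -> torch.Tensor:
    """Boolean mask where True marks key positions each query may attend to."""
    q = torch.arange(seq_len).unsqueeze(1)
    k = torch.arange(seq_len).unsqueeze(0)
    causal = k <= q
    if kind == "global":
        return causal
    return causal & (q - k < window)  # causal sliding window

print(layer_types(8))                        # layout of an 8-layer stack
print(attention_mask(6, "local", window=3))  # tiny local mask for inspection
```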
5. Empirical Performance and Benchmarking
EXAONE 4.0 exhibits strong performance across a broad spectrum of reasoning, knowledge, coding, long-context, and agentic tool use tasks. Notable empirical benchmarks include:
- World Knowledge: The 32B model achieves an MMLU-Redux score of 92.3, which matches or surpasses multiple open-weight and even frontier-class models.
- Math and Coding: Scores approximately 85.3 on the AIME 2025 benchmark, outperforming several larger models including Qwen 3 235B and DeepSeek R1-0528.
- Long Context Handling: Maintains robust performance when reasoning token budgets are reduced (e.g., a 5–12% drop when moving from 64K to 32K tokens with the 32B model).
- Multilingual Tasks: Delivers high accuracy on Spanish MATH500 and Korean knowledge benchmarks, confirming successful cross-language generalization.
Scores reflect the best reported results in Reasoning mode (Research et al., 15 Jul 2025).
Relative to other mid-sized open-weight models, EXAONE 4.0 attains higher scores in several core areas (reasoning, tool use, long-context) and remains competitive even with select models of much greater size.
6. Open Availability, Licensing, and Use
EXAONE 4.0 models are distributed via the Hugging Face Model Hub at https://huggingface.co/LGAI-EXAONE under a research-only license (Research et al., 15 Jul 2025). Key terms of use include:
- Restriction to academic, non-commercial purposes, with any commercial exploitation or reverse-engineering explicitly prohibited.
- Mandatory adherence to detailed ethical guidelines, including prohibitions on generating harmful or unlawful output.
- Dedicated channels for institutions or organizations seeking licensing for commercial use.
This distribution strategy is intended to facilitate rigorous evaluation, downstream development, and responsible experimentation across the academic community.
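For orientation, the following is a sketch of loading and querying an EXAONE 4.0 checkpoint from the Hugging Face Hub with the `transformers` library. The repository ID and the `enable_thinking` chat-template flag for toggling between the two modes are assumptions based on common practice; consult the LGAI-EXAONE model card for the exact identifiers, license terms, and recommended generation settings.

```python
# Sketch of loading an EXAONE 4.0 checkpoint and generating a response.
# Repository ID and the `enable_thinking` flag are assumed, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-4.0-1.2B"  # assumed ID; a 32B variant also exists
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain the difference between BFS and DFS."}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False,  # assumed flag: False for non-reasoning, True for reasoning mode
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```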
7. Conceptual Relation to Industry 4.0
The nomenclature and conceptual framing of EXAONE 4.0 echo the principles articulated in "Visões da Indústria 4.0" (Camacho et al., 2021), in which industrial transformation is effected through the convergence of digital and physical systems, cyber-physical production networks, and continuous self-improvement. EXAONE 4.0's focus on agentic AI, multi-modal integration, adaptive reasoning, and system interoperability closely mirrors the paradigm shifts described in the context of Industry 4.0. Both emphasize enabling advanced ecosystems in which intelligent systems interact autonomously and efficiently, with design priorities spanning efficiency, security, sustainability, and continuous workforce and system enhancement.
8. Summary
EXAONE 4.0 stands as a unified LLM series integrating fast-response, non-reasoning operations and deep, step-wise reasoning within a single model. It introduces agentic AI behaviors, extended multilingual capabilities, and substantial advances in attention and normalization mechanisms, all substantiated by state-of-the-art benchmark performance. Positioned within the broader landscape of digital transformation and model interoperability, EXAONE 4.0 offers a flexible and powerful tool for advanced academic, agentic, and multilingual AI research (Research et al., 15 Jul 2025).