
EXAONE Deep - Reasoning-Optimized LLMs

Updated 22 July 2025
  • EXAONE Deep is a family of reasoning-optimized large language models that use explicit chain-of-thought templating to structure multi-step reasoning.
  • It employs a three-stage training methodology including supervised fine-tuning, direct preference optimization, and online reinforcement learning to enhance precision.
  • The models set new benchmarks in mathematics, coding, and logical reasoning, and are openly accessible for research under a dedicated license.

EXAONE Deep denotes a family of reasoning-optimized LLMs developed by LG AI Research, designed to exhibit superior capabilities particularly in domains requiring advanced mathematical, logical, and coding reasoning. The EXAONE Deep models are openly available for research and occupy a central role in the evolution of the EXAONE series, having established new benchmarks for reasoning with carefully engineered architectures, reasoning-specialized datasets, and advanced training methodologies (Research et al., 16 Mar 2025, Research et al., 7 Aug 2024, Research et al., 6 Dec 2024, Research et al., 15 Jul 2025).

1. Architectural Design and Reasoning Principle

The EXAONE Deep series comprises three primary model sizes—2.4B, 7.8B, and 32B parameters—derived as reasoning-specialized fine-tuned variants of the EXAONE 3.5 Instruct models. At the foundation is a decoder-only Transformer, utilizing contemporary architectural components including:

  • SwiGLU activation in the feedforward layers (a minimal sketch follows this list),
  • Grouped Query Attention (GQA), which shares key-value heads across query groups for efficient attention,
  • Rotary Position Embeddings (RoPE) for context ordering,
  • Pre-normalization with LayerNorm for optimization stability,
  • Support for extended contexts (up to 32K tokens in EXAONE 3.5, extended further in EXAONE 4.0).
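
As an illustration of the SwiGLU component named above, here is a minimal PyTorch sketch; the layer sizes are illustrative placeholders, not EXAONE's actual dimensions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """SwiGLU feedforward: (SiLU(x W_gate) * (x W_up)) W_down."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (a.k.a. Swish) gates the up-projection elementwise.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

block = SwiGLUFeedForward(d_model=4096, d_ff=14336)  # illustrative sizes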

A distinctive attribute is the explicit incorporation of chain-of-thought (CoT) reasoning. Training data are formatted such that each prompt generates a structured (<thought>…</thought>) stream, followed by an explicit, boxed answer, e.g.

\begin{align*}
&\texttt{<thought>} \\
&\quad \text{Step 1: Compute...} \\
&\quad \text{Step 2: Substitute...} \\
&\texttt{</thought>} \\[4pt]
&\boxed{4}
\end{align*}

This format directs the models to organize, verify, and correct multi-step reasoning within a single response, fostering both reflexivity and precision (Research et al., 16 Mar 2025).
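As a concrete illustration, a minimal sketch of how one training example might be serialized; the helper and its arguments are hypothetical, and only the `<thought>…</thought>` plus boxed-answer layout comes from the report:

```python
def format_cot_example(question: str, steps: list[str], answer: str) -> str:
    """Serialize one example in the thought-then-boxed-answer layout.

    The exact prompt wrapper used by EXAONE Deep is not reproduced here;
    this only demonstrates the structured CoT format described above.
    """
    thought = "\n".join(steps)
    return (
        f"{question}\n"
        f"<thought>\n{thought}\n</thought>\n"
        f"\\boxed{{{answer}}}"
    )

example = format_cot_example(
    "What is 2 + 2?",
    ["Step 1: Compute 2 + 2.", "Step 2: The sum is 4."],
    "4",
)
print(example)
```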

2. Training Methodology and Reasoning Data

The training pipeline for EXAONE Deep employs a three-stage methodology:

  1. Supervised Fine-Tuning (SFT): Approximately 1.6 million thought-annotated examples spanning ~12 billion tokens, selected and templated to trigger explicit reasoning.
  2. Direct Preference Optimization (DPO): 20,000 preference-pair instances enforce correct reasoning chains by maximizing the probability gap between preferred and rejected responses, using fine-grained signal from SimPER-type objectives (the standard DPO objective is sketched after this list).
  3. Online Reinforcement Learning (Online RL): An additional 10,000 instances train the model to self-correct further, using a customized variant of Group Relative Policy Optimization (GRPO).
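
For reference, a minimal sketch of the standard DPO objective used in stage 2; the SimPER-style refinements mentioned above are not reproduced here:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), frozen reference
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO: -log sigmoid(beta * (delta_chosen - delta_rejected))."""
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```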

The data construction is notable for emphasizing mathematical derivation, code tracing, scientific explanation, and multi-hop factual reasoning, presented in a manner designed to encourage "think aloud" behavior. This approach instills procedural knowledge rather than mere recall or pattern matching (Research et al., 16 Mar 2025).

3. Evaluation Methodology and Benchmark Results

EXAONE Deep models are evaluated across a variety of rigorous public and in-house benchmarks:

  • Mathematics: MATH-500, AIME (American Invitational Mathematics Examination, 2024/2025), South Korea's CSAT 2025 math section.
  • Coding: LiveCodeBench and standard code generation evaluation sets.
  • General Reasoning: GPQA Diamond, MMLU, and MMLU-Pro.

Performance metrics include:

  • Pass@1: For coding, computes accuracy as the mean correctness over k generations:

$$\text{pass@1} = \frac{1}{k}\sum_{i=1}^{k} p_i$$

where $p_i$ indicates the correctness of the $i$-th output.

  • Consistency (cons@k): For mathematics, determines answer reliability by majority (consensus) voting across k generations; both metrics are sketched after this list.
  • Benchmark accuracy: For knowledge and reasoning tasks, standard question-answering accuracy.
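
A minimal sketch of both metrics, assuming per-generation correctness flags and extracted final answers as inputs:

```python
from collections import Counter

def pass_at_1(correct_flags: list[bool]) -> float:
    """pass@1 as mean correctness over k sampled generations."""
    return sum(correct_flags) / len(correct_flags)

def cons_at_k(answers: list[str], reference: str) -> bool:
    """cons@k: the majority-voted answer across k generations must match."""
    majority_answer, _ = Counter(answers).most_common(1)[0]
    return majority_answer == reference

# Example: 8 sampled generations for one math problem.
print(pass_at_1([True, True, False, True, False, True, True, True]))  # 0.75
print(cons_at_k(["4", "4", "5", "4"], reference="4"))                 # True
```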

EXAONE Deep 2.4B and 7.8B models consistently outperform open-weight peers of similar size, such as DeepSeek-R1-Distill-Qwen-1.5B/7B and OpenAI o1-mini, while EXAONE Deep 32B achieves results on par with or surpassing QwQ-32B and DeepSeek-R1, and outperforms distilled models such as DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Llama-70B (Research et al., 16 Mar 2025).

4. Open Research Access and Licensing

All EXAONE Deep models are released under a research-use license and are available at https://huggingface.co/LGAI-EXAONE. The license permits academic research, publication, and derivative work with attribution, but restricts commercial use. The release is intended to foster reproducible research and facilitate rigorous exploration of systematic reasoning in LLMs (Research et al., 16 Mar 2025, Research et al., 6 Dec 2024, Research et al., 15 Jul 2025).
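
A minimal loading sketch with Hugging Face transformers; the repository id below is assumed from the collection page linked above, so verify exact names on the hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id under https://huggingface.co/LGAI-EXAONE; check the hub.
model_id = "LGAI-EXAONE/EXAONE-Deep-7.8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,  # EXAONE releases ship custom modeling code
)

inputs = tokenizer("Solve: 12 * 7 = ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```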

5. Influence on Subsequent Model Development

The methodological innovations and empirically validated reasoning performance of EXAONE Deep have directly influenced subsequent iterations in the EXAONE model line:

  • EXAONE 4.0 integrates the advanced reasoning abilities of EXAONE Deep into a unified model with both “Non-reasoning” (fast response) and “Reasoning” modes, leveraging a hybrid attention mechanism and an improved QK-Reorder-LN normalization strategy.
  • An AGAPO (Asymmetric Sampling and Global Advantage Policy Optimization) reinforcement learning algorithm is adopted for post-training, replacing the clipped PPO loss with global advantage normalization, formulated as:

$$\mathcal{J}_{\text{AGAPO}}(\theta) = \mathbb{E}_{q,\{o_i\}} \left[ \frac{1}{G} \sum_{i=1}^{G} \left( A_{\text{global},i} \cdot \log \pi_\theta(o_i \mid q) - \beta \cdot D_{\text{KL}}(\pi_\theta \,\|\, \pi_{\text{ref}}) \right) \right]$$

This approach increases the model’s focus on rare, reflective reasoning steps (Research et al., 15 Jul 2025).
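
A minimal sketch of the global-advantage policy-gradient term in the objective above, assuming scalar per-response rewards; the asymmetric sampling component and AGAPO's exact normalization statistics are simplified away here:

```python
import torch

def agapo_style_loss(
    logps: torch.Tensor,      # log pi_theta(o_i | q) per response, shape (G,)
    rewards: torch.Tensor,    # scalar reward per response, shape (G,)
    kl_to_ref: torch.Tensor,  # per-response KL(pi_theta || pi_ref) estimate, shape (G,)
    beta: float = 0.01,
) -> torch.Tensor:
    """Advantage-weighted log-likelihood with a KL penalty, no PPO clipping.

    Advantages are standardized over the sampled group; whether the paper
    computes these statistics per prompt group or over the full batch is a
    detail not reproduced in this sketch.
    """
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    objective = (advantages * logps - beta * kl_to_ref).mean()
    return -objective  # minimize the negative of the objective
```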

EXAONE 4.0 also extends agentic tool use, allowing the model to conduct external API calls and iteratively refine its reasoning within dialogues, thus combining EXAONE Deep’s reasoning tradition with practical agentic workflows.

6. Multilingual Reasoning and Tokenization

The EXAONE Deep series inherits from previous EXAONE models a carefully engineered tokenization pipeline for Korean and English, using a MeCab pre-processing stage prior to Byte-BPE (BBPE) training. This reduces token count per Korean word and improves fluency, compression ratio, and performance on agglutinative language tasks. Later versions (EXAONE 4.0) expand this support to Spanish, using the same vocabulary, and fine-tune with language-specific reasoning data. The chain-of-thought methodology, being format-agnostic, is consistently applied across all three languages, ensuring that mathematical and coding reasoning transfers robustly to multilingual scenarios (Research et al., 7 Aug 2024, Research et al., 6 Dec 2024, Research et al., 15 Jul 2025).
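
A minimal sketch of that two-stage pipeline using the mecab-python3 and tokenizers packages; corpus paths and hyperparameters are illustrative, not LG AI Research's actual settings:

```python
import MeCab  # mecab-python3; needs a Korean dictionary such as mecab-ko-dic
from tokenizers import ByteLevelBPETokenizer

# Stage 1: morphological pre-segmentation of Korean text with MeCab.
tagger = MeCab.Tagger("-Owakati")  # outputs space-separated morphemes

def pre_segment(line: str) -> str:
    return tagger.parse(line).strip()

with open("corpus_ko.txt") as src, open("corpus_ko.seg.txt", "w") as dst:
    for line in src:
        dst.write(pre_segment(line) + "\n")

# Stage 2: train byte-level BPE (BBPE) on the pre-segmented corpus.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=["corpus_ko.seg.txt"], vocab_size=102_400)  # illustrative
tokenizer.save_model("exaone_tokenizer")
```

Pre-segmenting with MeCab means BPE merges respect morpheme boundaries, which is what drives the lower token count per Korean word noted above.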

7. Significance, Limitations, and Future Research

The introduction of EXAONE Deep marked a significant inflection in reasoning-centric LLM research, establishing that a combination of explicit chain-of-thought templating, structured reasoning data, and multi-stage preference and reinforcement training can yield measurable improvements in complex domains without major scale increases. Open availability, comprehensive licensing, and detailed evaluation protocols have encouraged rapid adoption and benchmarking by the research community.

Despite these advances, limitations persist:

  • Models exhibit performance degradation on highly compositional or nested logical tasks not explicitly encountered during fine-tuning.
  • Contextual robustness across extended document lengths and reasoning budgets (token-limited CoT generations) remains an area of experimentation, especially in smaller parameter configurations (Research et al., 15 Jul 2025).
  • As highlighted in related EXAONE studies, fact-consistency and semantic accuracy issues appear in tasks such as text-to-SQL and business intelligence, suggesting that reasoning specialization alone does not resolve limitations in symbolic logic or data grounding (Choi, 30 Apr 2025).

Ongoing research focuses on hybrid architectures, improved fact-consistency layers, and tighter integration with symbolic engines or verifiable external tools.


Table: Summary of EXAONE Deep Series

| Model Size | Training Specialization | Benchmark Highlights |
|------------|-------------------------|----------------------|
| 2.4B | CoT reasoning, math/coding | Outperforms DeepSeek-R1-Distill-Qwen-1.5B |
| 7.8B | Reasoning, STEM, bilingual | Superior to DeepSeek-R1-Distill-Qwen-7B, o1-mini |
| 32B | Advanced reasoning, long context | Competitive with QwQ-32B, DeepSeek-R1 |

EXAONE Deep represents a lineage of LLMs systematically optimized for reasoning. Its methodologies and empirical results have informed both the practical deployment of specialized reasoning models and the architectural evolution of generalist, agentic, and multilingual models within the broader EXAONE framework.