Olmo 3: Open-Source Transformer Models
- Olmo 3 is a family of fully open-source decoder-only transformer models at 7B and 32B scales, designed for long-context reasoning, function calling, coding, and instruction following.
- The architecture uses Sliding-Window Attention (SWA) and Grouped-Query Attention (GQA) to efficiently handle context lengths up to 65K tokens with rotary position embeddings.
- Its transparent training pipeline publishes all stages, data compositions, and safety evaluations, ensuring reproducibility and rigorous validation in open LLM ecosystems.
Olmo 3 is a family of fully open-source decoder-only transformer-based LLMs released at two parameter scales: 7 billion (7B) and 32 billion (32B). Designed specifically for advanced long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall, Olmo 3 establishes a transparent model flow by publishing every step, checkpoint, and dependency in its lifecycle. Its flagship, Olmo 3 Think 32B, is recognized as the strongest fully-open thinking model released to date (OLMo et al., 15 Dec 2025).
1. Model Architecture
Olmo 3 employs a dense transformer backbone, modified for scalable long-context processing and efficient attention. Both model sizes implement Sliding-Window Attention (SWA): in three out of every four layers, attention is limited to a local window of 4096 tokens, with unrestricted full attention in every fourth layer. This reduces per-layer attention complexity from $O(L^2)$ to $O(L \cdot w)$, where $L$ is the sequence length (up to 65,536 after extension) and $w$ is the window size (4096).
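The local-window restriction can be illustrated with a minimal mask-construction sketch (not the Olmo 3 implementation; the toy `L` and `w` below are illustrative, only the 4096-token window and the complexity argument come from the text):

```python
import numpy as np

def sliding_window_causal_mask(seq_len, window=None):
    """Boolean mask: True where query position i may attend to key position j.

    window=None gives ordinary full causal attention (the full-attention
    layers); a finite window limits each query to the previous `window`
    tokens, as in the SWA layers.
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i
    if window is None:
        return causal
    return causal & (i - j < window)

# Per-layer score count: full attention grows as L^2, SWA as roughly L*w.
L, w = 16, 4  # toy sizes; Olmo 3 uses w = 4096
full = sliding_window_causal_mask(L)
swa = sliding_window_causal_mask(L, w)
assert full.sum() == L * (L + 1) // 2                      # O(L^2) entries
assert swa.sum() == sum(min(k + 1, w) for k in range(L))   # ~O(L*w) entries
```

Counting the `True` entries makes the quadratic-versus-linear scaling concrete: the full causal mask has on the order of $L^2/2$ active pairs, while the windowed mask caps each row at $w$.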
The 32B variant speeds up inference with Grouped-Query Attention (GQA), clustering attention heads into groups of size 5 so that each group shares a single key/value head. Both models use rotary position embeddings (RoPE), RMSNorm for normalization, and 8K-token context windows during pretraining. Post-training, context length is extended to 65K tokens via YaRN RoPE scaling, applied only to the full-attention layers.
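A simplified sketch of YaRN-style frequency interpolation ("NTK-by-parts") is shown below. The scale factor of 8 matches the 8K-to-65K extension described above, but `base`, `alpha`, and `beta` are illustrative placeholder values, not Olmo 3's settings, and YaRN's attention-temperature correction is omitted:

```python
import numpy as np

def yarn_inv_freq(dim, base=10_000.0, scale=8.0, orig_ctx=8192,
                  alpha=1.0, beta=32.0):
    """Simplified YaRN interpolation of RoPE inverse frequencies.

    High-frequency dimensions (short wavelength, many rotations over the
    original context) keep their frequencies; low-frequency dimensions are
    divided by the scale factor; a linear ramp blends between the regimes.
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Number of full rotations each dimension completes over orig_ctx.
    rotations = orig_ctx * inv_freq / (2 * np.pi)
    ramp = np.clip((rotations - alpha) / (beta - alpha), 0.0, 1.0)
    # ramp = 1 -> leave unscaled (high freq); ramp = 0 -> interpolate (low freq).
    return inv_freq * (ramp + (1.0 - ramp) / scale)
```

Under these assumptions, the fastest-rotating dimension is left untouched while the slowest is scaled by exactly $1/8$, which is what lets the extended 65K positions reuse the rotation range the model saw during 8K pretraining.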
Architecture Specifications
| Model | Layers | Hidden Size | Attention Heads | Attention Modifications |
|---|---|---|---|---|
| 7B | 32 | 4096 | 32 | SWA |
| 32B | 64 | 5120 | 40 (GQA) | SWA, GQA |
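The head grouping in the table can be sketched as follows; this is a minimal reference implementation of GQA, not Olmo 3's kernel, and the sequence length and head dimension are illustrative:

```python
import numpy as np

def gqa_attention(q, k, v, group_size):
    """Grouped-Query Attention: each group of `group_size` query heads
    shares one key/value head, shrinking the KV cache by that factor.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d),
    with n_q_heads == n_kv_heads * group_size.
    """
    n_q, seq, d = q.shape
    # Repeat each KV head group_size times to align with the query heads.
    k_rep = np.repeat(k, group_size, axis=0)
    v_rep = np.repeat(v, group_size, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)
    # Causal mask: queries cannot attend to future keys.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_rep

# Olmo 3 32B: 40 query heads in groups of 5 -> 8 shared KV heads.
rng = np.random.default_rng(0)
q = rng.normal(size=(40, 6, 16))
k = rng.normal(size=(8, 6, 16))
v = rng.normal(size=(8, 6, 16))
out = gqa_attention(q, k, v, group_size=5)
assert out.shape == (40, 6, 16)
```

The design choice is a memory/quality trade-off: with groups of 5, the 32B model stores 8 rather than 40 key/value heads per layer in the KV cache, which matters at 65K-token contexts.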
2. Training Pipeline and Data Composition
Olmo 3's training pipeline is fully open, with all code, data mixes, checkpoints, and configurations published. Training progresses through three stages:
- Stage 1: Pretraining on Dolma 3 Mix (6T tokens), sourced from a 9T-token pool of web text, academic PDFs (238M processed via olmOCR), GitHub code, FineMath, arXiv LaTeX, and Wikipedia/Wikibooks. Three-pass trillion-scale deduplication (exact hashing, MinHash clustering, and fuzzy suffix arrays) reduces the document count by 75%. Data are classified into 480 topic-quality buckets using FastText classifiers derived from WebOrganizer, and the mixture is optimized via swarm-based mixture optimization (Olmix). Quality-aware upsampling biases the mix toward high-quality web data.
- Stage 2: Midtraining on Dolma 3 Dolmino Mix (100B tokens), distilled from a 2T-token pool, focuses on code, math, general QA, instruction data, and science PDFs. A two-part selection process employs micro-anneals (small proxy runs for quick signal) and integration tests (full 100B-token midtrains). Synthetic sources include TinyMATH, CraneCode, Reddit-to-Flashcards, TinyMATH-style meta-reasoning, Tulu 3, and Flan. Model-soup merging raises 32B midtrain scores.
- Stage 3: Long-Context Extension leverages Dolma 3 Longmino Mix (50B for 7B, 100B for 32B), with a 639B token pool of >8K token PDFs (filtered for compressibility) and synthetic aggregation tasks (CWE, REX). Tokens are mixed (34% long, 66% short) and trained with YaRN on full-attention layers, document packing, and intra-document masking to extend context up to 65K.
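The MinHash pass of Stage 1's deduplication can be sketched as follows. This is a generic near-duplicate detector, not the Dolma 3 pipeline; shingle size, signature length, and the seeded-hash approximation of random permutations are all illustrative choices:

```python
import hashlib

def shingles(text, n=3):
    """Word n-grams of a document, the set MinHash approximates Jaccard over."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def minhash_signature(sh, num_hashes=64):
    """One minimum per seeded hash function stands in for one permutation."""
    return [min(int(hashlib.sha1(f"{seed}:{s}".encode()).hexdigest(), 16)
                for s in sh)
            for seed in range(num_hashes)]

def est_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over a lazy dog",    # near-duplicate
    "transformers use attention to model long sequences",
]
sigs = [minhash_signature(shingles(d)) for d in docs]
assert est_jaccard(sigs[0], sigs[1]) > est_jaccard(sigs[0], sigs[2])
```

At trillion-token scale the point of signatures is that clustering compares fixed-length integer vectors instead of full documents; candidate pairs with high estimated Jaccard are then dropped or merged.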
All resources are released openly through public repositories.
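Stage 3's document packing with intra-document masking can be sketched as a mask built from packed-document boundaries; a minimal illustration, not the Olmo 3 training code:

```python
import numpy as np

def intra_document_mask(doc_ids):
    """Causal attention mask restricted to tokens of the same packed document.

    doc_ids[t] labels which document token t belongs to; a token may only
    attend to earlier tokens with the same label, so documents packed into
    one training sequence never attend across their boundaries.
    """
    pos = np.arange(len(doc_ids))
    causal = pos[None, :] <= pos[:, None]
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    return causal & same_doc

# Two documents of lengths 3 and 2 packed into one sequence.
mask = intra_document_mask(np.array([0, 0, 0, 1, 1]))
assert not mask[3, 2]   # first token of doc 1 cannot see doc 0
assert mask[4, 3]       # but later tokens see earlier tokens of their own doc
```

Packing keeps long-context batches dense (no padding waste), while the mask preserves the statistical independence of the concatenated documents.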
3. Design Objectives and Advanced Capabilities
Olmo 3 excels in long-context inference, function calling, coding (including fill-in-the-middle tasks), instruction following, chat, and knowledge recall. Design advances include:
- Structured Thinking Traces: Supervised finetuning (Dolci Think SFT) incorporates meta-reasoning—self-awareness, backward-chaining, verification, strategy selection, and conceptual reasoning.
- Delta Learning via Direct Preference Optimization (DPO): Dolci Think applies a delta objective pairing responses from strong and weak models to maximize capability deltas. The DPO objective is
  $$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],$$
  where $y_w$ and $y_l$ are the preferred (strong-model) and rejected (weak-model) responses, $\pi_{\mathrm{ref}}$ is the reference policy, and $\beta$ controls deviation from it.
- RL with Verifiable Rewards (RLVR): Built on GRPO and DAPO, Olmo 3 Think uses off-policy sampling and a token-level PPO loss with advantage clipping and truncated importance sampling, eschewing a KL penalty. The RL objective is
  $$\mathcal{J}(\theta) = \mathbb{E}\!\left[\frac{1}{\sum_{i=1}^{G} |y_i|} \sum_{i=1}^{G} \sum_{t=1}^{|y_i|} \min\!\Big(r_{i,t}(\theta)\, \hat{A}_i,\; \mathrm{clip}\big(r_{i,t}(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, \hat{A}_i\Big)\right],$$
  where $r_{i,t}(\theta) = \pi_\theta(y_{i,t} \mid x, y_{i,<t}) \,/\, \pi_{\theta_{\mathrm{old}}}(y_{i,t} \mid x, y_{i,<t})$ is the token-wise ratio of the current to the prior policy and $\hat{A}_i$ is the group-wise advantage. Rewards are diverse and verifiable: exact-answer correctness in math (checked with SymPy), pass@k for code, adherence to instruction-following constraints, and quality judged by an LM.
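A minimal numpy sketch of a token-level clipped objective with group-wise advantages is given below. It follows the GRPO/DAPO family described above but is not Olmo 3's training code: the clipping range `eps`, the group size, and the random log-probs are illustrative, and truncated importance sampling is omitted:

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style group-wise advantage: normalize each sampled response's
    scalar reward within its group of G responses to the same prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def clipped_token_objective(logp_new, logp_old, advantages, eps=0.2):
    """Token-level PPO-style clipped objective with no KL penalty.

    logp_new, logp_old: (G, T) per-token log-probs under the current and
    prior policies; advantages: (G,) group-wise advantages, broadcast to
    every token of a response.
    """
    ratio = np.exp(logp_new - logp_old)        # token-wise ratio r_{i,t}
    adv = advantages[:, None]                  # broadcast over tokens
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    return np.minimum(unclipped, clipped).mean()

rewards = np.array([1.0, 0.0, 0.0, 1.0])       # verifiable 0/1 rewards
adv = group_advantages(rewards)
rng = np.random.default_rng(0)
logp_old = rng.normal(-2.0, 0.1, size=(4, 8))
logp_new = logp_old + rng.normal(0.0, 0.05, size=(4, 8))
obj = clipped_token_objective(logp_new, logp_old, adv)
assert np.isfinite(obj)
```

Normalizing rewards within the group replaces a learned value baseline, and the `min` of clipped and unclipped terms caps how far a single update can push the policy away from the sampling policy.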
4. Evaluation Protocols and Performance Metrics
Olmo 3 is assessed through base and post-training evaluations across diverse benchmarks.
Base Model Performance
| Benchmark | Olmo 3 7B | Marin/Apertus (7–8B) | Olmo 3 32B | Marin/Apertus (32–70B) |
|---|---|---|---|---|
| Math | 54.7 | 39.6 (Marin 8B) | 69.7 | 49.3/39.7 |
| Code | 30.7 | 21.4 | 39.7 | 30.8/23.3 |
| STEM MCQA | 66.4 | 68.1 | 75.6 | 75.9/70.0 |
| GenQA | 72.5 | 71.6 | 79.4 | 80.3/75.0 |
Long-Context Performance
RULER dev scores (needle-in-haystack, aggregation):
| Model | 4K | 8K | 16K | 32K | 65K |
|---|---|---|---|---|---|
| 7B | 94.9 | 91.2 | 84.1 | 78.8 | 67.9 |
| 32B | 96.1 | 94.6 | 90.4 | 86.2 | 79.7 |
HELMET held-out: 7B up to 36.8, 32B up to 52.11.
Post-training Models
Olmo 3 Think 32B achieves:
- MATH: 96.2 (vs Marin 32B 36.8, Apertus 70B 36.2)
- AIME 24: 80.6
- BigBenchHard: 88.6
- HumanEval+: 91.5
- CodexPass: 91.5
- IFEval: 93.8
- MMLU: 86.4
- GPQA: 57.5
Extended RL (Think 3.1) improves math and instruction scores further.
5. Safety and Robustness Evaluations
Olmo 3 is validated on 12 safety tasks, including HarmBench, DAN, WildGuard, WildJailbreak, XSTest, TrustLLM, Toxigen, StrongReject, WMDP, and BBQ. The Think and Instruct variants outperform prior open models on refusal accuracy, with Olmo 3 Think 32B achieving up to 100% on Toxigen and above 90% on HarmBench, DAN, WildGuard, and related benchmarks.
6. Openness, Release Practices, and Resources
Olmo 3’s fully-open model flow distinguishes it in the open-source LLM ecosystem. All code for pretraining, midtraining, long-context extension, supervised finetuning, DPO, RLVR, data recipes, deduplication, and evaluation (OLMES, decon) is published on GitHub. Data mixes and pools are available on Hugging Face, and checkpoints are distributed via Hugging Face and Weights & Biases.
All stages, configurations, and model artifacts are fully documented and accessible, enabling comprehensive reproducibility and further investigation within the research community (OLMo et al., 15 Dec 2025).