Qwen Series: Scalable Multimodal AI Models

Updated 2 December 2025
  • The Qwen Series comprises advanced foundation models that integrate language, vision, and audio capabilities using dense Transformer and Mixture-of-Experts (MoE) architectures.
  • They leverage innovations like Grouped Query Attention, rotary positional embeddings, and long-context techniques to boost performance and cross-modal reasoning.
  • Open-source releases and industrial adaptations demonstrate scalable deployment, efficient inference, and competitive benchmarks in real-world applications.

The Qwen Series is a family of large language, vision-language, audio-language, and multimodal foundation models developed primarily by Alibaba Group and affiliated research teams. Spanning multiple generations (Qwen in 2023, Qwen1.5, Qwen2, Qwen2.5, and Qwen3), the series encompasses dense Transformer, Mixture-of-Experts (MoE), and specialized models for mathematical, coding, multimodal, and on-device applications. Its corpus scale, architectural advances, multi-stage post-training, and open-weight releases position Qwen as a principal competitor to closed-source models, with performance matching or surpassing state-of-the-art alternatives in reasoning, multilinguality, context scaling, and cross-modal integration.

1. Generational Evolution and Model Variants

The series originated with Qwen (2023), which introduced decoder-only LLMs at 1.8B, 7B, and 14B parameters, with conversational (Chat), code (Code-Qwen), and math (Math-Qwen) variants finetuned via Supervised Finetuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) (Bai et al., 2023). Qwen1.5 extended scaling (up to 110B parameters) and introduced Mixture-of-Experts variants. Qwen2 (2024) incorporated major architectural advances: Grouped Query Attention (GQA), rotary positional embeddings (RoPE with NTK-aware scaling), RMSNorm, and SwiGLU activations, supporting up to 72B dense parameters and a 57B MoE model with 14B activated per token (Yang et al., 15 Jul 2024). Qwen2.5 scaled the pretraining data from 7T to 18T tokens, refined the domain mixture, and added extensive long-context support (up to 128K tokens for dense models, 1M for Turbo) alongside proprietary-hosted MoE models (Turbo, Plus) (Qwen et al., 19 Dec 2024). Qwen3 (2025) unified "thinking" (reasoning) and "non-thinking" (fast chat) modes with a dynamic budget mechanism and expanded multilingual coverage to 119 languages, spanning six dense models (0.6B–32B) and two MoE models (30B-A3B, 235B-A22B), all released under Apache 2.0 (Yang et al., 14 May 2025).

| Generation | Key Models | Parameters | Context Lengths | Multilingual Coverage | Key Features |
|---|---|---|---|---|---|
| Qwen (2023) | Base, Chat, Code, Math | 1.8B–14B | 2K–16K | 10–20+ | RMSNorm, SwiGLU, RoPE, FlashAttention |
| Qwen2 (2024) | Dense, MoE | 0.5B–72B dense, 57B-A14B MoE | 32K–131K | ~30 | GQA, DCA+YARN, extended RLHF |
| Qwen2.5 (2024/25) | Dense, Turbo, Plus | 0.5B–72B dense, 3B/14B MoE | 32K–1M | ~30 | 18T tokens, long-context SFT/RLHF |
| Qwen3 (2025) | Dense, MoE, Omni | 0.6B–235B | 128K–1M | 119 | Unified thinking/non-thinking, Thinker–Talker MoE |

2. Architectural Innovations and Scaling Laws

The Qwen family builds upon canonical decoder-only Transformers, systematically integrating advances for performance, efficiency, and scaling:

  • Grouped Query Attention (GQA) to shrink the KV cache and speed decoding (Qwen2 onward).
  • Rotary positional embeddings (RoPE) with NTK-aware/ABF base scaling for context extension.
  • RMSNorm pre-normalization and SwiGLU activations.
  • FlashAttention kernels and Dual Chunk Attention with YARN for long sequences.
  • Mixture-of-Experts layers activating only a subset of parameters per token (e.g., 57B-A14B, 235B-A22B).
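
As a concrete illustration of the first of these advances, the following is a minimal sketch of grouped-query attention in PyTorch; head counts and dimensions are illustrative and do not correspond to any specific Qwen configuration.

```python
# Minimal sketch of Grouped Query Attention (GQA): many query heads share a
# smaller set of key/value heads, shrinking the KV cache during decoding.
# Head counts and dimensions are illustrative, not an actual Qwen configuration.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads=16, n_kv_heads=4):
    """x: (batch, seq, d_model); w_q/w_k/w_v: projection weight matrices."""
    b, t, d = x.shape
    head_dim = d // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads served by each KV head

    q = (x @ w_q).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ w_k).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ w_v).view(b, t, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each KV head to its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

# 16 query heads attend over only 4 cached KV heads (4x smaller KV cache).
x = torch.randn(1, 8, 2048)
w_q = torch.randn(2048, 2048)
w_k = torch.randn(2048, 512)   # n_kv_heads * head_dim = 4 * 128
w_v = torch.randn(2048, 512)
y = grouped_query_attention(x, w_q, w_k, w_v)  # -> (1, 8, 2048)
```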

Scaling laws are empirically validated in Qwen2-VL, Qwen2.5-VL, and Qwen3 for vision-language and cross-modal settings: performance on math, document comprehension, VQA, and video tasks increases log-linearly with parameter count and data size, saturating at ultra-large scales (Wang et al., 18 Sep 2024, Bai et al., 19 Feb 2025, Yang et al., 14 May 2025).
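
Schematically, such a log-linear relationship can be written as below, with N the parameter count, D the number of training tokens, and per-benchmark coefficients; this is an illustrative functional form only, not a fitted curve from the cited reports.

```latex
\mathrm{Score}(N, D) \;\approx\; \alpha + \beta \log N + \gamma \log D
```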

3. Post-Training, Alignment, and Specialized Models

Post-training unifies supervised finetuning (SFT), multi-stage RLHF with Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO), and reward-model-driven data selection. Preference data targeting truthfulness, helpfulness, conciseness, and debiasing shapes both base and instruction-tuned variants.
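
For reference, the standard DPO objective used in such preference-optimization stages is shown below, where $y_w$ and $y_l$ are the preferred and rejected responses and $\pi_{\mathrm{ref}}$ is the frozen reference policy; the specific $\beta$ values and data mixtures used for Qwen are not reproduced here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```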

Specialized derivatives are prominent:

  • Qwen2.5-Math: Extensive self-improvement via synthetic data generation (Qwen2.5-Math-Instruct), iterative SFT with reward models, and tool-integrated reasoning (TIR), achieving new SOTA across math benchmarks: GSM8K (95.9%), MATH (85.9%), AIME (12/30) (Yang et al., 18 Sep 2024).
  • Qwen2.5-Coder: Automated code synthesis, multi-agent generation, static/dynamic checks.
  • DistilQwen: Knowledge distillation from large “teacher” LLMs (DeepSeek-R1, QwQ-32B) into smaller Qwen2.5/Qwen3 students, supporting slow-thinking (maximum accuracy) and adaptive-thinking (input-conditioned CoT length) modes for efficient reasoning (Cai et al., 3 Nov 2025); a minimal distillation sketch follows this list.
  • QwQ (preprint): A reflective reasoning model at 32B parameters.
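
The logit-matching idea behind such distillation can be sketched as follows, assuming teacher and student share a tokenizer; the temperature, vocabulary size, and loss weighting are illustrative, and the actual DistilQwen recipe (CoT data curation, slow/adaptive thinking modes) goes well beyond this.

```python
# Token-level knowledge distillation: the student is trained to match the
# teacher's softened next-token distribution via a KL-divergence loss.
# All sizes and the temperature below are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Logits: (tokens, vocab). Returns KL(teacher || student) on softened dists."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)

# Example with random logits over a 32k-token vocabulary for an 8-token sequence.
student_logits = torch.randn(8, 32000, requires_grad=True)
teacher_logits = torch.randn(8, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```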

4. Multimodal and Cross-Modal Extensions

The Qwen-VL, Qwen2-VL, and Qwen2.5-VL branches extend the LLM backbone to visual and multimodal processing (Bai et al., 2023, Wang et al., 18 Sep 2024, Bai et al., 19 Feb 2025). Native dynamic-resolution Vision Transformers, windowed and global attention, M-RoPE, agent-aligned perception, and long-context spatial/temporal encoding enable document, chart, UI, video, and GUI comprehension. Structured parsing outputs HTML tokens annotated with bounding boxes, supporting downstream conversion (to JSON, for instance).
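
As an illustration of that downstream conversion step, the sketch below turns HTML-like output with bounding-box attributes into JSON; the attribute name (`data-bbox`) and the coordinate convention are assumptions for illustration, not the documented Qwen2.5-VL output schema.

```python
# Convert HTML-like structured parsing output (elements annotated with bounding
# boxes) into JSON records. The "data-bbox" attribute and "x1 y1 x2 y2" pixel
# convention are assumed here purely for illustration.
import json
import re

def html_to_json(html: str) -> str:
    records = []
    # Match elements such as: <p data-bbox="12 34 560 78">Some text</p>
    pattern = r'<(\w+)[^>]*data-bbox="([\d ]+)"[^>]*>(.*?)</\1>'
    for tag, bbox, text in re.findall(pattern, html, re.S):
        x1, y1, x2, y2 = map(int, bbox.split())
        records.append({"type": tag, "bbox": [x1, y1, x2, y2], "text": text.strip()})
    return json.dumps(records, ensure_ascii=False, indent=2)

sample = '<p data-bbox="12 34 560 78">Quarterly revenue grew 8%.</p>'
print(html_to_json(sample))
```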

Qwen-Audio fuses Whisper-derived audio encoding with Qwen-7B, training on >30 tasks (ASR, S2TT, diarization, AAC, AQA, music recognition) and managing output heterogeneity via hierarchical prefix tagging (Chu et al., 2023).
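
The idea of hierarchical prefix tagging can be illustrated as below: task, language, and output-format tags are prepended to the decoder input so one model serves many audio tasks without output collisions. The tag names here are hypothetical placeholders, not the actual Qwen-Audio token vocabulary.

```python
# Hypothetical hierarchical prefix tags conditioning a multi-task audio decoder:
# the tag hierarchy (task -> language -> format) disambiguates output spaces.
# None of these token strings are the real Qwen-Audio special tokens.
def build_prefix(task: str, language: str, timestamps: bool) -> str:
    tags = ["<|audio|>", f"<|task:{task}|>", f"<|lang:{language}|>"]
    tags.append("<|timestamps|>" if timestamps else "<|no_timestamps|>")
    return "".join(tags)

# ASR in English with word timestamps vs. audio captioning without them.
print(build_prefix("transcribe", "en", timestamps=True))
print(build_prefix("caption", "none", timestamps=False))
```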

Qwen-Image introduces a frozen Qwen2.5-VL semantic encoder, VAE for image tokenization, and a Multimodal Diffusion Transformer enabling high-fidelity text rendering and editing (complex, multi-line, and logographic text) with state-of-the-art performance on English/Chinese image-text benchmarks and editing tasks (Wu et al., 4 Aug 2025).

Qwen3-Omni provides unified performance across text, images, audio, and video via a Thinker–Talker MoE design, multimodal rotary position encoding, low-latency streaming TTS, and explicit chain-of-thought “Thinking” heads, matching closed-model SOTA in speech and audio-visual reasoning (Xu et al., 22 Sep 2025).

5. Long-Context and Inference Efficiency

Qwen2.5-1M and QwenLong-CPRS address ultra-long context challenges (Yang et al., 26 Jan 2025, Shen et al., 23 May 2025):

  • Qwen2.5-1M leverages progressive pre-training, synthetic long-context data (fill-in-the-middle, paragraph retrieval), ABF-scaled RoPE (see the sketch after this list), and multi-stage SFT+RLHF to enable 1M-token processing, achieving 3×–7× prefill (time-to-first-token) speedups and outperforming GPT-4o-mini in long-context retrieval and reasoning.
  • QwenLong-CPRS realizes “∞-LLMs” via natural-language-guided dynamic context compression, bidirectional reasoning layers for improved boundary awareness, token-critic mechanisms that reuse LM heads, and window-parallel inference, delivering compression rates of up to 290× and new SOTA on Ruler-128K and InfiniteBench (Shen et al., 23 May 2025).
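
The adjusted-base-frequency (ABF) idea referenced above can be sketched as follows: enlarging the RoPE base slows the rotation of low-frequency dimensions so positions remain distinguishable over much longer sequences. The base values below are illustrative; the actual schedule is specified in the Qwen2.5-1M report.

```python
# RoPE inverse frequencies for two base values: a larger base stretches the
# positional "wavelengths", one ingredient of long-context extension.
# The specific base values are illustrative only.
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    """Inverse rotary frequencies base**(-2i/d) for i = 0, ..., d/2 - 1."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

short_ctx = rope_inv_freq(128, base=10_000.0)      # typical pre-training base
long_ctx = rope_inv_freq(128, base=1_000_000.0)    # enlarged base for long contexts
# The slowest frequency's wavelength grows with the base, so distant positions
# still receive distinct rotations.
print(2 * torch.pi / short_ctx[-1], 2 * torch.pi / long_ctx[-1])
```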

On-device deployment is a core focus: Qwen2.5-0.5B uses Activation-aware Weight Quantization (AWQ, per-channel INT4 weights with FP16 activations) and FPGA-accelerated hybrid execution to halve memory use and double throughput (~5.1 tokens/s) on edge platforms such as the Xilinx Kria KV260 (Xiang et al., 24 Apr 2025). These compression techniques scale to larger variants, with tradeoffs among bit width (and hence accuracy), FPGA area, and memory bandwidth.
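
A simplified sketch of per-channel INT4 weight quantization of the kind AWQ performs is shown below; AWQ additionally rescales salient channels using activation statistics before quantizing, which is omitted here, and the tensor shapes are illustrative.

```python
# Per-output-channel symmetric INT4 weight quantization (weights in INT4,
# activations kept in FP16 at inference). The activation-aware channel scaling
# that gives AWQ its name is deliberately omitted from this sketch.
import torch

def quantize_int4_per_channel(w: torch.Tensor):
    """w: (out_features, in_features). Returns INT4-range codes and per-row scales."""
    qmax = 7  # symmetric signed 4-bit range is [-8, 7]
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale  # codes would be packed two-per-byte in a real kernel

w = torch.randn(256, 512)
q, scale = quantize_int4_per_channel(w)
w_hat = q.float() * scale                  # dequantized weights used at inference
print((w - w_hat).abs().mean().item())     # mean absolute quantization error
```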

6. Empirical Performance and Benchmarks

Flagship open-weight models (the dense Qwen2.5-72B-Instruct and the MoE Qwen3-235B-A22B) and hosted MoE models (Turbo, Plus, Omni) demonstrate competitive or superior results versus closed-source and open-weight counterparts (Qwen et al., 19 Dec 2024, Yang et al., 14 May 2025, Xu et al., 22 Sep 2025):

  • Language Understanding/Reasoning: MMLU-Redux: Qwen3-235B-A22B-Base 87.81 (SOTA among open models); GSM8K (CoT): 94.39 (Yang et al., 14 May 2025).
  • Math and Code Benchmarks: HumanEval: Qwen2.5-72B-Instruct 86.6; MATH: 83.1; Arena-Hard: 81.2.
  • Cross-Lingual Coverage: INCLUDE (44 regions): Qwen3 83.6; MMMLU (14 langs): 86.7; 119 languages/dialects supported (Yang et al., 14 May 2025).
  • Vision-Language Benchmarks: Qwen2.5-VL-72B: ChartQA 89.5, OCRBench_v2 (En/CN): 61.5/63.7; matches or exceeds GPT-4o/Claude 3.5 Sonnet (Bai et al., 19 Feb 2025).
  • Audio Benchmarks: Qwen3-Omni: LibriSpeech clean WER 1.22%, GTZAN music genre 93.0 (SOTA).
  • Academic Writing: Qwen2.5-Max and Qwen3-235B generate high-volume, semantically faithful outputs, but score high on AI-content detectors and low on Flesch–Kincaid readability metrics (Aydin et al., 11 Feb 2025).

7. Deployment, Open-Weight Access, and Industrial Adaptations

All major Qwen2, Qwen2.5, and Qwen3 models are openly released on GitHub, Hugging Face, and ModelScope, with quantized INT4/INT8 variants, model cards, code, and end-to-end recipes for SFT, RLHF, and deployment (Yang et al., 15 Jul 2024, Qwen et al., 19 Dec 2024, Yang et al., 14 May 2025). Proprietary Mixture-of-Experts models (Turbo, Plus, Omni) are hosted on Alibaba Model Studio for pay-as-you-go inference at scale. DistilQwen engines are integrated into the Alibaba PAI platform for elastically scalable training/inference and serve as core components in industrial RESTful deployments (Cai et al., 3 Nov 2025).
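
As a usage sketch, the open checkpoints can be loaded with the standard Hugging Face transformers chat-template API; the model ID and prompt below are illustrative, and quantized or larger variants follow the same pattern.

```python
# Load an open-weight Qwen2.5 instruct checkpoint from Hugging Face and run a
# single chat turn. Requires a recent transformers release and sufficient memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the Qwen model family in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```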

Inference tools, compression pipelines, and context optimization frameworks (AWQ, AutoAWQ, QLoRA, FlashAttention, BladeLLM, QwenLong-CPRS) are distributed with open-source code and documentation, facilitating efficient use across commodity GPU, FPGA, and mobile/NPU targets (Xiang et al., 24 Apr 2025, Yang et al., 26 Jan 2025, Shen et al., 23 May 2025).


The Qwen Series constitutes a broad, deeply optimized ecosystem of open-source and hosted foundation models, achieving competitive results against proprietary systems, and driving research in multi-modal, long-context, and domain-specialized inference (Bai et al., 2023, Yang et al., 15 Jul 2024, Qwen et al., 19 Dec 2024, Yang et al., 14 May 2025, Bai et al., 19 Feb 2025, Yang et al., 26 Jan 2025, Aydin et al., 11 Feb 2025, Xiang et al., 24 Apr 2025, Shen et al., 23 May 2025, Chu et al., 2023, Wu et al., 4 Aug 2025, Xu et al., 22 Sep 2025, Cai et al., 3 Nov 2025).
