Vector Quantization & VQ-VAE
- Vector quantization and VQ-VAE are foundational techniques for discrete representation learning, mapping continuous data to a finite codebook for efficient compression and generative modeling.
- VQ-VAE integrates a quantization layer within an encoder–decoder framework, using techniques like the straight-through estimator to manage non-differentiable operations.
- Recent innovations address issues like codebook collapse through adaptive quantization strategies, EMA updates, and lattice-based architectures to enhance performance.
Vector Quantization (VQ) and VQ-Variational Autoencoders (VQ-VAE) constitute a foundational approach to discrete representation learning in machine learning, enabling efficient tokenization, compression, and generative modeling by replacing continuous latents with codebook-based discrete variables. The VQ-VAE family traces its origins to efforts to circumvent limitations of continuous VAEs (e.g., posterior collapse) and has since grown into a large landscape of theoretical, architectural, and algorithmic developments. This article surveys the mathematical foundations, model architectures, optimization paradigms, systematics of codebook utilization, and key research directions surrounding VQ and VQ-VAE.
1. Mathematical Foundations of Vector Quantization
Vector quantization is fundamentally a discretization process that maps high-dimensional continuous vectors to a finite set of prototype embeddings, or codevectors, typically organized in a codebook $\mathcal{C} = \{e_1, \dots, e_K\}$ with $e_j \in \mathbb{R}^d$. The canonical quantization operator for an input $z \in \mathbb{R}^d$ is defined by nearest-neighbor search:

$$q(z) = e_{k^*}, \qquad k^* = \arg\min_{j \in \{1,\dots,K\}} \lVert z - e_j \rVert_2.$$

This non-differentiable argmin presents unique optimization challenges in end-to-end deep learning contexts. The quantization operation is fundamentally a combinatorial assignment, with the training objective often regularized by a commitment term to ensure stable encoder-codebook alignment and prevent the encoder outputs from drifting arbitrarily far from the codebook representatives (Oord et al., 2017).
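A minimal sketch of this nearest-neighbor assignment, assuming PyTorch tensors with latents of shape (batch, d) and a codebook of shape (K, d):

```python
import torch

def quantize(z: torch.Tensor, codebook: torch.Tensor):
    # Squared Euclidean distances: ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2
    dists = (z.pow(2).sum(dim=1, keepdim=True)
             - 2 * z @ codebook.t()
             + codebook.pow(2).sum(dim=1))        # (batch, K)
    indices = dists.argmin(dim=1)                 # hard, combinatorial assignment
    z_q = codebook[indices]                       # (batch, d) nearest codevectors
    return z_q, indices
```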
In classical information bottleneck theory, vector quantization imposes a rate constraint on the latent space, with the codebook size $K$ upper-bounding the entropy of the discrete latent at $\log_2 K$ bits per code, and thus controlling generalization and representation power (Wu et al., 2018).
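For concreteness, a worked instance of this rate bound (the specific figures are illustrative assumptions, not values from the cited works):

```latex
% With K = 512 codes, each discrete token carries at most log2(512) = 9 bits,
% so a 32 x 32 grid of tokens costs at most:
H(z) \le \log_2 K = 9 \ \text{bits/token},
\qquad
R \le 32 \times 32 \times 9 = 9216 \ \text{bits} = 1152 \ \text{bytes per image}.
```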
2. VQ-VAE: Architecture and Loss Formulation
VQ-VAE integrates vector quantization into the VAE framework, instantiating the encoder–quantizer–decoder pipeline:
- Encoder: $z_e(x) = E(x) \in \mathbb{R}^d$, mapping the input $x$ to a continuous latent.
- Quantizer: $z_q(x) = e_{k^*}$ with $k^* = \arg\min_j \lVert z_e(x) - e_j \rVert_2$.
- Decoder: $\hat{x} = D(z_q(x))$, reconstructing the input from the quantized latent.
The standard VQ-VAE loss comprises three terms:

$$\mathcal{L} = \lVert x - D(z_q(x)) \rVert_2^2 + \lVert \mathrm{sg}[z_e(x)] - e_{k^*} \rVert_2^2 + \beta \lVert z_e(x) - \mathrm{sg}[e_{k^*}] \rVert_2^2,$$

where the stop-gradient operator $\mathrm{sg}[\cdot]$ blocks gradients, allowing for separate updates to the encoder and the codebook (Oord et al., 2017). The non-differentiability of the argmin quantizer is sidestepped using the straight-through estimator (STE), which copies gradients from $z_q(x)$ to $z_e(x)$.
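A minimal sketch of this loss and the STE trick, assuming PyTorch, the `quantize` helper sketched in Section 1, and a commitment weight `beta` (commonly set to 0.25):

```python
import torch
import torch.nn.functional as F

def vqvae_loss(x, encoder, decoder, codebook, beta: float = 0.25):
    z_e = encoder(x)                               # continuous encoder output
    z_q, _ = quantize(z_e, codebook)               # nearest-neighbor codevectors

    # Straight-through estimator: the forward pass uses z_q, the backward pass
    # copies the decoder gradient from z_q onto z_e unchanged.
    z_q_st = z_e + (z_q - z_e).detach()
    x_hat = decoder(z_q_st)

    recon = F.mse_loss(x_hat, x)                   # reconstruction term
    codebook_term = F.mse_loss(z_q, z_e.detach())  # ||sg[z_e] - e||^2: updates the codes
    commit_term = F.mse_loss(z_e, z_q.detach())    # ||z_e - sg[e]||^2: commitment
    return recon + codebook_term + beta * commit_term
```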
The codebook itself can be trained by direct gradient descent on the VQ loss or through an exponential moving average (EMA) update that mirrors online $k$-means (Roy et al., 2018, Razavi et al., 2019). Stochastic or soft assignments, Gumbel-softmax relaxations, and entropy penalties have been investigated to enhance discrete code utilization (Takida et al., 2022, Yan et al., 2024).
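A sketch of the EMA variant of the codebook update, assuming PyTorch and caller-maintained running statistics per code (`counts` typically initialized to ones, `sums` to a copy of the codebook); the decay and smoothing constants are illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_codebook_update(codebook, counts, sums, z_e, indices,
                        decay: float = 0.99, eps: float = 1e-5):
    K = codebook.shape[0]
    onehot = F.one_hot(indices, K).type_as(z_e)                  # (batch, K)
    counts.mul_(decay).add_(onehot.sum(dim=0), alpha=1 - decay)  # running cluster sizes
    sums.mul_(decay).add_(onehot.t() @ z_e, alpha=1 - decay)     # running cluster sums
    # Laplace smoothing keeps rarely selected codes from dividing by zero.
    n = counts.sum()
    smoothed = (counts + eps) / (n + K * eps) * n
    codebook.copy_(sums / smoothed.unsqueeze(1))                 # online k-means centroid step
```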
3. Codebook Utilization and the Problem of Collapse
A notable limitation of vanilla VQ-VAE is codebook collapse—where only a small subset of embeddings are utilized and the remaining vectors are never selected during training. This issue arises because only codebook vectors that are selected as the nearest neighbor during a mini-batch update receive gradients or EMA updates, leaving others stagnant. The phenomenon is exacerbated by the non-stationarity of encoder updates, which can shift Voronoi boundaries and render certain codevectors perennially inactive (Lu et al., 21 Feb 2026).
A variety of remedies have been proposed:
- Anchor or cluster-based updates: CVQ-VAE re-initializes dead codes to sampled encoder feature anchors to ensure 100% codebook utilization (Zheng et al., 2023); a sketch of this reset idea follows the list.
- Kernel-based propagation: NS-VQ propagates encoder drift to all codes via a kernel-based rule on the codebook space (Lu et al., 21 Feb 2026).
- Transformer-based mappings: TransVQ applies an attention-based mapping to the codebook to adaptively move the entire set and maintain utilization (Lu et al., 21 Feb 2026).
- Stochastic quantization: SQ-VAE deploys a self-annealing stochastic quantizer, beginning with high assignment entropy and gradually converging to deterministic assignments (Takida et al., 2022).
- Bayesian and variational approaches: HQ-VAE, GM-VQ, and VAEVQ employ entropy-regularized Bayesian posteriors, variational quantization, or Wasserstein objectives to maximize codebook entropy and prevent collapse without the need for explicit heuristics (Takida et al., 2023, Yan et al., 2024, Yang et al., 10 Nov 2025).
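As referenced in the first item above, a minimal sketch of the anchor-based dead-code reset idea, assuming PyTorch and a caller-maintained per-code usage EMA; this captures only the core reset, not CVQ-VAE's exact distance-weighted update rule:

```python
import torch

@torch.no_grad()
def revive_dead_codes(codebook, usage_ema, z_e_batch, threshold: float = 1e-3):
    dead = usage_ema < threshold                   # (K,) mask of unused codes
    n_dead = int(dead.sum())
    if n_dead == 0:
        return
    # Re-seed dead entries from encoder features in the current batch.
    idx = torch.randint(0, z_e_batch.shape[0], (n_dead,))
    codebook[dead] = z_e_batch[idx]
    usage_ema[dead] = threshold                    # give revived codes a fresh start
```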
4. Extensions and Modern Variants of VQ-VAE
Over the last several years, the VQ-VAE paradigm has expanded to accommodate greater representational capacity, scalability, and practical deployment requirements:
- Lattice-Based and Product Quantization: LL-VQ-VAE replaces the codebook with a parameterized lattice, reducing parameter count and making quantization cost independent of the codebook size $K$ (Khalil et al., 2023). Product Quantization and Multi-Group VQ (MGVQ) architectures factorize embeddings into low-dimensional subvectors, each quantized independently, exponentially expanding the representational capacity and enabling high-fidelity, large-scale generative modeling (Jia et al., 10 Jul 2025, Wu et al., 2018).
- Adaptive and Dynamic Quantization: Dynamic quantizer selection via Gumbel-softmax, as well as codebook adaptation for variable-rate settings (RAQ), allows the model to automatically balance codebook size and embedding dimensionality or adapt to new bitrate requirements without retraining (Chen et al., 2024, Seo et al., 2024).
- Scalar Quantization Alternatives: FSQ eschews learnable codebooks entirely, using scalar quantization per channel and a Cartesian product to implicitly define the code space, sidestepping code collapse and further reducing parameterization (Mentzer et al., 2023); a sketch appears after this list.
- Lattice, Hyperbolic, and Geometric Quantization: LL-VQ-VAE enforces a lattice structure on the codebook entries, preserving high utilization. Hyperbolic VQ (HyperVQ) leverages hyperbolic geometry to exponentially expand volume near the boundary, improving cluster separability and preventing collapse by improved packing in latent space (Khalil et al., 2023, Goswami et al., 2024).
- Generative Objectives Beyond MSE: VQ-WAE recasts tokenization as optimal transport minimization via the Wasserstein distance, allowing codevectors to be interpreted as cluster centroids in a latent geometry that mirrors target data distributions and ensuring superior codebook usage and generative quality (Vuong et al., 2023).
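A hedged sketch of the FSQ idea referenced above, assuming PyTorch; the per-channel level counts (8, 6, 5) are illustrative, not the published configuration:

```python
import torch

def fsq(z: torch.Tensor, levels=(8, 6, 5)):
    # z: (batch, len(levels)); each channel is bounded, then rounded to L_i levels.
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (L - 1) / 2
    bounded = torch.tanh(z) * half                 # channel i lives in [-(L_i-1)/2, (L_i-1)/2]
    rounded = torch.round(bounded)
    # Straight-through rounding: forward uses the integer grid, backward sees tanh.
    z_q = bounded + (rounded - bounded).detach()
    return z_q                                     # implicit codebook size = prod(levels) = 240 here
```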
5. Empirical Benchmarks and Comparative Performance
VQ-VAE and its variants are benchmarked across a wide spectrum of datasets (CIFAR-10, ImageNet, FFHQ, CelebA, medical volumes), with reconstruction MSE, LPIPS, SSIM, rFID, and codebook perplexity (usage) as standard metrics; a sketch of the perplexity computation follows the list below.
- Codebook Utilization: CVQ-VAE, NS-VQ, TransVQ, and many stochastic/variational approaches routinely achieve 100% code usage, as measured by empirical perplexity, compared to as low as <10% in naive VQ-VAE at large codebook sizes $K$ (Zheng et al., 2023, Lu et al., 21 Feb 2026, Takida et al., 2023, Yang et al., 10 Nov 2025). HyperVQ achieves significantly higher codebook perplexity and discriminative accuracy compared to Euclidean VQ (Goswami et al., 2024).
- Reconstruction and Generative Quality: MGVQ demonstrates that discrete tokenizers, when properly designed (multi-group), can match or exceed the reconstruction fidelity (PSNR, rFID) of continuous VAEs, even at HD or 2K resolution. GQ with TDC offers a theoretically optimal and training-free quantizer, consistently outperforming prior VQ-VAE variants in both reconstruction and generative downstream tasks (Jia et al., 10 Jul 2025, Xu et al., 7 Dec 2025).
- Efficiency and Scalability: Lattice-based quantization (LL-VQ-VAE) provides quantization cost and parameter count independent of $K$, with lower reconstruction error and high codebook utilization (Khalil et al., 2023). FSQ achieves near-equivalent performance to classical VQ in MaskGIT-style generative architectures with perfect codebook utilization (Mentzer et al., 2023).
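A sketch of the codebook-perplexity metric used throughout these comparisons, assuming PyTorch and a flat tensor of code indices; a value near $K$ indicates uniform code usage:

```python
import torch

def codebook_perplexity(indices: torch.Tensor, K: int) -> float:
    counts = torch.bincount(indices.flatten(), minlength=K).float()
    probs = counts / counts.sum()
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum()
    return float(torch.exp(entropy))               # equals K when usage is perfectly uniform
```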
6. Recent Theoretical Advances and Model-Based Explanations
A rigorous theoretical account connects VQ-VAE to hard $k$-means (Viterbi-EM), with convergence guaranteed under stationary encoder distributions and Robbins–Monro step-size conditions (Roy et al., 2018). Empirical and analytic investigations now explain codebook collapse primarily as a result of non-stationary encoder updates: as the encoder output distribution drifts with SGD, previously active codes fall outside the current data manifold and cease to receive updates (Lu et al., 21 Feb 2026). Remedies (NS-VQ, TransVQ) are grounded in kernelized propagation and global codebook transformations, preserving theoretical convergence to the $k$-means attractor in the presence of encoder drift.
Additionally, connections to optimal transport, rate-distortion theory, and information bottleneck formalisms anchor recent quantizer variants (VQ-WAE, RAQ), tying the discrete bottleneck's statistical properties to downstream rate-fidelity trade-offs and representation compressibility (Vuong et al., 2023, Seo et al., 2024).
7. Outlook, Limitations, and Research Directions
Efforts in enhancing discrete latent modeling via VQ and VQ-VAE are likely to remain central for large-scale generative modeling, compression, and robust representation learning. Notable open challenges and directions include:
- Dynamic architecture search for codebook parameters (e.g., adaptive codebook size $K$ and embedding dimensionality $d$) (Chen et al., 2024).
- Extending strong codebook utilization principles to hierarchical, multi-scale, and autoregressive architectures (e.g., VQ-VAE-2, LDM) (Takida et al., 2023, Razavi et al., 2019).
- Exploring new geometries and priors: hyperbolic, lattice-based, and Wasserstein-based quantizers offer improved separability and utilization (Goswami et al., 2024, Khalil et al., 2023, Vuong et al., 2023).
- Unification of stochastic, variational, and deterministic quantization under Bayesian and information-theoretic frameworks (Takida et al., 2022, Yang et al., 10 Nov 2025, Yan et al., 2024).
- Model and codebook adaptation to diverse bitrate, modality, and downstream generative requirements (Seo et al., 2024).
Limitations remain in scaling structured quantizers (e.g., hyperbolic, lattice) to very high $K$ or supporting fine-grained hierarchical tasks, as well as balancing the trade-off between compression (rate) and downstream generation quality (Seo et al., 2024, Jia et al., 10 Jul 2025).
In summary, vector quantization and VQ-VAE offer a compelling, theoretically grounded toolkit for discrete latent modeling; advances in codebook optimization, geometric priors, modular quantizers, and adaptive architectures continue to close the gap between discrete and continuous representation learning, with broad ramifications for compression, generative modeling, and tokenization across vision, audio, and multimodal domains (Oord et al., 2017, Jia et al., 10 Jul 2025, Khalil et al., 2023, Lu et al., 21 Feb 2026).