Compositional Codebook Quantization
- Compositional codebook quantization is a vector quantization method that represents high-dimensional data as a composition of codewords from multiple structured codebooks.
- This approach dramatically increases representational capacity and enables fast, efficient search, compression, and generative modeling in applications like vision, language, and communication.
- Optimization techniques such as Gumbel-softmax relaxation, clustering initialization, and proximal gradient updates effectively reduce quantization error and prevent codebook collapse.
Compositional codebook quantization denotes a family of vector quantization schemes in which high-dimensional data, neural network weights, or descriptors are mapped to a compact discrete representation via a structured combination—composition—of codewords drawn from multiple codebooks. This compositional mechanism dramatically increases representational capacity per parameter relative to single-codebook schemes, supports efficient search and storage, and mitigates codebook collapse. Techniques falling under this definition span classical product quantization, additive and residual quantization, modern deep compositional frameworks, as well as advanced variants leveraging hierarchy, convexity, or learned mappings. These approaches are central to high-performance compression, retrieval, and generative modeling across large-scale vision, language, and communication systems.
1. Mathematical Foundations of Compositional Codebook Quantization
Formally, let $x \in \mathbb{R}^D$ denote a vector to be quantized. In compositional codebook quantization, $x$ is approximated by a function of multiple codebooks $C_1, \dots, C_M$, each with $K$ codewords. The general forms are:
Concatenation (Product Quantization, PQ):
$\hat{x} = [\,c_1;\, c_2;\, \dots;\, c_M\,]$, where $c_m$ is from codebook $C_m$ and each $C_m$ quantizes a disjoint $d$-dimensional subspace ($d = D/M$).
Summation (Additive Quantization, AQ; Hierarchical/Stacked Quantization, SQ; Residual, RVQ; or Deep Compositional Embeddings):
$\hat{x} = \sum_{m=1}^{M} c_m$, with $c_m \in C_m$.
Low-dimensional Slot Partitioning (e.g., LooC (Li et al., 1 Jan 2026)): the feature is partitioned into low-dimensional slots $x = [\,x_1;\, \dots;\, x_S\,]$.
Each slot $x_s$ is quantized via a shared or independent codebook, and the quantized approximation is $\hat{x} = [\,\hat{x}_1;\, \dots;\, \hat{x}_S\,]$.
Convex Compositionality (e.g., Soft Convex Quantization (Gautam et al., 2023)):
$\hat{x} = C\,w$, where the weight vector $w$ (with $w \ge 0$ and $\sum_k w_k = 1$) is retrieved via a differentiable convex optimization, with typically sparse support, yielding a soft/compositional assignment to multiple codewords.
The assigned codeword indices or continuous weights are optimized to minimize a quantization or reconstruction loss, often subject to constraints (see (Martinez et al., 2014, Yvinec et al., 2023, Li et al., 1 Jan 2026, Gautam et al., 2023)).
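As a concrete illustration of the concatenative (PQ) form, the following NumPy sketch encodes a vector as one index per disjoint subspace and decodes by concatenating the selected codewords. The dimensions and random codebooks are illustrative assumptions; in practice each subspace codebook is learned by k-means.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 8, 4, 16           # vector dim, number of codebooks, codewords per codebook
d = D // M                   # each codebook covers a disjoint d-dimensional subspace

# Toy codebooks; real PQ learns these with k-means per subspace.
codebooks = rng.normal(size=(M, K, d))

def pq_encode(x):
    """Return one codeword index per subspace (M indices total)."""
    codes = []
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        dists = np.sum((codebooks[m] - sub) ** 2, axis=1)
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes):
    """Concatenate the selected codewords: x_hat = [c_1; ...; c_M]."""
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

x = rng.normal(size=D)
codes = pq_encode(x)
x_hat = pq_decode(codes)
print(codes, np.linalg.norm(x - x_hat))
```

Because each subspace is assigned independently, encoding costs $M \cdot K$ distance evaluations rather than a search over the $K^M$ joint codes.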
2. Classical and Hierarchical Compositional Schemes
The origins of compositional codebook quantization trace to Product Quantization (PQ) and Additive Quantization (AQ) and their computationally efficient variants. In PQ, the codebooks are constrained to orthogonal subspaces, enabling independent assignment and extremely fast lookup-based distance computation for large-scale indexing (Martinez et al., 2014). AQ relaxes this independence, representing $x$ as a sum of codewords from unconstrained codebooks, albeit at the expense of NP-hard encoding.
Stacked Quantizers (SQ) introduce a hierarchical, coarse-to-fine quantization process: the codebook at each level $\ell$ quantizes the residual left by the previous level, dramatically reducing quantization error and matching AQ's accuracy at an encoding cost only linearly higher than PQ's (Martinez et al., 2014). Residual vector quantization (RVQ) and multi-head octonary codebooks (MOC; (Zhou et al., 2024)) generalize this principle: features are successively quantized in multiple stages, each contributing its own quantized correction.
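The residual (summation-form) scheme can be sketched as below: each level greedily quantizes what the previous levels left over, and decoding sums the selected codewords. The decaying codebook scales are an illustrative stand-in for the coarse-to-fine structure that training would normally produce, not a detail from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
D, L, K = 8, 3, 32            # dimension, residual levels, codewords per level

# One codebook per level, with shrinking scale to mimic coarse-to-fine refinement.
# (In practice each level is trained on the residual distribution of earlier levels.)
codebooks = rng.normal(size=(L, K, D)) * (0.5 ** np.arange(L))[:, None, None]

def rvq_encode(x):
    """Greedy stage-wise encoding: pick the codeword closest to the residual."""
    codes, residual = [], x.copy()
    for lvl in range(L):
        dists = np.sum((codebooks[lvl] - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - codebooks[lvl][idx]   # hand the rest to the next level
    return codes

def rvq_decode(codes):
    """Summation form: x_hat is the sum of the selected per-level codewords."""
    return sum(codebooks[lvl][codes[lvl]] for lvl in range(L))

x = rng.normal(size=D)
codes = rvq_encode(x)
print(codes, np.linalg.norm(x - rvq_decode(codes)))
```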
3. Deep Compositional Quantization
Deep learning enables end-to-end learning of both codebooks and compositional mappings:
- Jointly Learnable Codebooks and Mappings (JLCM) (Yvinec et al., 2023) compresses pretrained neural network weights by clustering neuron rows, reordering, and assigning each group (block) to its own codebook. Only the codeword index is stored per weight; group-to-codebook mapping is implicit via row order and partitioning. Joint optimization targets reconstruction and feature-distillation losses, with a novel proximal gradient to avoid large quantization jumps, yielding efficient memory reduction for large DNNs without architectural changes.
- Deep Unsupervised Neural Quantization (UNQ) (Morozov et al., 2019) generalizes MCQ with a deep network producing "heads," each mapped to its own codebook. Differentiable Gumbel-softmax relaxation and end-to-end autoencoder frameworks yield quantization codes with outstanding retrieval performance, outperforming traditional MCQ and lattice methods.
- Word Embedding Compression (Shu et al., 2017) adopts a sum-of-codes paradigm: each word is reconstructed as a sum of basis vectors selected via a discrete code, learned via the Gumbel-softmax trick. Storage reductions above 94% with no loss in downstream accuracy are shown for NLP models.
- Plug-and-Play Low-Dimensional Codebook (LooC) (Li et al., 1 Jan 2026) attains a large effective code space with only a small number of codebook parameters by quantizing low-dimensional slots within features. A parameter-free spatial interpolation/smoothing step enhances fidelity, and 100% codebook usage is maintained even under large reductions in codebook size.
Table: Representative Deep Compositional Quantization Methods
| Approach | Composition Principle | Code Assignment |
|---|---|---|
| JLCM | Groupwise codebooks | Blocked after clustering |
| UNQ | Multi-head deep encoding | Gumbel-softmax, learned |
| Word CompEmb | Summed basis vectors | Gumbel-softmax |
| LooC | Slotwise low-dim codebook | Per-slot nearest neighbor |
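To see where storage reductions above 94% come from in the sum-of-codes paradigm, one can compare a dense float embedding table against the compositional codes plus shared codebooks. The vocabulary size, dimension, and code shape below are illustrative assumptions in the spirit of such schemes, not the exact settings of Shu et al.

```python
import math

V, D = 50_000, 300        # vocabulary size, embedding dimension (assumed)
M, K = 32, 16             # codebooks and codewords per codebook (assumed)

dense_bytes = V * D * 4                  # float32 embedding table
code_bits = M * math.log2(K)             # bits per word: M indices of log2(K) bits each
codes_bytes = V * code_bits / 8          # discrete codes for the whole vocabulary
codebook_bytes = M * K * D * 4           # the shared basis vectors themselves

total = codes_bytes + codebook_bytes
print(f"dense: {dense_bytes/1e6:.1f} MB, compositional: {total/1e6:.2f} MB, "
      f"reduction: {1 - total/dense_bytes:.1%}")
```

With these assumed values the codes dominate neither term: roughly 60 MB of floats shrink to under 1.5 MB, a reduction comfortably in the 94–99% range reported.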
4. Advances in Expressivity and Codebook Efficiency
Compositional approaches enable exponential increases in effective representational power for a fixed parameter budget. For example, $M$ codebooks, each with $K$ codewords, provide $K^M$ possible codes, while a classical single codebook offers only $K$. Hierarchical arrangements (stacking, residual, multilevel) further boost expressivity (e.g., RVQ and MOC: $K^M$ effective codes across $M$ stages; (Zhou et al., 2024)). Dual codebook strategies for image modeling (e.g., Dual Codebook VQ (Malidarreh et al., 13 Mar 2025)) show how splitting latent features into global and local parts, each quantized via a separate codebook, prevents codebook collapse, increases utilization, and achieves lower FID in image synthesis than single-codebook VQ variants.
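As a worked instance of this counting argument (with illustrative values $M = 8$, $K = 256$):

```latex
\[
K^{M} \;=\; 256^{8} \;=\; 2^{64} \ \text{distinct codes, stored as only } M K = 2048 \text{ codewords,}
\]
```

whereas a flat codebook reaching the same $2^{64}$ codes would need $2^{64}$ stored codewords.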
Key performance metrics evaluating these gains include LPIPS, PSNR, SSIM, rFID, FID, codebook usage, and retrieval recall (see (Li et al., 1 Jan 2026, Malidarreh et al., 13 Mar 2025, Morozov et al., 2019)).
5. Optimization Strategies
Optimization of compositional codebook quantizers necessitates careful treatment of discrete assignment variables and codebook usage:
- Clustering Initialization: Hierarchical agglomerative or k-means clustering provides robust initialization of codebooks and soft assignments (Yvinec et al., 2023, Martinez et al., 2014).
- Gumbel-Softmax Relaxation: Discrete index selection is made differentiable during training via Gumbel noise and softmax relaxation, with hard selection at inference (Shu et al., 2017, Morozov et al., 2019).
- Proximal Gradient for Indices: JLCM (Yvinec et al., 2023) introduces a custom gradient proportional to the inverse distance between codewords to favor local index adjustments over erratic jumps.
- Convex Optimization: Soft Convex Quantization (SCQ; (Gautam et al., 2023)) replaces assignment with a differentiable quadratic program yielding convex codeword weights, leading to high codebook perplexity, improved quantization error, and smooth optimization.
- Interpolation by Smoothing: LooC applies bilinear interpolation and spatial averaging before and after slotwise quantization, yielding improved detail preservation and less blocky artifacts (Li et al., 1 Jan 2026).
Regularizers and balance constraints can further prevent codebook collapse and support uniform assignment (Morozov et al., 2019).
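The Gumbel-softmax relaxation above can be illustrated with a minimal NumPy forward pass (no autograd): Gumbel noise is added to the assignment logits, a temperature-$\tau$ softmax gives a soft mixture of codewords during training, and a hard argmax one-hot is used at inference. The temperature and codebook sizes are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(logits, tau=1.0, hard=False):
    """Sample a relaxed one-hot assignment over codewords.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-tau
    softmax; with hard=True, returns the argmax one-hot (as at inference).
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))   # Gumbel(0, 1) noise
    y = logits + g
    soft = np.exp((y - y.max()) / tau)
    soft /= soft.sum()
    if hard:
        one_hot = np.zeros_like(soft)
        one_hot[np.argmax(soft)] = 1.0
        return one_hot
    return soft

K, d = 8, 4
codebook = rng.normal(size=(K, d))
logits = rng.normal(size=K)                 # produced by an encoder in practice

w_soft = gumbel_softmax(logits, tau=0.5)    # training: soft mixture of codewords
w_hard = gumbel_softmax(logits, hard=True)  # inference: a single codeword
print(w_soft @ codebook, w_hard @ codebook)
```

Annealing $\tau$ toward zero during training makes the soft mixture concentrate on a single codeword, narrowing the train/inference gap.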
6. Applications, Performance, and Scalability
Compositional quantization is deployed in diverse settings:
- Model Compression: JLCM achieves compression sufficient to fit LLMs (e.g., Llama 7B to 2GB) on mobile hardware (Yvinec et al., 2023).
- Image and Multimedia Retrieval: Deep compositional and residual hierarchies, as well as fast lookup-based objectives, yield state-of-the-art recall across billion-scale retrieval scenarios (Martinez et al., 2014, Morozov et al., 2019, Li et al., 1 Jan 2026).
- Semantic Communication: MOC-RVQ composes octonary heads and RVQ stages for digital generative communication, maximizing spectral efficiency and robustness over noisy channels (Zhou et al., 2024).
- Generative Modeling: Dual codebook VQ-GANs, LooC-enhanced latent diffusion, and SCQ autoencoders outperform single-codebook baselines in FID and code utilization (Malidarreh et al., 13 Mar 2025, Li et al., 1 Jan 2026, Gautam et al., 2023).
- Word Embedding Compression: End-to-end learned compositional coding yields 94–99% storage reduction without accuracy loss on sentiment analysis and MT tasks (Shu et al., 2017).
Table: Selected Empirical Results and Benefits
| Domain | Notable Gain | Key Source |
|---|---|---|
| LLM weight comp | 2GB fit, no loss | (Yvinec et al., 2023) |
| Image retrieval | +3–5pp Recall@1 | (Morozov et al., 2019) |
| Image synthesis | FID ↓30–60% | (Malidarreh et al., 13 Mar 2025) |
| Embedding comp | 98%+ reduction | (Shu et al., 2017) |
7. Limitations and Ongoing Directions
While compositional codebook quantization provides substantial practical and theoretical benefits, certain limitations remain:
- Trade-offs exist between encoding speed, codebook parameter count, and quantization accuracy, especially for non-independent codebooks (e.g., AQ vs. PQ vs. SQ; (Martinez et al., 2014)).
- Some architectures carry extra memory overhead due to auxiliary learnable parameters (e.g., decoders in deep MCQ; (Morozov et al., 2019)).
- Hyperparameter sensitivity (number and size of codebooks, regularization strengths) can impact performance and is often resolved by grid or one-cycle scheduling (Li et al., 1 Jan 2026, Morozov et al., 2019).
- Optimization of discrete assignments remains nonconvex; SCQ and Gumbel-softmax relaxations partially address differentiability and stability (Gautam et al., 2023, Shu et al., 2017).
- Applicability to non-vectorial or structured data is limited; recent plug-and-play modules (LooC) support broader integration (Li et al., 1 Jan 2026).
- Codebook utilization and collapse are generally addressed by compositionality, balance regularization, and proximal updates, but can still arise with pathological data or in large-scale unbalanced domains.
Ongoing efforts target improved trade-offs via hierarchical, hybrid, or adaptive codebook models, scaling to trillion-parameter networks and global communication systems, and further aligning theoretical expressivity with hardware efficiency.