Trellis-Coded Quantization (TCQ)
- Trellis Coded Quantization (TCQ) is a method that combines convolutional coding with a finite-state trellis to optimize vector quantization via dynamic programming.
- TCQ achieves superior rate-distortion performance by exploiting trellis memory to reduce quantization error and maximize shaping gain through optimal convolutional codes.
- TCQ is applied in communications, video coding, and neural network quantization, offering practical trade-offs between computational complexity and performance.
Trellis-Coded Quantization (TCQ) is a structured approach to vector quantization that employs convolutional coding combined with a finite-state trellis and dynamic programming. The method achieves rate-distortion performance superior to scalar quantization by leveraging trellis-induced code constraints, enabling efficient encoding and decoding in high-dimensional, memory-bounded, or feedback-limited scenarios. TCQ is a key tool in modern source coding, communication, and quantized neural network deployment due to its favorable trade-offs of complexity, shaping gain, and algorithmic flexibility.
1. Theoretical Foundations and Design Principles
TCQ generalizes scalar and vector quantization by imposing a convolutional code structure on the labeling of quantizer reproduction points. The method creates a finite-state trellis, with each state corresponding to memory in the code and each branch corresponding to a codeword selection from a partitioned reproduction set. The quantizer is specified by:
- A directed graph $G = (S, E)$, with $|S|$ states and $2^k$ outgoing edges per state, each edge labeled by a $k$-bit codeword and associated with a reconstruction level or vector.
- A path through the trellis that encodes the input sequence; the encoder seeks the path minimizing cumulative distortion, typically squared Euclidean or Hamming distance between source and reconstructed values.
The branch labeling typically uses a distance-preserving mapping from binary codewords (derived from a convolutional code) to quantizer reproduction points. Optimal code selection leverages maximum-Hamming-distance convolutional codes to increase the minimal separation between codewords in the trellis, maximizing shaping gain for a given state complexity (0704.1411).
Dynamic programming via the Viterbi algorithm is used to identify the path of minimum cumulative distortion, which constrains codeword selection across the input sequence. For a source sequence $x_1, \dots, x_N$, the optimal quantized sequence (under squared-error distortion) is obtained by

$$\hat{x}_{1:N} = \arg\min_{(s_1, \dots, s_N)} \sum_{n=1}^{N} \big(x_n - r(s_n)\big)^2,$$

where the minimization is over valid state paths through the trellis and $r(s_n)$ denotes the reconstruction value associated with state $s_n$ (Tseng et al., 2024).
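A minimal sketch of this search is given below, assuming a toy 4-state trellis with one bit per branch and illustrative scalar reconstruction levels; neither the trellis nor the levels are taken from any of the cited designs.

```python
import numpy as np

# Toy 4-state trellis used only for illustration: NEXT_STATE[s][b] is the state
# reached from state s on branch bit b, and RECON[s][b] is that branch's
# scalar reconstruction level.
NEXT_STATE = [[0, 2], [0, 2], [1, 3], [1, 3]]
RECON = [[-1.5, 0.5], [-0.5, 1.5], [-1.5, 0.5], [-0.5, 1.5]]

def tcq_encode(x):
    """Viterbi search for the trellis path minimizing cumulative squared error."""
    n_states, n = len(NEXT_STATE), len(x)
    cost = np.full(n_states, np.inf)
    cost[0] = 0.0                                   # start in state 0 by convention
    back = np.zeros((n, n_states, 2), dtype=int)    # (previous state, branch bit)
    for t in range(n):
        new_cost = np.full(n_states, np.inf)
        for s in range(n_states):
            if np.isinf(cost[s]):
                continue
            for b in (0, 1):
                s_next = NEXT_STATE[s][b]
                c = cost[s] + (x[t] - RECON[s][b]) ** 2
                if c < new_cost[s_next]:
                    new_cost[s_next] = c
                    back[t, s_next] = (s, b)
        cost = new_cost
    # Trace back the minimum-cost path to recover branch bits and reconstructions.
    s = int(np.argmin(cost))
    bits, recons = [], []
    for t in range(n - 1, -1, -1):
        s_prev, b = back[t, s]
        bits.append(int(b))
        recons.append(RECON[s_prev][b])
        s = int(s_prev)
    return bits[::-1], recons[::-1], float(cost.min())

bits, xq, dist = tcq_encode(np.array([0.3, -1.2, 0.9, 1.4, -0.1]))
print(bits, xq, dist)
```

The survivor/traceback bookkeeping is exactly what makes the quantizer's output bits a valid path through the convolutional code's trellis.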
2. Rate–Distortion Performance and Shaping Gain
In the high-rate regime, TCQ approaches the rate-distortion bound for memoryless sources. The primary performance metric is the granular gain $g$, defined as the reduction in normalized second moment relative to a uniform scalar quantizer:

$$ g = \frac{\Delta^2 / 12}{D}, $$

with $\Delta$ the quantizer step size and $D$ the mean-squared error of the trellis-coded quantizer (0704.1411). By exploiting trellis memory, TCQ reduces average quantization error below that of independent scalar quantizers, with the degree of improvement depending on the number of trellis states and the memory of the underlying code (Kieffer et al., 2010).
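As a quick illustrative calculation (the TCQ distortion value here is assumed, not taken from the cited papers):

```python
import math

delta = 1.0
mse_uniform = delta ** 2 / 12        # uniform scalar quantizer MSE, ~0.0833
D_tcq = 0.0645                       # hypothetical measured TCQ distortion (assumed)
g = mse_uniform / D_tcq              # granular gain (linear), ~1.29
print(f"granular gain ~ {10 * math.log10(g):.2f} dB")   # ~1.11 dB
```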
The selection of convolutional codes maximizing free Hamming distance further improves granular gain, particularly as state complexity increases. Empirical studies demonstrate incremental shaping gain as trellis states increase, though with diminishing returns beyond moderate state counts (0704.1411).
3. Algorithmic Structure and Implementation Complexity
TCQ achieves practical encoding and decoding via:
- Finite-state dynamic programming (Viterbi algorithm), scaling in cost as $O(|S| \cdot 2^k \cdot N)$ per source block, with $|S|$ trellis states, $2^k$ branches per state, and $N$ the length of the source sequence (Choi et al., 2013).
- Exploitation of tail-biting or sliding-window schemes for block operation and boundary conditions.
- Hardware-aware trellis structures such as the "bit-shift" trellis in QTIP, which enables the decoupling of codebook size from quantization dimension and highly parallel decode/encode operations (Tseng et al., 2024).
Modern variants generate codebooks on the fly, via pseudo-random hash functions or by combining small lookup tables of reconstruction values, to minimize memory footprint and maximize throughput (Tseng et al., 2024).
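The sketch below illustrates a bit-shift-style trellis transition with hash-generated reconstruction values; the state width, shift amount, hash function, and value mapping are assumptions chosen for illustration and do not reproduce the QTIP design.

```python
L, K = 12, 2          # assumed: L-bit trellis state, K fresh bits consumed per step
MASK = (1 << L) - 1

def next_state(state: int, bits: int) -> int:
    # Bit-shift transition: shift in K new bits, keep the low L bits as the new state.
    return ((state << K) | bits) & MASK

def recon(state: int) -> float:
    # On-the-fly reconstruction value from a cheap integer hash of the state
    # (illustrative hash; a real design would match the source distribution).
    h = (state * 2654435761) & 0xFFFFFFFF
    return (h / 2 ** 32 - 0.5) * 4.0      # roughly uniform in [-2, 2)

# Walking the trellis: each step consumes K bits and emits one reconstruction value.
state, bitstream = 0, [3, 0, 2, 1]
values = []
for b in bitstream:
    state = next_state(state, b)
    values.append(recon(state))
print(values)
```

Because reconstruction values are computed rather than stored, the memory footprint stays small even when the effective quantization dimension is large.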
4. Optimality, Markov Chain Analysis, and Graph Labeling
The fundamental analysis of asymptotic distortion in TCQ is facilitated by modeling the reduced state evolution induced by the Viterbi algorithm as a finite-state Markov chain (Kieffer et al., 2010). In this framework, the stationary distribution $\pi$ over reduced states and the single-step cost function $c$ jointly yield exact closed-form expressions for the asymptotic per-sample distortion:

$$ D = \sum_{s, s'} \pi(s)\, P(s, s')\, c(s, s'), $$

where $P(s, s')$ is the Markov transition probability defined by the trellis structure and labeling, and $c(s, s')$ measures the distortion increment incurred when the minimum-cost transition from $s$ to $s'$ is taken.
The design of graph structures and labeling functions is critical. Optimal performance is achieved by jointly selecting the trellis and labeling to minimize the asymptotic distortion $D$ at the target rate, often using symmetries and equivalence classes to reduce the search space (Kieffer et al., 2010).
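A small numerical sketch of this evaluation, with an assumed 3-state reduced chain (transition and cost matrices are placeholders, not derived from any particular trellis):

```python
import numpy as np

# Illustrative 3-state reduced chain: P is the transition matrix induced by the
# trellis/labeling, C[s, s'] the per-step distortion increment. Values are assumed.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
C = np.array([[0.02, 0.08, 0.15],
              [0.07, 0.03, 0.09],
              [0.12, 0.06, 0.04]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()

# Asymptotic per-sample distortion D = sum_{s,s'} pi(s) P(s,s') C(s,s').
D = float(pi @ (P * C) @ np.ones(3))
print(pi, D)
```

Evaluating $D$ in closed form like this is what allows candidate trellises and labelings to be ranked without Monte Carlo simulation.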
5. Applications in Communications, Compression, and Model Quantization
Limited Feedback MIMO/MISO Systems
TCQ has proven essential in scalable limited-feedback channel state information (CSI) quantization for massive MIMO and massive MISO systems:
- Conventional random vector quantization (RVQ) becomes infeasible at large antenna counts due to exponential codebook growth.
- TCQ, via trellis-extended codebooks (TEC) and differential schemes, enables fractional-bit quantization per channel entry, reduces codebook search complexity to linear in the number of antennas, and provides substantial beamforming gain and spectral efficiency improvements over RVQ and noncoherent TCQ (NTCQ) (Choi et al., 2013, Mirza et al., 2014, Choi et al., 2014).
- Temporal and spatial correlations are exploited with source-constellation translation/scaling and phase adjustment, achieving up to 1.6 dB beamforming gain improvements and significant spectral efficiency benefits at practical feedback rates.
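A heavily simplified sketch of trellis-based CSI quantization follows, assuming per-antenna QPSK-like branch points, a toy fully connected 4-state trellis, and a squared-error path metric; this is illustrative only and is not the exact TEC/NTCQ construction from the cited papers.

```python
import numpy as np

QPSK = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))   # branch reproduction points (assumed)
NEXT = [[0, 1, 2, 3]] * 4                                 # toy 4-state trellis (assumed)

def quantize_direction(h):
    """Viterbi search over per-antenna constellation points approximating h."""
    h = h / np.abs(h).mean()                 # crude gain normalization (assumption)
    n_states, n = 4, len(h)
    cost = np.zeros(n_states)
    back = np.zeros((n, n_states), dtype=int)
    choice = np.zeros((n, n_states), dtype=int)
    for t in range(n):
        new_cost = np.full(n_states, np.inf)
        for s in range(n_states):
            for b in range(4):
                s_next = NEXT[s][b]
                c = cost[s] + abs(h[t] - QPSK[b]) ** 2
                if c < new_cost[s_next]:
                    new_cost[s_next], back[t, s_next], choice[t, s_next] = c, s, b
        cost = new_cost
    s = int(np.argmin(cost))
    w = np.zeros(n, dtype=complex)
    for t in range(n - 1, -1, -1):
        w[t] = QPSK[choice[t, s]]
        s = back[t, s]
    return w / np.linalg.norm(w)

h = (np.random.randn(32) + 1j * np.random.randn(32)) / np.sqrt(2)
w = quantize_direction(h)
print("beamforming gain:", abs(h.conj() @ w) ** 2 / np.linalg.norm(h) ** 2)
```

The key point mirrored here is that search cost grows linearly with the number of antennas (trellis stages), rather than exponentially as in an unstructured codebook search.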
Image and Video Coding Standards
TCQ is integrated into advanced video codecs (e.g., VVC and AV2) as an alternative to scalar quantization for transform coefficients:
- In Versatile Video Coding (VVC), TCQ enables rate-distortion optimized selection of quantization paths over blocks, yielding significant rate savings while maintaining low complexity. Adaptive strategies such as trellis departure point selection and branch pruning further reduce encoder cost with negligible loss in BD-rate (Wang et al., 2020).
- In AV2, TCQ provides up to 1% BD-rate reduction, particularly effective on smooth natural-content sequences. The quantization process is tightly interlocked with multi-symbol arithmetic coding, ATC scan orders, and context-adaptive models, thereby improving luma TB coding with minimal decoding impact (Nalci et al., 6 Jan 2026).
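The following sketch illustrates the general idea of rate-distortion-optimized path selection over two offset scalar quantizers with a small state machine; the quantizer offsets, state transitions, candidate set, and rate model are assumptions for illustration, not the VVC/AV2 specification.

```python
import numpy as np

DELTA, LAMBDA = 1.0, 0.1
STATE_TO_Q = [0, 0, 1, 1]                 # which quantizer each state uses (assumed)
NEXT = [[0, 2], [2, 0], [1, 3], [3, 1]]   # next state indexed by level parity (assumed)

def recon(level, q):
    # Q1 is offset by half a step relative to Q0 (illustrative choice).
    return (level + 0.5 * q) * DELTA if level != 0 else 0.0

def rate_proxy(level):
    return 1.0 + 2.0 * np.log2(1.0 + abs(level))   # crude bit-cost model (assumed)

def dq_encode(coeffs):
    """Viterbi search minimizing D + lambda*R over quantizer/level decisions."""
    n_states = 4
    cost = np.full(n_states, np.inf)
    cost[0] = 0.0
    back = {}
    for t, x in enumerate(coeffs):
        new_cost = np.full(n_states, np.inf)
        for s in range(n_states):
            if np.isinf(cost[s]):
                continue
            q = STATE_TO_Q[s]
            base = int(np.round(x / DELTA))
            for level in {0, base - 1, base, base + 1}:   # small candidate set
                c = cost[s] + (x - recon(level, q)) ** 2 + LAMBDA * rate_proxy(level)
                s_next = NEXT[s][abs(level) & 1]
                if c < new_cost[s_next]:
                    new_cost[s_next] = c
                    back[(t, s_next)] = (s, level)
        cost = new_cost
    s = int(np.argmin(cost))
    levels = []
    for t in range(len(coeffs) - 1, -1, -1):
        s, lvl = back[(t, s)]
        levels.append(lvl)
    return levels[::-1]

print(dq_encode([2.3, -0.7, 0.2, 4.1, -1.9]))
```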
Deep Learning and Neural Network Quantization
For post-training quantization (PTQ) of LLMs and neural network weights, TCQ—specifically as realized in QTIP—enables ultra-high-dimensional quantization without exponential codebook scaling:
- The bit-shift trellis architecture allows high dimension (e.g., 256D) TCQ with a codebook size only linear in the trellis state count, outperforming low-dimensional VQ (≤8D) in rate-distortion tradeoff and hardware efficiency (Tseng et al., 2024).
- On-the-fly codebook computation eliminates the memory bottleneck of traditional VQ, with quantizer hardware operating at >80% of memory bandwidth.
- Empirically, QTIP achieves perplexity and downstream task accuracy within ≈1–2% of FP16 at 2-bit quantization, matching or improving on other SOTA PTQ methods at similar bitrates.
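A rough, back-of-the-envelope comparison of why the decoupling matters; the trellis size and storage layout below are assumptions chosen only to illustrate the scaling argument.

```python
# Illustrative memory comparison (fp16 entries assumed).
R, d = 2, 256                        # 2 bits per weight at 256-dimensional quantization
print(f"unstructured VQ: 2**{R * d} codewords of length {d} -> infeasible to store")

states, branches = 2 ** 16, 2 ** R   # assumed trellis size
trellis_scalars = states * branches  # one scalar reconstruction per branch (assumed layout)
print(f"trellis codebook: {trellis_scalars} scalars "
      f"~ {trellis_scalars * 2 / 2**20:.1f} MiB in fp16")
```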
Learned Image Compression
Integration of TCQ within variational autoencoder (VAE)-based learned image compression is challenging due to optimization difficulties:
- The quantizer is non-differentiable, with interdependent trellis decisions and entropy-model-driven rate terms blocking effective gradient flow.
- Empirically, a two-stage strategy with approximate/noise-based pretraining followed by decoder/hyperdecoder finetuning on true quantized latents closes the gap, yielding 1–2% BD-rate improvements and up to 0.23 dB PSNR gains, with maximal benefits for entropy-constrained quantizers (Borzechowski et al., 10 Jun 2025).
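A self-contained toy of the two-stage idea is sketched below: a linear autoencoder stands in for the VAE codec, torch.round stands in for the non-differentiable trellis quantizer, and the rate proxy, hyperparameters, and training lengths are all assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
enc, dec = nn.Linear(16, 8), nn.Linear(8, 16)     # toy encoder/decoder (illustrative)
lam = 0.01
rate_proxy = lambda y: torch.log1p(y.abs()).sum(dim=1).mean()   # placeholder rate term

# Stage 1: additive uniform noise approximates quantization so gradients flow end to end.
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(200):
    x = torch.randn(32, 16)
    y = enc(x)
    y_noisy = y + (torch.rand_like(y) - 0.5)
    loss = ((x - dec(y_noisy)) ** 2).mean() + lam * rate_proxy(y_noisy)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the encoder, apply the real hard quantizer to the latents, and
# finetune only the decoder (and, in a full codec, the hyperdecoder) on them.
for p in enc.parameters():
    p.requires_grad_(False)
opt2 = torch.optim.Adam(dec.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.randn(32, 16)
    with torch.no_grad():
        y_hat = torch.round(enc(x))      # stand-in for the hard trellis decisions
    loss = ((x - dec(y_hat)) ** 2).mean()
    opt2.zero_grad(); loss.backward(); opt2.step()
```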
Relaying and Joint Source-Channel Coding
In compress-and-forward relays, TCQ provides vector-quantization-based shaping gain (>1 dB over scalar quantization), enables soft-LLR computation via the BCJR algorithm on the trellis, and supports joint iterative decoding with multilevel LDPC codes (Wan et al., 2023).
6. Complexity–Performance Tradeoffs and Practical Considerations
TCQ encoding/decoding complexity is linear in source dimension for fixed state size, with practical state counts (e.g., 8–256) sufficient for most gains. The computational cost increases only moderately with state count, especially when using hardware-optimized trellis structures and low-overhead codebook generation (Tseng et al., 2024, Choi et al., 2013). Trade-offs include:
- State count selection: Larger trellis memory improves distortion but with diminishing returns and increased storage for path metrics.
- Granular gain: convolutional codes maximizing free Hamming distance consistently realize the best shaping gain at a given state complexity (0704.1411).
- Application constraints: For PTQ, parallelizability and cache footprint are as crucial as distortion; for feedback-limited wireless systems, feedback overhead and compatibility with existing encoder/decoder architectures are decisive.
7. Historical Development and Ongoing Research Directions
TCQ was introduced to realize the shaping gain predicted by rate-distortion theory in a practical, scalable fashion, analogous to Ungerboeck's trellis-coded modulation for channel coding. The approach has since evolved, incorporating code optimization (distance-preserving labeling, maximum-distance codes), scalable and hardware-friendly trellis structures (bit-shift, tail-biting), and adaptivity to advanced system requirements (fractional-rate feedback, hybrid lookup-compute codebooks) (Choi et al., 2014, 0704.1411, Tseng et al., 2024).
Contemporary research focuses on integration with deep learning pipelines, efficient entropy-driven quantization for learned codecs, adaptive complexity scaling for video, and further hardware specialization for high-dimensional quantization in memory-bounded inference workloads.
References:
- "Trellis-Coded Quantization Based on Maximum-Hamming-Distance Binary Codes" (0704.1411)
- "Exact Hamming Distortion Analysis of Viterbi Encoded Trellis Coded Quantizers" (Kieffer et al., 2010)
- "Noncoherent Trellis Coded Quantization: A Practical Limited Feedback Technique for Massive MIMO Systems" (Choi et al., 2013)
- "Limited Feedback Massive MISO Systems with Trellis Coded Quantization for Correlated Channels" (Mirza et al., 2014)
- "Trellis-Extended Codebooks and Successive Phase Adjustment: A Path from LTE-Advanced to FDD Massive MIMO Systems" (Choi et al., 2014)
- "Low Complexity Trellis-Coded Quantization in Versatile Video Coding" (Wang et al., 2020)
- "Transform and Entropy Coding in AV2" (Nalci et al., 6 Jan 2026)
- "QTIP: Quantization with Trellises and Incoherence Processing" (Tseng et al., 2024)
- "Compress-and-Forward via Multilevel Coding and Trellis Coded Quantization" (Wan et al., 2023)
- "Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization" (Borzechowski et al., 10 Jun 2025)