Versatile Video Coding (VVC)
- VVC is the latest international video coding standard, achieving significant bitrate reductions (30–50% relative to HEVC) while supporting HD, UHD, HDR, VR, and screen content.
- It introduces advanced block partitioning (QTMT), numerous intra/inter prediction modes, and multiple transform techniques to improve rate–distortion performance.
- Despite its superior compression efficiency, VVC increases encoder/decoder complexity, spurring research into fast algorithms, machine learning acceleration, and hardware optimization.
Versatile Video Coding (VVC), standardized as ITU-T H.266 / ISO/IEC 23090-3, is the latest international video coding standard developed by the Joint Video Experts Team (JVET) and finalized in July 2020. VVC fundamentally advances compression efficiency compared to its predecessor, HEVC (H.265), achieving approximately 30–50% bitrate reduction for comparable visual fidelity across a wide spectrum of formats, including HD/UHD, HDR/WCG, 360° VR, and screen content. This superior coding performance is attributed to a comprehensive suite of new and enhanced tools that jointly increase both algorithmic and implementation complexity on encoder and decoder sides (Hamidouche et al., 2021, Amestoy et al., 2023).
1. Architectural Innovations and Coding Toolset
VVC maintains a hybrid block-based predictive framework, extensively upgraded to increase flexibility, coding gain, and adaptation to emerging content types. Central technical innovations include:
- Block Partitioning: Quadtree + Multi-Type Tree (QTMT) Each picture is decomposed into Coding Tree Units (CTUs) of up to 128×128 samples. Every CTU undergoes recursive partitioning via quadtree (QT) splits followed by multi-type tree binary and ternary splits (BTH/BTV, TTH/TTV), producing rectangular Coding Units (CUs) of flexible aspect ratio down to 4×4 (Hamidouche et al., 2021, Amestoy et al., 2023, Qureshi et al., 3 Mar 2025).
- Intra Prediction Modes VVC deploys 67 intra prediction modes (planar, DC, and 65 angular), extensive Most Probable Mode (MPM) signaling, Multi-Reference-Line (MRL) selection, Intra Sub-Partitions (ISP), Matrix-based Intra Prediction (MIP), and Cross-Component Linear Model (CCLM) chroma prediction (Amestoy et al., 2023, Hamidouche et al., 2021).
- Inter Prediction Enhancements VVC extends translational motion to include affine models, enhanced Merge/Skip candidates (MMVD, SMVD, HMVP), bi-directional optical flow (BDOF), decoder-side motion refinement (DMVR), geometric partitioning (GPM), combined inter/intra (CIIP), and refined bi-prediction weights (BCW) (Hamidouche et al., 2021).
- Transforms and Quantization The transform stage leverages Multiple Transform Selection (MTS) with DCT-II, DST-VII, and DCT-VIII kernels, complemented by the Low-Frequency Non-Separable Transform (LFNST) applied to the low-frequency coefficients of eligible intra blocks. Transform skip and block-based delta PCM (BDPCM) are available for eligible blocks, benefiting screen content in particular (Farhat et al., 2021).
- In-Loop Filters VVC retains Sample Adaptive Offset (SAO) and the Deblocking Filter (DBF) from HEVC, adding Adaptive Loop Filter (ALF) and Cross-Component ALF (CCALF) for block-wise Wiener filtering and chroma refinement. Luma Mapping with Chroma Scaling (LMCS) optionally enhances subjective quality on HDR content (Hamidouche et al., 2021, Kränzler et al., 2022).
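The legality rules of the QTMT split types above can be sketched as a recursive enumeration. This is a simplified illustration with assumed minimum-size constraints, not the normative VTM partitioning logic:

```python
# Hypothetical sketch of QTMT split enumeration (illustration only).
# Split names follow the text: QT (quadtree), BTH/BTV (binary
# horizontal/vertical), TTH/TTV (ternary horizontal/vertical).

MIN_CU = 4  # smallest CU side in luma samples, per the text


def qtmt_splits(w, h):
    """Return the split modes legal for a w x h block under simplified rules."""
    splits = []
    if w == h and w >= 2 * MIN_CU:
        splits.append("QT")   # four w/2 x h/2 children (square blocks only)
    if h >= 2 * MIN_CU:
        splits.append("BTH")  # two w x h/2 children
    if w >= 2 * MIN_CU:
        splits.append("BTV")  # two w/2 x h children
    if h >= 4 * MIN_CU:
        splits.append("TTH")  # h/4, h/2, h/4 horizontal stripes
    if w >= 4 * MIN_CU:
        splits.append("TTV")  # w/4, w/2, w/4 vertical stripes
    return splits
```

Under these assumed rules a 128×128 CTU admits every split type, a 4×8 block admits only BTH, and a 4×4 block is a leaf.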
2. Coding Efficiency and Rate–Distortion Performance
VVC achieves significant bitrate savings under JVET Common Test Conditions and in real-world deployments; in the table below, negative BD-rate values denote bitrate reduction relative to HEVC at equal quality:
| Resolution | PSNR BD-Rate | SSIM BD-Rate | VMAF BD-Rate |
|---|---|---|---|
| HD | –31.2% | –33.0% | –35.2% |
| UHD | –34.4% | –38.0% | –40.4% |
Subjective evaluations report up to 40% bitrate reduction for equivalent Mean Opinion Score (MOS) compared to HEVC (Amestoy et al., 2023). Per-tool ablation analysis confirms that QTMT partitioning (≈2.5% BD-rate gain), ALF (≈4.9%), and affine/inter prediction modules contribute a substantial fraction of the total coding gain (Amestoy et al., 2023, Hamidouche et al., 2021). However, these gains are content-dependent and diminish for low-complexity or static scenes.
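BD-rate figures like those above are computed with the Bjøntegaard metric, which fits log-rate as a polynomial in quality and integrates the gap between two RD curves. The sketch below uses the standard cubic fit; the curves and numbers are synthetic, not the measurements from the table:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta-rate (%): average log-rate gap between two RD curves.

    Fits a cubic polynomial of log10(rate) as a function of PSNR for each
    curve and integrates the difference over the overlapping PSNR range.
    Negative values mean the test codec needs less bitrate than the reference.
    """
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    # integrate each fitted polynomial over the common interval [lo, hi]
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100

# Synthetic curves: the "test" codec spends 30% less rate at every quality point.
psnr = [32.0, 35.0, 38.0, 41.0]
r_hevc = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps, illustrative
r_vvc = [0.7 * r for r in r_hevc]
```

For these synthetic curves the metric recovers a BD-rate of −30%, as expected when the rate is scaled uniformly at every quality point.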
Rate–distortion optimization throughout VVC follows the Lagrangian cost J = D + λ·R, with mode and tool selection steered by the Lagrange multiplier λ, which is derived from the quantization parameter and coding configuration (e.g., RA) (Amestoy et al., 2023).
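The Lagrangian mode decision can be illustrated with a toy selection loop; candidate names, distortions, and rates below are invented for illustration:

```python
# Minimal sketch of the Lagrangian mode decision J = D + lambda * R
# described above. Candidate figures are made-up, not VTM measurements.

def best_mode(candidates, lam):
    """Pick the candidate minimizing J = D + lam * R.

    candidates: list of (name, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])

modes = [
    ("planar", 120.0, 40),      # low rate, moderate distortion
    ("angular_45", 90.0, 75),   # better prediction, more signaling bits
    ("mip", 95.0, 60),
]
```

A small λ favors low distortion (here `angular_45` wins), while a large λ penalizes rate and flips the decision to the cheaper `planar` mode.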
3. Encoder and Decoder Complexity Analysis
The increase in coding flexibility comes at a substantial computational and memory cost:
- Encoder Complexity VVC encoders are 5–10× more complex than HEVC under LD/RA, and up to 31× under AI conditions. The run-time breakdown is configuration-dependent: partitioning and mode decision dominate (>40%), motion estimation reaches ≈53% in LD, and transform/quantization accounts for ≈22% (Pakdaman et al., 2020).
- Decoder Complexity VVC decoders are ≈1.5–2× more complex than HEVC. Loop filters (SAO + DBF + ALF) account for ≈30–40% of decoder run time, followed by motion compensation (17–26%), and entropy decoding (13–22%) (Pakdaman et al., 2020, Amestoy et al., 2022, Saha et al., 2022).
- Memory Bandwidth Encoding memory bandwidth requirements are ≈30× those of HEVC, while decoding requires ≈3× (Pakdaman et al., 2020). Efficient hardware or software implementations rely on scratchpad hierarchies and optimized data movement.
- Energy Efficiency Recent joint rate-distortion-energy design space exploration demonstrates that tool-level profiles can yield up to 47% decoder energy reduction at <30% bitrate overhead, or ≈35% energy saving for ≈6% bitrate increase (Kränzler et al., 2022, Kränzler et al., 2021). In-loop filters and complex inter modules (AFFINE, DMVR, BDOF) consume disproportionate energy (Kränzler et al., 2021).
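The rate-distortion-energy design-space exploration above amounts to keeping only non-dominated tool profiles in the rate-energy plane. A minimal sketch, with hypothetical profile names and numbers rather than measured values:

```python
# Hypothetical sketch of Pareto filtering over decoder tool profiles.
# Axes: bitrate overhead (%) vs. relative decoder energy (%); lower is
# better on both. Profiles and figures are invented for illustration.

def pareto_front(profiles):
    """Keep profiles not dominated on (bitrate overhead, decoder energy)."""
    front = []
    for name, rate, energy in profiles:
        dominated = any(
            r <= rate and e <= energy and (r, e) != (rate, energy)
            for _, r, e in profiles
        )
        if not dominated:
            front.append((name, rate, energy))
    return front

profiles = [
    ("all_tools", 0.0, 100.0),     # full toolset: best rate, highest energy
    ("no_alf", 4.0, 88.0),
    ("no_dmvr_bdof", 3.0, 90.0),
    ("minimal", 28.0, 55.0),
    ("bad_combo", 30.0, 70.0),     # dominated by "minimal"
]
```

Only the non-dominated profiles would be offered to a client choosing its energy-rate operating point.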
4. Fast Encoding/Decoding Algorithms and Hardware Implementations
- Machine Learning-Driven Acceleration A Deep Learning-Based Intra Mode Derivation (DLIMD) replaces explicit intra mode signaling. Formulated as a 67-way classification task with both hand-crafted and CNN-learned features, DLIMD eliminates up to 36% mode signaling bits and achieves 2–3% overall BD-rate reduction in All-Intra settings. However, its run-time overhead is severe, inflating decoder time by ~140× over reference VTM (Zhu et al., 2022).
- Statistical and Feature-Based Acceleration SSIM-Variation-Based Complexity Optimization uses SSIMV as a proxy to prune ineffective split modes, reducing encoding time by ≈64.7% at <3% BD-rate loss (Lin et al., 2022). Time-Cost model-based control enables per-frame encoding complexity precision to within 3.2% error (Huang et al., 2022).
- Reference-Based Early Termination CTU partition maps from temporally close reference frames are leveraged for early termination in higher temporal-layer frames, reducing encoding time by ≈21% (ETRF) with only ≈4% BD-rate penalty, offering a superior quality-speed trade-off compared to aggressive tool disabling (Qureshi et al., 3 Mar 2025).
- Random Forest Feature Rate-Control Statistical spatial complexity features and Random Forest bit predictors deliver efficient two-pass rate control for All-Intra configurations, achieving 32% encoding-time reduction at only ≈2% BD-rate penalty (Menon et al., 2023).
- Hardware and SIMD Optimized Decoders ASIC designs for VVC inverse transforms utilize factorized MTS/LFNST kernels, resource sharing, and butterfly datapaths to sustain 4K@30fps with minimal area (~164k gates) and energy per sample (Farhat et al., 2021). SIMD-based software decoders (OpenVVC, VVdeC) achieve real-time FHD/UHD decoding on ARM and x86, with near-linear scaling in core count, aggressive tile/frame/thread-level pipelining, and memory footprints 2–3× lower than comparable decoders (Amestoy et al., 2022, Saha et al., 2022, Li et al., 2021).
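The reference-based early-termination idea (ETRF) described above can be sketched as a depth-budget rule derived from the co-located CTU of a temporally close reference frame; the one-level margin and the depth-map layout are assumptions for illustration:

```python
# Illustrative sketch of reference-based early termination: cap the
# split-search depth of a CTU in a higher temporal layer using the depth
# observed at the co-located CTU of a reference frame. The +1 margin and
# the dict-based depth map are assumptions, not the cited algorithm's exact
# parameters.

def depth_budget(ref_depth_map, ctu_idx, margin=1, max_depth=6):
    """Maximum partition depth to explore for this CTU."""
    return min(ref_depth_map[ctu_idx] + margin, max_depth)

def should_terminate(current_depth, ref_depth_map, ctu_idx):
    """Stop recursing once the reference-derived budget is reached."""
    return current_depth >= depth_budget(ref_depth_map, ctu_idx)

# Co-located CTU depths observed in the reference frame (invented values).
ref_depths = {0: 1, 1: 4}
```

A CTU whose reference twin stayed shallow (index 0) stops splitting early, while a detailed region (index 1) keeps its full search budget.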
5. Profiles, Rate-Control, and Application Scenarios
VVC encoders such as Fraunhofer HHI VVenC provide multi-level presets (faster, fast, medium, slow, slower), exposing Pareto-optimal trade-offs between coding efficiency and encoder complexity (Wieckowski et al., 2021, Qureshi et al., 3 Mar 2025). Recent research adapts VVC profiles for feature coding in machine-vision pipelines (MPEG-AI FCM); simplified profiles such as Fast, Faster, Fastest selectively disable in-loop filters, advanced inter/intra tools, and partitioning depth to achieve up to 95.6% encoding time reduction with minimal BD-rate loss (<2%)—demonstrating interoperability with downstream neural inference (Eimon et al., 9 Dec 2025).
Adaptations for energy/complexity-aware streaming include Green Metadata coding-tool profiles, enabling clients to request lower-complexity streams with flexible energy-rate constraints (Kränzler et al., 2022, Kränzler et al., 2021). Two-pass and machine-learning rate-control strategies further enhance operational flexibility without substantial penalty in distortion (Menon et al., 2023).
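A client-side selection rule in the spirit of such energy-aware profiles might look like the sketch below; the variant names, bitrates, and energy figures are invented, and the rule (lowest bitrate among energy-feasible variants of comparable quality) is an assumption, not the Green Metadata specification:

```python
# Hypothetical client policy: among advertised stream variants encoded at
# comparable quality, accept some bitrate overhead in exchange for lower
# decoder energy, and pick the cheapest variant that fits the energy budget.

def pick_variant(variants, energy_budget):
    """variants: list of (name, kbps, relative_decoder_energy)."""
    feasible = [v for v in variants if v[2] <= energy_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v[1])  # lowest bitrate wins

variants = [
    ("full_toolset", 4000, 1.00),   # all tools on: best rate, most energy
    ("no_alf", 4200, 0.85),
    ("minimal_tools", 5100, 0.60),  # large rate overhead, lowest energy
]
```

A battery-constrained device with budget 0.9 would pick `no_alf`; tightening the budget to 0.7 forces the `minimal_tools` stream despite its bitrate overhead.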
6. Quality Enhancement and Post-Processing
VVC streams, especially at low bitrates, remain susceptible to residual artifacts (blockiness, blurring, ringing). Prediction-aware CNN-based enhancement filters leverage both decoded reconstructions and intra predictor signals to yield superior post-decoder quality, providing 6–16% BD-rate savings over baseline decoding (Nasiri et al., 2021). These mechanisms can be integrated decoder-side or evolved into specialized in-loop filters, albeit with increased resource requirements.
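As a toy stand-in for the prediction-aware principle (not the trained CNN from the cited work), the sketch below shows why exposing the prediction signal helps: the filter can act on the coded residual separately from the prediction itself:

```python
import numpy as np

# Toy linear stand-in for a prediction-aware enhancement filter. A learned
# CNN would replace the fixed weight below with trained, content-adaptive
# weights; the point is only that seeing both inputs lets the filter damp
# residual (e.g. ringing) energy without touching the prediction.

def prediction_aware_filter(recon, pred, w_res=0.8):
    """Attenuate the noisy residual while keeping the prediction intact.

    recon, pred: 2-D luma sample arrays; w_res < 1 damps residual energy.
    """
    residual = recon - pred
    return pred + w_res * residual

recon = np.array([[100.0, 104.0], [96.0, 100.0]])
pred = np.array([[100.0, 100.0], [100.0, 100.0]])
out = prediction_aware_filter(recon, pred)
```

Samples where reconstruction and prediction agree pass through unchanged, while residual excursions are pulled toward the predictor.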
7. Deployment, Applications, and Outlook
Initial broadcast (DVB-T2/S2, ATSC-3.0), IPTV/OTT (DASH/CMAF), video conferencing, and VR/360° deployments validate the real-world viability of VVC’s toolset and open-source implementations (VVenC, VVdeC, OpenVVC, ATEME TitanLive) (Wieckowski et al., 2021, Hamidouche et al., 2021). Real-time decoding and energy-aware profiles are demonstrated on both desktop and mobile SoCs (Apple A14, Jetson Xavier), with ongoing integration into GPAC and FFmpeg chains. Memory-efficient designs ensure suitability for embedded consumer devices.
VVC’s complexity remains a major challenge for industry-scale adoption. Emerging directions focus on further hardware acceleration (ASIC/SIMD), low-complexity tool profiles, machine-vision feature compression, and unified rate-distortion-energy optimization. The adoption of learning-based inference-side coding and dynamic tool adaptation promises continued evolution tailored to future codecs and AI-integrated multimedia (Eimon et al., 9 Dec 2025, Kränzler et al., 2022, Zhu et al., 2022).