Reducing Complexity of HEVC: A Deep Learning Approach (1710.01218v3)

Published 19 Sep 2017 in cs.CV

Abstract: High Efficiency Video Coding (HEVC) significantly reduces bit-rates over the preceding H.264 standard but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of coding unit (CU) consumes a large proportion of the HEVC encoding complexity, due to the brute-force search for rate-distortion optimization (RDO). Therefore, this paper proposes a deep learning approach to predict the CU partition for reducing the HEVC complexity at both intra- and inter-modes, which is based on convolutional neural network (CNN) and long- and short-term memory (LSTM) network. First, we establish a large-scale database including substantial CU partition data for HEVC intra- and inter-modes. This enables deep learning on the CU partition. Second, we represent the CU partition of an entire coding tree unit (CTU) in the form of a hierarchical CU partition map (HCPM). Then, we propose an early-terminated hierarchical CNN (ETH-CNN) for learning to predict the HCPM. Consequently, the encoding complexity of intra-mode HEVC can be drastically reduced by replacing the brute-force search with ETH-CNN to decide the CU partition. Third, an early-terminated hierarchical LSTM (ETH-LSTM) is proposed to learn the temporal correlation of the CU partition. Then, we combine ETH-LSTM and ETH-CNN to predict the CU partition for reducing the HEVC complexity for inter-mode. Finally, experimental results show that our approach outperforms other state-of-the-art approaches in reducing the HEVC complexity at both intra- and inter-modes.

Summary

The paper introduces a deep learning framework using ETH-CNN and ETH-LSTM to bypass exhaustive RDO search in HEVC encoding, significantly reducing complexity.
It leverages a large-scale dataset of intra- and inter-mode CU partitions to train models that effectively capture both spatial and temporal dependencies.
Experimental results demonstrate up to 70.52% complexity reduction for intra-modes and 62.94% for inter-modes, outperforming traditional SVM and shallow CNN approaches.

Analysis of "Reducing Complexity of HEVC: A Deep Learning Approach"

The paper "Reducing Complexity of HEVC: A Deep Learning Approach" proposes a method for reducing the computational cost of High Efficiency Video Coding (HEVC) encoding with deep learning. HEVC achieves significantly lower bit-rates than the H.264/AVC standard, but its high encoding complexity poses challenges for practical multimedia applications. The approach uses convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to predict coding unit (CU) partitions in both intra- and inter-modes of HEVC. The core objective is to replace the exhaustive rate-distortion optimization (RDO) search that HEVC encoders normally use to decide the CU partition.

Methodology and Approach

The authors establish a large-scale database of CU partition data for training the proposed deep learning models. The database covers both intra-mode data (2000 high-resolution images) and inter-mode data (111 video sequences), allowing for diverse training conditions. The CU partition of an entire coding tree unit (CTU) is represented as a hierarchical CU partition map (HCPM), which is predicted by an early-terminated hierarchical CNN (ETH-CNN). Because the encoder takes the partition from the network's prediction instead of running the brute-force RDO search, encoding complexity drops significantly.
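
The hierarchical representation and the early-termination idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes the common HEVC configuration of a 64x64 CTU with CU sizes from 64x64 down to 8x8 (so three levels of split flags on 1x1, 2x2 and 4x4 grids), and the function name `decode_hcpm` and its arguments are hypothetical.

```python
def decode_hcpm(split64, split32, split16):
    """Convert three levels of split flags into a list of (x, y, size) CUs.

    split64: bool, whether the 64x64 CTU is split (assumed level 1).
    split32: 2x2 nested list of bools, one per 32x32 block (assumed level 2).
    split16: 4x4 nested list of bools, one per 16x16 block (assumed level 3).
    """
    if not split64:
        # Early termination: the deeper levels are never consulted.
        return [(0, 0, 64)]
    cus = []
    for i in range(2):          # row of the 32x32 block
        for j in range(2):      # column of the 32x32 block
            if not split32[i][j]:
                cus.append((32 * j, 32 * i, 32))
                continue        # early termination for this quadrant
            for di in range(2):
                for dj in range(2):
                    r, c = 2 * i + di, 2 * j + dj   # index in the 4x4 grid
                    x16, y16 = 16 * c, 16 * r
                    if split16[r][c]:
                        cus.extend((x16 + 8 * dx, y16 + 8 * dy, 8)
                                   for dy in range(2) for dx in range(2))
                    else:
                        cus.append((x16, y16, 16))
    return cus


# Illustrative flags: only the top-left 32x32 block and its first 16x16 block split.
split32 = [[True, False], [False, False]]
split16 = [[False] * 4 for _ in range(4)]
split16[0][0] = True
cus = decode_hcpm(True, split32, split16)
```

Whatever flags are supplied, the returned CUs tile the CTU exactly, and a "no split" decision at an upper level means the lower-level flags beneath it are ignored, which is where the saving from early termination comes from.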

For inter-mode complexity reduction, an early-terminated hierarchical LSTM (ETH-LSTM) is designed to capture temporal dependencies in CU partitions across frames. Combining ETH-CNN with ETH-LSTM lets the method exploit both spatial patterns within a frame and temporal correlations between frames, which improves prediction accuracy while still avoiding the RDO search when deciding the CU partition.
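
To make the temporal part concrete, the following is a generic sketch of how a standard LSTM cell can carry state across frames and emit a split probability per frame. It is an illustration only and does not reproduce ETH-LSTM: the feature vectors stand in for per-frame CNN outputs, the weights are random, and every name and dimension (`lstm_step`, `predict_split_over_frames`, `T`, `D`, `H`) is an assumption made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, b):
    """One standard LSTM step. W has shape (4*H, D+H), b has shape (4*H,)."""
    H = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def predict_split_over_frames(features, W, b, w_out, b_out):
    """Return one split probability per frame for a single CU location.

    features: array of shape (T, D), one (hypothetical) feature vector per frame.
    """
    H = w_out.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    probs = []
    for x in features:
        h, c = lstm_step(x, h, c, W, b)   # state carries over to the next frame
        probs.append(float(sigmoid(w_out @ h + b_out)))
    return probs


# Random stand-ins for trained weights and per-frame features.
rng = np.random.default_rng(0)
T, D, H = 5, 8, 4
features = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
w_out = rng.normal(scale=0.1, size=H)
probs = predict_split_over_frames(features, W, b, w_out, 0.0)
```

In a trained model, each probability would be thresholded to a split decision, and a "no split" result at one level would skip the deeper levels for that CU, as in the intra-mode case.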

Experimental Outcomes

The approach is evaluated against state-of-the-art methods, including those based on SVM and shallow CNNs, and achieves greater complexity reduction than all of them. The paper reports substantial reductions in encoding time alongside marginal losses in rate-distortion (RD) performance, as measured by BD-BR and BD-PSNR. The approach achieves up to 70.52% complexity reduction in intra-mode and 62.94% in inter-mode under various QP settings, while maintaining reasonable RD performance.
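
Complexity reduction in evaluations of this kind is usually expressed as the relative saving in encoding time. The sketch below assumes that definition, since this summary does not spell out the formula; the function name and the timings are made up for illustration and are not taken from the paper.

```python
def encoding_time_saving(t_original, t_proposed):
    """Relative encoding-time saving in percent (assumed definition)."""
    if t_original <= 0:
        raise ValueError("t_original must be positive")
    return (t_original - t_proposed) / t_original * 100.0


# Hypothetical timings in seconds: baseline encoder vs. encoder with predicted partitions.
saving = encoding_time_saving(120.0, 45.0)
```

A saving of 70.52% under this definition would mean the modified encoder needs a little under 30% of the baseline encoding time.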

Implications and Future Work

This research matters for the practical deployment of HEVC, particularly in scenarios where computational resources are constrained or real-time processing is required. The prediction strategy outlined could extend to other computationally expensive components of HEVC, such as prediction unit (PU) and transform unit (TU) prediction, potentially leading to even greater complexity reduction.

Future work could focus on optimizing and accelerating the deep learning models themselves, since faster and more efficient networks would be better suited to real-time systems. Extending the approach to other coding standards or multimedia applications could also be a fruitful line of investigation. Implementing these methods on FPGA devices could extend the utility of the research beyond software-based implementations.

In conclusion, the paper shows that deep learning can replace the brute-force CU partition search in HEVC, substantially reducing encoding complexity in both intra- and inter-modes at a small cost in RD performance.
