- The paper introduces RAFT-guided motion estimation to enhance temporal redundancy prediction in neural video codecs.
- It proposes a joint rate-distortion optimization strategy that reallocates rates based on frame importance to balance quality and efficiency.
- Empirical results show rate-distortion performance approaching that of the HEVC reference software (HM) while keeping decoding complexity low enough for low-power devices.
Improved Encoding for Overfitted Video Codecs: An Academic Overview
The paper "Improved Encoding for Overfitted Video Codecs" addresses key limitations in neural video codecs, particularly the challenges associated with balancing decoding complexity and compression efficiency. By introducing novel methods to enhance the encoding of overfitted video codecs, the researchers aim to close the performance gap with traditional codecs like HEVC while maintaining low computational requirements suitable for deployment on low-power devices.
Key Contributions and Methodology
The main contributions of this work center around two innovations: the integration of an optical flow estimator to guide motion information learning and a joint rate-distortion optimization strategy to enhance compression efficiency.
- Guided Motion Information Learning: The researchers address the difficulty of capturing accurate motion information, a known bottleneck in overfitted codecs. They incorporate a pre-trained optical flow estimator, RAFT, into the encoding process. This estimator provides accurate motion fields between frames, which guide the codec's own motion learning so it can predict and compensate for temporal redundancies more effectively. This step significantly improves motion estimation accuracy, as demonstrated on the RaceHorses sequence.
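To make the role of a guiding flow field concrete, here is a minimal sketch (not the paper's implementation) of motion-compensated prediction: a dense flow, standing in for the output of a pre-trained estimator such as RAFT, warps the previous frame toward the current one, shrinking the residual the codec must encode. The frames, the flow values, and the nearest-neighbour warp are all toy assumptions for illustration.

```python
import numpy as np

def warp(prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Motion-compensate `prev` with a dense flow field (nearest-neighbour).

    flow[y, x] = (dy, dx) points from the current frame back into `prev`.
    """
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prev[src_y, src_x]

# Toy frames: a bright square translating 2 px to the right between frames.
prev = np.zeros((16, 16)); prev[4:8, 4:8] = 1.0
curr = np.zeros((16, 16)); curr[4:8, 6:10] = 1.0

# A "guiding" flow (hypothetical): every current pixel looks 2 px left
# in the previous frame, matching the square's motion.
flow = np.zeros((16, 16, 2)); flow[..., 1] = -2.0

pred = warp(prev, flow)
mse_guided = float(np.mean((curr - pred) ** 2))
mse_copy = float(np.mean((curr - prev) ** 2))
assert mse_guided < mse_copy  # guided prediction leaves a smaller residual
```

With an accurate flow, the prediction matches the current frame and the residual vanishes; without motion compensation, the whole displaced square remains to be coded. This is the intuition behind using an external flow estimator to guide the codec's motion information.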
- Joint Rate-Distortion Optimization: The authors propose a joint optimization scheme that refines encoder parameters across all frames to minimize the overall rate-distortion cost of the video. Because the scheme accounts for frame importance and temporal dependencies within the sequence, rate is allocated where it matters most, yielding better compression at lower distortion.
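The benefit of optimizing frames jointly rather than one at a time can be sketched with a toy Lagrangian cost J = D + λR. In this illustration (all operating points, the λ value, and the `LEAK` dependency factor are hypothetical, not from the paper), a reference frame's distortion propagates to a dependent frame; a per-frame greedy choice ignores this when coding the reference, while a joint search spends extra rate on the reference because it "knows" it is referenced.

```python
import itertools

LAMBDA = 0.01  # rate-distortion trade-off (hypothetical value)
LEAK = 2.0     # fraction of reference distortion inherited by the dependent frame

# Hypothetical operating points: (rate in bits, distortion as MSE).
ref_points = [(1000, 4.0), (1300, 2.0), (2000, 1.0)]
dep_points = [(500, 3.0), (1500, 1.5)]

def joint_cost(ref, dep):
    """Total Lagrangian cost J = sum(D) + lambda * sum(R) over both frames."""
    r_rate, r_dist = ref
    d_rate, d_dist = dep
    d_dist += LEAK * r_dist  # temporal dependency on the reference frame
    return (r_dist + d_dist) + LAMBDA * (r_rate + d_rate)

# Greedy: each frame minimized in isolation (the reference ignores its users).
greedy_ref = min(ref_points, key=lambda p: p[1] + LAMBDA * p[0])
greedy_dep = min(dep_points,
                 key=lambda p: p[1] + LEAK * greedy_ref[1] + LAMBDA * p[0])
greedy = joint_cost(greedy_ref, greedy_dep)

# Joint: search both frames together over all combinations.
joint = min(joint_cost(r, d)
            for r, d in itertools.product(ref_points, dep_points))
assert joint < greedy  # joint allocation beats per-frame greedy here
```

The joint search chooses a higher-rate, lower-distortion reference because the saved distortion is "repaid" by the dependent frame, which is the intuition behind optimizing rate allocation across a whole sequence rather than frame by frame.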
Results
The paper presents empirical results from experiments on the HEVC and UVG test datasets. Compared to other overfitted codecs such as Cool-chic video and C3, the proposed method achieves superior rate-distortion performance, approaching that of the HEVC reference software (HM) and outperforming x264-medium and x265-medium in both random access and low-delay configurations. In terms of decoding complexity, the proposed codec requires only around 1300 multiplications per decoded pixel, fewer than NeRV-based solutions and far below state-of-the-art autoencoder-based codecs.
Implications and Future Directions
This work has practical implications for the deployment of video codecs on devices with limited computational resources, where decoding efficiency is critical. By improving overfitted codec designs, the proposed solutions may facilitate wider adoption of neural compression methods in consumer electronics. The pre-training of motion information hints at further opportunities to leverage existing models to enrich neural codec performance without additional decoding complexity overhead.
Future research could explore further reductions in encoding time and investigate whether accounting for the rate of the neural network parameters during training could further improve codec efficiency. Additionally, methods that speed up the convergence of overfitted codec training would enhance their practicality for real-time applications, bringing them closer to the performance benchmarks set by autoencoder-based systems.
Limitations
While the paper advances the field, encoding time remains a challenge, particularly for high-resolution frames. Future research could focus on reducing the computational burden of training, for example through model compression or faster learning-rate schedules, to make joint optimization accessible for a broader range of use cases.