Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff (1901.07821v4)

Published 23 Jan 2019 in cs.LG, cs.CV, cs.IT, math.IT, and stat.ML

Abstract: Lossy compression algorithms are typically designed and analyzed through the lens of Shannon's rate-distortion theory, where the goal is to achieve the lowest possible distortion (e.g., low MSE or high SSIM) at any given bit rate. However, in recent years, it has become increasingly accepted that "low distortion" is not a synonym for "high perceptual quality", and in fact optimization of one often comes at the expense of the other. In light of this understanding, it is natural to seek for a generalization of rate-distortion theory which takes perceptual quality into account. In this paper, we adopt the mathematical definition of perceptual quality recently proposed by Blau & Michaeli (2018), and use it to study the three-way tradeoff between rate, distortion, and perception. We show that restricting the perceptual quality to be high, generally leads to an elevation of the rate-distortion curve, thus necessitating a sacrifice in either rate or distortion. We prove several fundamental properties of this triple-tradeoff, calculate it in closed form for a Bernoulli source, and illustrate it visually on a toy MNIST example.

Authors (2)
  1. Yochai Blau (6 papers)
  2. Tomer Michaeli (67 papers)
Citations (252)

Summary

Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff

This paper introduces a pivotal extension of classical rate-distortion theory for lossy compression by incorporating a third dimension: perceptual quality. Traditionally, rate-distortion theory, rooted in Shannon's foundational work, describes the tradeoff between bit rate and distortion, where distortion is typically measured by metrics such as Mean Squared Error (MSE) or the Structural Similarity Index (SSIM). The authors, Yochai Blau and Tomer Michaeli, argue that minimizing distortion alone is insufficient for achieving high perceptual quality, which is crucial for user satisfaction in applications such as image, video, and audio compression.

Key Contributions

  1. Rate-Distortion-Perception (RDP) Function: The authors propose a new theoretical framework for lossy compression, encapsulated in the rate-distortion-perception function R(D,P) (formalized just after this list). This function extends the traditional rate-distortion curve by adding a perceptual quality index, providing a complete description of the three-way tradeoff between rate, distortion, and perception.
  2. Triple Tradeoff Analysis: Using this framework, the authors demonstrate that improving perceptual quality typically requires a compromise in rate, distortion, or both. This is made precise by the rate-distortion-perception function: constraining the perceptual quality index elevates the rate-distortion curve, so a higher rate (or higher distortion) is needed to maintain the same perceptual quality.
  3. Analytic and Experimental Validation: The paper derives the rate-distortion-perception function in closed form for a Bernoulli source (a numerical sketch of this case also follows the list) and complements this with experiments on MNIST image compression, where compression models trained with deep learning exhibit elevated rate-distortion curves as the perceptual-quality constraint is tightened.
  4. Theoretical Properties: The authors establish several properties of the RDP function, showing that it is monotonically non-increasing in both the distortion and perception constraints and, under suitable conditions on the distortion and divergence measures, convex, mirroring the classical rate-distortion function. They also show that distortion minimization and perceptual optimality cannot in general be achieved simultaneously: imposing high perceptual quality entails a sacrifice in distortion, rate, or both.
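For reference, the rate-distortion-perception function studied in the paper has the following form (stated here up to notation), where Δ is a distortion measure and d is a divergence between the source distribution and the reconstruction distribution:

```latex
% Informational rate-distortion-perception function:
% minimize mutual information over test channels p_{\hat X | X},
% subject to a distortion constraint and a perceptual-quality constraint.
R(D, P) = \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\bigl[\Delta(X, \hat{X})\bigr] \le D,
\qquad
d\bigl(p_X, p_{\hat{X}}\bigr) \le P .
```

Removing the perceptual constraint (P = ∞) recovers the classical rate-distortion function, while tightening P shrinks the feasible set of test channels and hence can only raise the required rate at a fixed D.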

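As a complement, here is a minimal numerical sketch (not the paper's closed-form derivation) that approximates R(D, P) for a Bernoulli source by grid search over binary test channels, assuming Hamming distortion and a total-variation perception index; the function name rdp_bernoulli, the grid resolution, and the printed (D, P) values are illustrative choices rather than quantities from the paper.

```python
import numpy as np

def h2(q):
    """Binary entropy in bits (0 log 0 treated as 0)."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

def rdp_bernoulli(p, D, P, grid=401):
    """Grid-search approximation of R(D, P) for X ~ Bernoulli(p), assuming
    Hamming distortion and a total-variation (TV) perception index.
    The test channel is parameterized by a = Pr[Xhat = 1 | X = 0] and
    b = Pr[Xhat = 0 | X = 1]."""
    a = np.linspace(0.0, 1.0, grid)[:, None]
    b = np.linspace(0.0, 1.0, grid)[None, :]
    distortion = (1 - p) * a + p * b              # E[Hamming distortion]
    q = (1 - p) * a + p * (1 - b)                 # Pr[Xhat = 1]
    perception = np.abs(q - p)                    # d_TV(p_X, p_Xhat) for Bernoulli laws
    rate = h2(q) - ((1 - p) * h2(a) + p * h2(b))  # I(X; Xhat) in bits
    rate = np.maximum(rate, 0.0)                  # guard against float error
    feasible = (distortion <= D) & (perception <= P)
    return rate[feasible].min() if feasible.any() else np.inf

# Tightening the perception budget P elevates the rate needed at a fixed D.
p = 0.1
for P in (1.0, 0.05, 0.01):
    rates = [round(rdp_bernoulli(p, D, P), 3) for D in (0.02, 0.05, 0.09)]
    print(f"P = {P}: R(D, P) for D in (0.02, 0.05, 0.09) -> {rates}")
```

At a fixed distortion budget D, tightening the perception budget P should never decrease the returned rate, which mirrors the elevation of the rate-distortion curve discussed above.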
Numerical Findings and Implications

The authors perform empirical validation through carefully controlled experiments that highlight the implications of the proposed tradeoff. Their results are significant:

  • At low bit rates, a sole focus on distortion minimization results in reconstructions with poor perceptual quality (see the worked zero-rate example after this list).
  • In tasks such as image compression, judging algorithms solely by their rate-distortion curves may not suffice and can lead to choices that are suboptimal for end-user satisfaction in low-bandwidth environments.
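As a worked illustration of the first point (a toy setting assumed here for concreteness, not numbers quoted from the paper), consider zero-rate reconstruction of X ~ Bernoulli(0.1) under Hamming distortion with a total-variation perception index:

```latex
% (i) Distortion-optimal zero-rate decoder: output the constant \hat{X} \equiv 0.
\mathbb{E}[\Delta] = 0.1, \qquad d_{\mathrm{TV}}\bigl(p_X, p_{\hat{X}}\bigr) = 0.1 \quad \text{(degenerate output law)}
% (ii) Perception-optimal zero-rate decoder: draw \hat{X} \sim \mathrm{Bernoulli}(0.1) independently of X.
\mathbb{E}[\Delta] = 2 \cdot 0.1 \cdot 0.9 = 0.18, \qquad d_{\mathrm{TV}}\bigl(p_X, p_{\hat{X}}\bigr) = 0
```

Enforcing perfect perceptual quality at zero rate thus nearly doubles the expected distortion; the R(D, P) analysis characterizes how this kind of sacrifice behaves at every rate, not only in the zero-rate extreme.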

Future Directions

An intriguing aspect of the paper is its adaptability to different domains: the proposed framework promises applicability well beyond image compression. Generalization to other media, including audio and video, seems both plausible and necessary, with potential extensions to richer perceptual evaluation criteria.

Concluding Remarks

The work by Blau and Michaeli offers a substantial revision of traditional compression paradigms by acknowledging perceptual quality as an integral design criterion. This has far-reaching implications for academia and industry, guiding future algorithm development toward solutions that are not only rate-efficient but also perceptually optimized. Moreover, it opens a path toward more rigorous evaluation of compression algorithms, stressing the importance of perceptual metrics when judging algorithmic adequacy, particularly in low-bandwidth, high-fidelity scenarios. This paper serves as a foundation for future research and practical implementations, encouraging explorations that blend rate efficiency with human-centric quality considerations.