Prediction quality deterioration with increased temporal distance in GFVC

Determine how to maintain prediction accuracy in generative face video compression (GFVC) as the temporal distance between reference frames and target frames grows, so that reconstruction quality does not deteriorate over long sequences.

Background

Generative Face Video Coding (GFVC) methods typically predict target frames from one or more reference frames. As the temporal distance between a reference frame and later target frames grows, these models commonly suffer from reconstruction drift and loss of accuracy, especially under large pose variations.

Prior work has explored improved motion representations, multi-reference fusion, and side-information or residual coding. Despite these efforts, the authors explicitly note that the deterioration of prediction quality with increasing temporal distance from the references remains unresolved, which motivates their proposed multi-reference approach based on contrastive learning.
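To make "temporal distance from the references" concrete, the sketch below measures, for each target frame, how many frames separate it from the nearest available reference, and reports the worst case over a sequence. It is a hypothetical illustration only: the helper names (`nearest_reference`, `max_temporal_distance`) and the causal, low-delay assumption that only past references may be used are choices made here, not details taken from Konuko et al. or from any GFVC codec.

```python
from collections.abc import Sequence


def nearest_reference(target_idx: int, ref_indices: Sequence[int]) -> tuple[int, int]:
    """Return (reference frame index, temporal distance in frames) for the
    closest reference at or before target_idx.

    Hypothetical helper. Assumes a causal (low-delay) setting in which only
    past or current references may be used for prediction.
    """
    past_refs = [r for r in ref_indices if r <= target_idx]
    if not past_refs:
        raise ValueError(f"no reference available for frame {target_idx}")
    best = max(past_refs)  # latest past reference = smallest temporal distance
    return best, target_idx - best


def max_temporal_distance(num_frames: int, ref_indices: Sequence[int]) -> int:
    """Worst-case distance between any frame in [0, num_frames) and its nearest past reference."""
    return max(nearest_reference(t, ref_indices)[1] for t in range(num_frames))


if __name__ == "__main__":
    # Single key frame at the start of a 300-frame sequence.
    print(max_temporal_distance(300, [0]))                # 299
    # Hypothetical schedule: an additional reference every 50 frames.
    print(max_temporal_distance(300, range(0, 300, 50)))  # 49
```

With a single reference, the last frame is 299 frames away from anything it can be predicted from; adding references every 50 frames caps that distance at 49. Under this toy view, extra references bound the temporal distance but each one must be transmitted, so the number and placement of references trades bitrate against how far any target frame drifts from its reference.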

References

Despite these efforts, the issue of prediction quality deterioration as target frames move further in time from the references remains unsolved.

— Multi-Reference Generative Face Video Compression with Contrastive Learning (2409.01029 - Konuko et al., 2024) in Section 1 (Introduction)
