Old Photo Restoration via Deep Latent Space Translation

Published 14 Sep 2020 in cs.CV and cs.GR | (2009.07047v1)

Abstract: We propose to restore old photos that suffer from severe degradation through a deep learning approach. Unlike conventional restoration tasks that can be solved through supervised learning, the degradation in real photos is complex and the domain gap between synthetic images and real old photos makes the network fail to generalize. Therefore, we propose a novel triplet domain translation network by leveraging real photos along with massive synthetic image pairs. Specifically, we train two variational autoencoders (VAEs) to respectively transform old photos and clean photos into two latent spaces. And the translation between these two latent spaces is learned with synthetic paired data. This translation generalizes well to real photos because the domain gap is closed in the compact latent space. Besides, to address multiple degradations mixed in one old photo, we design a global branch with apartial nonlocal block targeting to the structured defects, such as scratches and dust spots, and a local branch targeting to the unstructured defects, such as noises and blurriness. Two branches are fused in the latent space, leading to improved capability to restore old photos from multiple defects. Furthermore, we apply another face refinement network to recover fine details of faces in the old photos, thus ultimately generating photos with enhanced perceptual quality. With comprehensive experiments, the proposed pipeline demonstrates superior performance over state-of-the-art methods as well as existing commercial tools in terms of visual quality for old photos restoration.

Abstract PDF Upgrade to Chat

Citations (61)

View on Semantic Scholar

Summary

The paper introduces a triplet domain translation network that maps real, degraded, and clean photos into a unified latent space.
It tackles complex degradations by using a global branch with a partial nonlocal block and a local branch to correct both structured and unstructured defects.
The paper incorporates a face refinement network to recover high-resolution facial details, significantly enhancing perceptual quality.

Overview of "Old Photo Restoration via Deep Latent Space Translation"

This paper presents a novel approach to old photo restoration using a deep learning framework designed to handle multiple, complex forms of degradation. Given the intricate nature of real-world photo defects, conventional supervised learning techniques struggle due to the domain gap between synthetic training data and actual old photos. The proposed method introduces a triplet domain translation network that leverages latent space modeling to address this challenge effectively.

Key Contributions

Triplet Domain Translation Network: The proposed method introduces a triplet domain consisting of real old photos, synthetically degraded images, and clean images. By mapping these domains to corresponding latent spaces using variational autoencoders (VAEs), the network closes the domain gap between real and synthetic images. Latent space translation is then employed to restore corrupted images.
Handling Mixed Degradation: The paper addresses the mixed nature of degradation in old photos. It introduces a global branch with a partial nonlocal block to focus on structured defects such as scratches and a local branch to target unstructured defects like noise and blurriness. The fusion of the two branches in the latent space enhances the restoration capabilities significantly.
Face Refinement Network: A dedicated network is incorporated to refine and recover facial details, which are considered crucial for human perception. This aspect of the pipeline ensures enhanced perceptual quality specifically for portraits, integrating hierarchical spatial adaptive conditions to generate high-resolution face images.

Methodology

VAE-based Latent Space Translation: Two VAEs are trained—one for mapping both synthetic and real photos into a shared latent space, and another for clean images. Translation between these latent spaces using synthetic pairs is key to the model's success, as it learns the restoration process in a domain-aligned feature space.
Partial Nonlocal Block: Integrated into the restoration pipeline, this block specifically addresses inpainting tasks using a global context, essential for restoring structured defects.
Joint Training with Face Network: The face refinement network is trained jointly with the latent translation network to ensure seamless face recovery integrated with the broader restoration process, mitigating artifacts through adaptive condition injection mechanisms.

Results and Implications

The paper demonstrates the superiority of its method through comprehensive experiments, outperforming state-of-the-art restoration techniques and commercial tools in visual quality evaluations. Quantitative benchmarks, including metrics such as PSNR, SSIM, LPIPS, and FID, further underpin the robustness and efficacy of the methodology. Qualitative analyses reveal substantial improvements in the color and sharpness of restored photos, aligning them closely with modern photographic standards.

Implications and Future Work

This research offers meaningful advancements in photo restoration applications, particularly useful for personal heritage preservation and enhancing digital archives. The methodology sets the stage for broader applications in image restoration across various domains that suffer from complex, mixed degradations. Future developments may explore deeper integration with AI models to expand the restoration capabilities, potentially including real-time processing and enrichment of historical video footages. Further studies could also investigate extending the framework to encompass dynamic scenes and higher-dimensional data.

Markdown