This paper addresses the challenge of Blind Image Super-Resolution (Blind-SR), where the goal is to restore a high-resolution (HR) image from a low-resolution (LR) image with unknown degradation. Existing Blind-SR methods often rely on explicit degradation estimators trained with ground-truth degradation parameters (such as blur kernels or noise levels). However, providing concrete labels for the complex combinations of degradations found in real-world images (blur, noise, and JPEG compression, possibly applied in random order or repeatedly as second-order degradation) is infeasible. Furthermore, explicit estimators designed for specific degradation types struggle to generalize to others.
To overcome these limitations, the authors propose the Knowledge Distillation based Blind-SR network (KDSR). KDSR introduces an implicit degradation estimator that learns to represent degradation without requiring ground-truth degradation labels in its final training stage. This implicit representation is then used to guide the super-resolution process.
The KDSR framework consists of two main components:
- Knowledge Distillation based Implicit Degradation Estimator (KD-IDE): This network learns to extract a discriminative Implicit Degradation Representation (IDR) from the LR image.
  - Teacher Network (KD-IDET): This network is trained first, taking paired HR and LR images as input. Since it has access to both the clean HR and degraded LR, it can inherently learn the applied degradation. KD-IDET is trained jointly with the SR network (see below) using a reconstruction loss. It outputs two IDR vectors: DT′ (a richer representation for distillation) and DT (a compressed version for SR guidance).
  - Student Network (KD-IDES): This is the final degradation estimator used for inference. It takes only the LR image as input. The student network is trained using knowledge distillation to mimic the IDR output (DS′ to match DT′) of the teacher network. This allows the student to learn to infer degradation characteristics solely from the LR input, without needing explicit degradation labels. The paper shows that Kullback-Leibler (KL) divergence loss is effective for this distillation (Lkl, Eq. 4).
- Image Super-Resolution Network: This network takes the LR image and the implicit degradation representation (D) from the KD-IDE as input to reconstruct the HR image. The authors design this network to be simple, strong, and efficient.
  - IDR based Dynamic Convolution Residual Block (IDR-DCRB): This is the core building block of the SR network. It leverages the estimated IDR to dynamically adjust its convolution weights.
  - IDR based Depthwise Dynamic Convolution (IDR-DDC): To ensure efficiency, the dynamic convolution is implemented using depthwise convolution. The IDR vector D is fed through linear layers (ϕ) to generate the depthwise convolution weights W for each input channel (Eqs. 1 & 2). This reduces computation compared to generating weights for standard convolution. An IDR-DCRB combines this IDR-DDC with a standard convolution to allow for channel interaction.
  - Network Structure: The SR network is formed by stacking multiple IDR-DCRBs.
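The IDR-guided dynamic depthwise convolution described above can be sketched as follows. This is a minimal illustration, not the authors' code: module and variable names are made up, the IDR is assumed to be a C-dimensional vector, and the kernel size is fixed to 3×3.

```python
# Sketch of an IDR-guided depthwise dynamic convolution (IDR-DDC).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDRDepthwiseDynamicConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Two linear layers map the C-dim IDR vector to C*K*K depthwise weights.
        self.phi = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels * kernel_size * kernel_size),
        )

    def forward(self, x: torch.Tensor, idr: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); idr: (B, C). Weights are generated per sample.
        b, c, h, w = x.shape
        k = self.kernel_size
        weights = self.phi(idr).view(b * c, 1, k, k)
        # Grouped-conv trick: fold the batch into channels so every sample
        # gets its own depthwise kernels in one conv2d call.
        out = F.conv2d(x.view(1, b * c, h, w), weights,
                       padding=k // 2, groups=b * c)
        return out.view(b, c, h, w)

conv = IDRDepthwiseDynamicConv(channels=8)
y = conv(torch.randn(2, 8, 16, 16), torch.randn(2, 8))
print(tuple(y.shape))  # (2, 8, 16, 16): spatial size preserved
```

Since each channel is filtered independently here, an IDR-DCRB follows this operation with a standard convolution to mix information across channels, as the paper notes.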
The training process is performed in two stages:
- Train the teacher KDSRT (KD-IDET + SR network) using paired HR/LR images and a reconstruction loss (Lrec, Eq. 3).
- Train the student KDSRS (KD-IDES + SR network), initialized with the teacher's weights. The KD-IDES learns from the LR image only, guided by the KD loss (Lkl) between its output DS′ and the teacher's DT′. The SR network is trained using the student's IDR DS with the reconstruction and KD losses for classic settings (Lclassic, Eq. 5), optionally adding perceptual (Lper) and adversarial (Ladv) losses for real-world settings (Lreal, Eq. 6).
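The distillation step in stage two can be sketched as below. This assumes the student's DS′ is matched to the teacher's DT′ via KL divergence over softmax-normalized vectors; the exact normalization in the paper's Eq. 4 may differ, and the names here are illustrative.

```python
# Sketch of the KD loss between student and teacher IDR vectors.
import torch
import torch.nn.functional as F

def kd_loss(d_student: torch.Tensor, d_teacher: torch.Tensor) -> torch.Tensor:
    # KL(teacher || student): the teacher IDR is the target distribution
    # and is detached so no gradient flows back into the teacher.
    log_p_s = F.log_softmax(d_student, dim=-1)
    p_t = F.softmax(d_teacher.detach(), dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

d_t = torch.randn(4, 256)        # teacher IDR D_T' (e.g. 4C with C = 64)
loss_zero = kd_loss(d_t, d_t)    # identical inputs -> zero divergence
loss_rand = kd_loss(torch.randn(4, 256), d_t)
```

In the classic setting this term would simply be summed with the L1 reconstruction loss to form Lclassic.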
Practical Implementation Details and Considerations:
- Input for Teacher: For the teacher KD-IDET, the input is a concatenation of the LR image (3×H×W) and a downsampled version of the HR image obtained via Pixel-Unshuffle (48×H×W for 4x scale), resulting in a 51×H×W input. This allows the teacher to observe both input and output to learn the transformation/degradation.
- Input for Student: The student KD-IDES only takes the LR image (3×H×W) as input, making it suitable for inference on real-world images without corresponding HR pairs.
- IDR Dimensionality: The paper uses two IDR vectors, D′∈R4C for distillation and D∈RC (compressed from D′ via a linear layer) for guiding the SR dynamic convolution. This balances distillation effectiveness with computational efficiency in the dynamic convolution.
- Dynamic Depthwise Convolution: The weights W for the depthwise dynamic convolution are generated from D by two linear layers followed by a reshape operation to match the depthwise kernel dimensions (C×1×Kh×Kw). This is a key efficiency measure (roughly 1/C the computation of standard convolution).
- Loss Functions: The choice of loss depends on the target quality. L1 is used for PSNR-oriented reconstruction (classic Blind-SR), while a combination including perceptual and adversarial losses (Lreal) is used for better visual quality in real-world scenarios. KL divergence is found to be effective for distilling the IDR distribution.
- Training Data: The method is evaluated on synthetic degradations using DIV2K and Flickr2K and on complex real-world-like degradations using DF2K and OutdoorSceneTraining, highlighting its versatility.
- Computational Efficiency: KDSRS-M and KDSRS versions demonstrate competitive or superior performance compared to state-of-the-art methods while often having significantly fewer FLOPs and lower runtime (Tables 1, 2, and 3). The IDR-DDC plays a crucial role in this efficiency.
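The teacher's input construction can be sketched with PyTorch's built-in pixel-unshuffle. Shapes follow the paper's 4× setting; the tensors here are random placeholders.

```python
# Sketch of the teacher KD-IDE input for 4x SR: the HR image is rearranged
# to LR resolution via pixel-unshuffle and concatenated with the LR image,
# giving 3 + 3*4*4 = 51 input channels.
import torch
import torch.nn.functional as F

scale = 4
lr = torch.randn(1, 3, 32, 32)                  # LR image, 3 x H x W
hr = torch.randn(1, 3, 32 * scale, 32 * scale)  # HR image, 3 x 4H x 4W
hr_unshuffled = F.pixel_unshuffle(hr, scale)    # -> 1 x 48 x 32 x 32
teacher_input = torch.cat([lr, hr_unshuffled], dim=1)
print(tuple(teacher_input.shape))  # (1, 51, 32, 32)
```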
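The claimed ~1/C saving of the depthwise design follows from a back-of-the-envelope FLOP count, sketched below (multiply-accumulates only; bias terms and the cost of generating the weights are ignored, and the layer sizes are arbitrary examples).

```python
# FLOP estimate for one K x K layer on a C-channel H x W feature map.
def standard_conv_flops(c: int, h: int, w: int, k: int) -> int:
    return h * w * c * c * k * k  # every output channel sees all C inputs

def depthwise_conv_flops(c: int, h: int, w: int, k: int) -> int:
    return h * w * c * k * k      # each channel has its own single kernel

c, h, w, k = 64, 64, 64, 3
ratio = depthwise_conv_flops(c, h, w, k) / standard_conv_flops(c, h, w, k)
print(ratio)  # 1/C = 1/64 = 0.015625
```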
Applications:
KDSR is designed for Blind Super-Resolution, making it applicable in various scenarios where image quality needs improvement but the degradation process is unknown:
- Enhancing low-quality images from various sources: Images captured by different cameras, under varying lighting conditions, or transmitted with compression artifacts and noise.
- Historical image restoration: Upscaling and improving old or damaged photographs.
- Medical imaging: Enhancing resolution of scans potentially affected by acquisition limitations or noise.
- Surveillance and remote sensing: Improving detail in images captured under non-ideal conditions.
Limitations and Trade-offs:
- Two-stage training: Requires training a teacher network first, which adds complexity to the training pipeline compared to end-to-end single-stage methods.
- Teacher dependence: The performance of the student KD-IDE is dependent on how well the teacher KD-IDET can extract degradation information, which in turn relies on the quality and diversity of the paired HR/LR data used in the first training stage.
- Implicit representation: While flexible, the implicit nature of the IDR means it doesn't provide explicit degradation parameters (like kernel shape) that might be useful for other image processing tasks.
The paper demonstrates that learning an implicit degradation representation via knowledge distillation, coupled with an efficient SR network utilizing dynamic depthwise convolution guided by this representation, achieves state-of-the-art performance in both classic and complex real-world Blind-SR settings while maintaining competitive efficiency. The visualization of IDR using t-SNE further supports that the trained KD-IDE can effectively distinguish between different degradation types.