Efficient Perceptual Image Super Resolution: AIM 2025 Study and Benchmark

Published 14 Oct 2025 in cs.CV (arXiv:2510.12765v1)

Abstract: This paper presents a comprehensive study and benchmark on Efficient Perceptual Super-Resolution (EPSR). While significant progress has been made in efficient PSNR-oriented super resolution, approaches focusing on perceptual quality metrics remain relatively inefficient. Motivated by this gap, we aim to replicate or improve the perceptual results of Real-ESRGAN while meeting strict efficiency constraints: a maximum of 5M parameters and 2000 GFLOPs, calculated for an input size of 960x540 pixels. The proposed solutions were evaluated on a novel dataset consisting of 500 test images of 4K resolution, each degraded using multiple degradation types, without providing the original high-quality counterparts. This design aims to reflect realistic deployment conditions and serves as a diverse and challenging benchmark. The top-performing approach manages to outperform Real-ESRGAN across all benchmark datasets, demonstrating the potential of efficient methods in the perceptual domain. This paper establishes the modern baselines for efficient perceptual super resolution.

Summary

  • The paper presents efficient perceptual super-resolution methods that balance high perceptual quality with strict resource constraints (≤5M parameters and ≤2000 GFLOPs).
  • It demonstrates that models like VPEG and MiAlgo achieve significant improvements in perceptual metrics while reducing parameter count and execution time compared to Real-ESRGAN.
  • The study introduces the PSR4K dataset to robustly test super-resolution performance on 4K images, highlighting challenges like artifacts and the need for improved perceptual metrics.

Efficient Perceptual Image Super Resolution: AIM 2025 Study and Benchmark

Introduction and Motivation

This paper presents a comprehensive benchmark and analysis of Efficient Perceptual Super-Resolution (EPSR) methods, focusing on the intersection of perceptual quality and computational efficiency. While PSNR-oriented super-resolution models have achieved significant efficiency gains, perceptual SR approaches—optimized for metrics such as PI, CLIPIQA, and MANIQA—remain relatively inefficient and underexplored for deployment on resource-constrained platforms. The study addresses this gap by establishing strict efficiency constraints (≤5M parameters, ≤2000 GFLOPs for 960×540 inputs) and evaluating solutions on a novel, challenging 4K test set (PSR4K) with diverse real-world degradations.
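
Both constraints are straightforward to verify before submission. Below is a minimal sketch in PyTorch, assuming a generic model and using fvcore for operation counting; note that fvcore's FlopCountAnalysis reports multiply-accumulate counts, which are conventionally doubled to quote FLOPs:

```python
import torch
from fvcore.nn import FlopCountAnalysis  # assumption: fvcore as the op counter

def check_epsr_constraints(model: torch.nn.Module) -> None:
    """Check the AIM 2025 EPSR limits: at most 5M parameters and
    2000 GFLOPs for a 960x540 RGB input."""
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(1, 3, 540, 960)           # challenge input size (W x H = 960 x 540)
    macs = FlopCountAnalysis(model, x).total()
    gflops = 2 * macs / 1e9                   # 1 MAC ~ 2 FLOPs by convention
    print(f"params: {n_params / 1e6:.2f}M (limit: 5M)")
    print(f"GFLOPs: {gflops:.1f} (limit: 2000)")
    assert n_params <= 5_000_000 and gflops <= 2000.0
```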

Benchmark Design and Datasets

The benchmark leverages both established and novel datasets to rigorously evaluate perceptual SR models:

  • Training datasets: DIV2K, Flickr2K, LSDIR, and OST, with flexible degradation pipelines based on Real-ESRGAN (a minimal sketch follows this list).
  • Testing datasets: The newly introduced PSR4K (500 LR images, 10 semantic categories, 5 degradations per category), alongside PIPAL, DIV2K-LSDIR, RealSR, RealSRSet, and Real47.
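
As referenced above, the Real-ESRGAN recipe chains blur, downsampling, noise, and JPEG compression, typically applied in two randomized rounds. A minimal single-round sketch using OpenCV; all parameter ranges here are illustrative, not taken from the paper:

```python
import random
import numpy as np
import cv2  # assumption: OpenCV as the image backend

def degrade(hr: np.ndarray, scale: int = 4) -> np.ndarray:
    """One Real-ESRGAN-style degradation round: blur -> resize -> noise -> JPEG.
    hr is an HxWx3 uint8 image; parameter ranges are illustrative."""
    img = hr.astype(np.float32)
    # 1) Gaussian blur with a random sigma.
    img = cv2.GaussianBlur(img, (21, 21), random.uniform(0.2, 3.0))
    # 2) Downsample by the target scale (interpolation is also randomized in practice).
    h, w = hr.shape[:2]
    img = cv2.resize(img, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    # 3) Additive Gaussian noise in the 0-255 domain.
    img += np.random.normal(0, random.uniform(1, 25), img.shape)
    img = np.clip(img, 0, 255).astype(np.uint8)
    # 4) JPEG compression at a random quality.
    q = random.randint(30, 95)
    _, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```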

The PSR4K dataset is specifically designed to reflect realistic deployment conditions, with high-resolution outputs and complex, undisclosed degradations (Figure 1).

Figure 1: Example HR and LR images from the PSR4K dataset, illustrating the diversity and scale of the test set.

Evaluation Metrics and Efficiency Constraints

The evaluation aggregates multiple perceptual metrics into a single score relative to the Real-ESRGAN baseline. Metrics include:

  • Perceptual Index (PI): Lower is better.
  • CLIPIQA: Higher is better.
  • MANIQA: Higher is better.

Scores are computed using exponential scaling relative to the baseline, with weights λ_PI = 0.5, λ_CLIPIQA = 0.25, and λ_MANIQA = 0.25. All models must satisfy the efficiency constraints, but memory and inference time are not directly restricted.
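
The exact scoring function is defined relative to Real-ESRGAN's metric values. A minimal sketch of one plausible reading, assuming each metric's gap to the baseline is exponentiated (with PI negated, since lower is better); under this form, a method that exactly matches the baseline scores 1.0:

```python
import math

# Challenge weights from the paper.
WEIGHTS = {"PI": 0.5, "CLIPIQA": 0.25, "MANIQA": 0.25}

def epsr_score(metrics: dict, baseline: dict) -> float:
    """Aggregate perceptual score relative to the Real-ESRGAN baseline.
    The exponential form below is an assumed reading of 'exponential
    scaling relative to the baseline'; PI is negated (lower is better)."""
    score = 0.0
    for name, w in WEIGHTS.items():
        sign = -1.0 if name == "PI" else 1.0
        score += w * math.exp(sign * (metrics[name] - baseline[name]))
    return score  # == 1.0 when a method matches the baseline exactly
```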

Experimental Results

Overall Performance

VPEG achieves the highest perceptual quality across all metrics, outperforming Real-ESRGAN with only ~19% of its parameters and ~17.6% of its FLOPs. MiAlgo ranks second, with similar efficiency and perceptual improvements. IPIU (EFDN) demonstrates extreme efficiency but lower perceptual quality, highlighting the trade-off between distortion-oriented and perceptual optimization.

Cross-Dataset Generalization

VPEG and MiAlgo consistently outperform Real-ESRGAN and BSRGAN on perceptual metrics across all benchmarks, with the largest improvements observed on PIPAL and RealSR datasets. PSNR-oriented methods (SPAN, R2NET, EFDN) excel only on bicubic-degraded datasets (DIV2K-LSDIR), but fail to deliver perceptual gains on more challenging benchmarks.

Runtime Analysis

VPEG demonstrates substantial runtime efficiency, requiring less than half the execution time of Real-ESRGAN on most datasets (except RealSR), further validating its suitability for real-time and edge deployment.

Per-Class Analysis

Class-wise evaluation on PSR4K reveals that architecture, animals, and nature categories yield the best perceptual scores, likely due to their prevalence in training data. Food, sports, and urban scenes are more challenging, with food images consistently underperforming due to complex textures and under-representation. VPEG exhibits the lowest standard deviation across classes, indicating superior robustness to content variability.

Qualitative Comparison

Visual inspection confirms that perceptual methods (VPEG, MiAlgo, Real-ESRGAN, BSRGAN) produce clear improvements over bicubic upsampling and PSNR-oriented baselines. However, current perceptual metrics fail to penalize hallucinations and artifacts, as evidenced by VPEG and MiAlgo introducing visible artifacts in some RealSR and Real47 samples without corresponding metric degradation (Figure 2).

Figure 2: Qualitative comparison of super-resolution results across multiple datasets, ordered by increasing perceptual quality.

Methodological Innovations

VPEG (SAFMN-L)

VPEG adapts the SAFMN architecture, reducing channel dimensions to meet the efficiency constraints. The model incorporates a multi-stage training regime with L1, FFT-based, perceptual, LDL, GAN, and AESOP losses. A Spectral UNet discriminator and EMA further stabilize training. No pre-trained SR weights are used, but an AESOP pre-trained autoencoder is leveraged for loss computation.
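
The FFT-based term is commonly an L1 distance between frequency representations of the output and target. A minimal sketch of such a frequency loss paired with the pixel-level L1 term; the weighting is illustrative, and the perceptual, LDL, GAN, and AESOP terms of the later stages are omitted:

```python
import torch
import torch.nn.functional as F

def fft_l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """L1 distance in the frequency domain, a common form of 'FFT loss'."""
    sr_f = torch.fft.rfft2(sr, norm="ortho")
    hr_f = torch.fft.rfft2(hr, norm="ortho")
    # Magnitude of the complex difference covers real and imaginary parts.
    return (sr_f - hr_f).abs().mean()

def stage1_loss(sr, hr, fft_weight=0.05):
    """Pixel + frequency reconstruction loss for an early training stage."""
    return F.l1_loss(sr, hr) + fft_weight * fft_l1_loss(sr, hr)
```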

MiAlgo (TinyESRGAN)

MiAlgo introduces TinyESRGAN, a lightweight ESRGAN variant with reduced RRDB count and channel dimensions, achieving a 79% reduction in computational cost. Training employs MSE, LPIPS, and GAN losses in a multi-stage strategy, with realistic degradations generated via the Real-ESRGAN pipeline.
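
Shrinking ESRGAN in this way mainly means fewer RRDBs and narrower features. A rough capacity comparison, assuming BasicSR's reference RRDBNet implementation; the scaled-down configuration below is illustrative, since the exact TinyESRGAN settings are not given here:

```python
# Assumption: BasicSR's reference RRDBNet is used for the comparison; the
# actual TinyESRGAN configuration is not disclosed in this summary.
from basicsr.archs.rrdbnet_arch import RRDBNet

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Standard ESRGAN generator: 23 RRDBs, 64 feature channels, growth 32.
full = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32)
# Illustrative scaled-down variant with fewer, narrower blocks.
tiny = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=32, num_block=6, num_grow_ch=16)

print(f"ESRGAN-size RRDBNet:  {n_params(full) / 1e6:.1f}M params")
print(f"scaled-down variant:  {n_params(tiny) / 1e6:.1f}M params")
```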

IPIU (EFDN)

IPIU utilizes EFDN, featuring Edge-Enhanced Diverse Branch Blocks (EDBB) for efficient edge and texture extraction. The model is trained with L1 loss and aggressive data augmentation, achieving extreme efficiency but lower perceptual quality.
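
EDBB follows the structural re-parameterization pattern: parallel training-time branches (plain convolutions plus fixed edge operators such as Sobel filters) are merged algebraically into a single convolution for inference. A generic two-branch sketch of the merging idea, not the exact EDBB algebra:

```python
import torch
import torch.nn as nn

class TwoBranchConv(nn.Module):
    """Training-time block with two parallel 3x3 convs; at inference the
    branches collapse into one conv because convolution is linear."""
    def __init__(self, ch: int):
        super().__init__()
        self.a = nn.Conv2d(ch, ch, 3, padding=1)
        self.b = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return self.a(x) + self.b(x)

    @torch.no_grad()
    def reparameterize(self) -> nn.Conv2d:
        # Summing outputs of two same-shape convs equals one conv whose
        # weights and biases are the elementwise sums.
        fused = nn.Conv2d(self.a.in_channels, self.a.out_channels, 3, padding=1)
        fused.weight.copy_(self.a.weight + self.b.weight)
        fused.bias.copy_(self.a.bias + self.b.bias)
        return fused
```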

Implications and Future Directions

The study demonstrates that efficient perceptual SR is feasible, with VPEG and MiAlgo establishing new baselines for the field. However, the persistence of artifacts and hallucinations in top-performing models highlights the limitations of current perceptual metrics and the need for more robust evaluation frameworks. The absence of techniques such as knowledge distillation, advanced re-parameterization, and pruning in submitted solutions suggests untapped potential for further efficiency gains.

The introduction of PSR4K enables fine-grained analysis and benchmarking at 4K resolution, setting a new standard for future research. The efficiency-perception trade-off observed in this study warrants deeper investigation, particularly in the context of deployment on mobile and edge devices.

Conclusion

This benchmark establishes that substantial improvements in perceptual image quality are achievable under strict efficiency constraints, with VPEG and MiAlgo outperforming established baselines. The results underscore the necessity for new perceptual metrics capable of penalizing artifacts and hallucinations, and point to promising directions for future research in efficient, deployable perceptual super-resolution. The PSR4K dataset and evaluation protocol provide a robust foundation for continued progress in this domain.
