Analysis of "Collapsible Linear Blocks for Super-Efficient Super Resolution"
The paper presents Super-Efficient Super Resolution (SESR) networks, which advance linear overparameterization in Convolutional Neural Networks (CNNs) to improve the efficiency of super-resolution. On Single Image Super Resolution (SISR) tasks, SESR matches or exceeds the image quality of existing models while requiring 2x to 330x fewer Multiply-Accumulate (MAC) operations. This efficiency makes SESR well suited to hardware with stringent resource constraints, such as smart devices supporting resolutions up to 8K.
Key Contributions and Methodology
- Collapsible Linear Blocks: The core innovation is the collapsible linear block, a sequence of overparameterized linear convolutional layers that is analytically collapsed into a single narrow convolution before inference. This preserves the training benefits of overparameterization while adding no computational overhead at runtime.
- Theoretical Findings: A comparative analysis highlights limitations of existing overparameterization techniques; in particular, the gradient updates of RepVGG-style blocks reduce to those of a plain VGG network, so they offer little benefit for the shallow networks typical of SISR. SESR instead incorporates identity connections that improve gradient flow, improving trainability and mitigating the vanishing-gradient issues that afflict deeper structures.
- Empirical Evaluation Across Datasets: The SESR framework was validated on six benchmark datasets, delivering similar or superior Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) scores relative to state-of-the-art models while using markedly fewer computational resources. These results confirm SESR's practical super-resolution capability on datasets such as Set5, Set14, and BSD100.
- Hardware Realization and Performance: SESR delivers substantial hardware gains. Tested on Arm's Ethos-N78 mobile Neural Processing Unit (NPU), it runs 6x to 8x faster than existing models on key workloads such as 1080p-to-4K upscaling. Moreover, incorporating SESR within Neural Architecture Search (NAS) frameworks yielded additional optimization without sacrificing quality.
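The collapse in the first bullet can be made concrete. Below is a minimal NumPy sketch (not the authors' code; layer widths are illustrative assumptions) showing that a 3x3 expansion convolution followed by a 1x1 projection, with no nonlinearity in between, composes exactly into a single narrow 3x3 convolution:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' cross-correlation: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    _, h, width = x.shape
    out = np.zeros((c_out, h - k + 1, width - k + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))             # small input: 3 channels
w_expand = rng.standard_normal((16, 3, 3, 3))  # 3x3 conv, 3 -> 16 channels (overparameterized width)
w_project = rng.standard_normal((4, 16, 1, 1)) # 1x1 conv, 16 -> 4 channels

# Training-time path: two linear convolutions in sequence (no nonlinearity between them).
y_train = conv2d(conv2d(x, w_expand), w_project)

# Inference-time path: collapse the pair analytically into one narrow 3x3 conv.
# Because both layers are linear, the composition is a single conv whose kernel
# mixes the expansion kernels channel-wise by the projection weights.
w_collapsed = np.einsum('om,micd->oicd', w_project[:, :, 0, 0], w_expand)  # (4, 3, 3, 3)
y_infer = conv2d(x, w_collapsed)

assert np.allclose(y_train, y_infer)
```

The collapse is exact, not an approximation: inference runs only the narrow convolution, so the wide intermediate channels cost nothing at runtime.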
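The identity connections discussed in the theoretical findings can likewise be absorbed at inference time: a short residual branch is equivalent to adding a Dirac delta (a 1 at the kernel center, on matching channel pairs) to the collapsed kernel. A sketch under the same illustrative assumptions:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 3x3 cross-correlation with zero padding of 1: x (C_in, H, W), w (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    _, h, width = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, width))
    for i in range(h):
        for j in range(width):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + 3, j:j + 3], axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(1)
c = 4
x = rng.standard_normal((c, 6, 6))
w = rng.standard_normal((c, c, 3, 3))

# Training-time path: convolution plus an identity (residual) connection.
y_train = conv2d_same(x, w) + x

# Inference-time path: fold the identity into the kernel as a Dirac delta.
w_folded = w.copy()
for ch in range(c):
    w_folded[ch, ch, 1, 1] += 1.0
y_infer = conv2d_same(x, w_folded)

assert np.allclose(y_train, y_infer)
```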
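For reference, the PSNR metric used in the evaluation is a simple function of mean squared error. A minimal sketch (the `max_val` default assumes 8-bit images):

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: a uniform error of 1 gray level on 8-bit images gives
# 10 * log10(255^2 / 1) ≈ 48.13 dB.
ref = np.full((32, 32), 128, dtype=np.uint8)
rec = ref + 1
print(round(psnr(ref, rec), 2))  # → 48.13
```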
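The MAC savings follow directly from the collapse: per output pixel, a k x k convolution costs C_in * C_out * k * k multiply-accumulates, so removing the wide intermediate layer removes most of the cost. A back-of-the-envelope comparison for 1080p inputs (channel widths are illustrative, not the paper's exact configuration):

```python
# Per output pixel, a k x k conv costs C_in * C_out * k * k multiply-accumulates.
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

H, W = 1080, 1920  # layers run at the low (1080p) resolution before upscaling

# Training-time graph: 3x3 expansion to 256 channels, then 1x1 projection to 16.
train_macs = conv_macs(H, W, 16, 256, 3) + conv_macs(H, W, 256, 16, 1)

# Inference-time graph after collapsing: one narrow 3x3 conv, 16 -> 16 channels.
infer_macs = conv_macs(H, W, 16, 16, 3)

print(f"training: {train_macs / 1e9:.1f} GMACs, inference: {infer_macs / 1e9:.1f} GMACs, "
      f"ratio: {train_macs / infer_macs:.1f}x")
# → training: 84.9 GMACs, inference: 4.8 GMACs, ratio: 17.8x
```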
Practical Implications and Future Directions
SESR networks are poised to expand high-resolution processing on constrained hardware, with substantial implications for real-time multimedia applications, mobile devices, and embedded systems. Given SESR's demonstrated computational efficiency and quality retention, future work might apply its design principles beyond images to video super-resolution, potentially integrating with real-time video streaming services.
Future work could also extend NAS techniques to adapt SESR's internal structure dynamically to constraints such as varying source resolutions or available processing power. Additionally, supporting even-sized and asymmetric kernels could extend SESR's applicability to other machine learning domains where tailored convolutional operations are beneficial.
Overall, SESR represents a significant advance in efficient super-resolution, balancing high-quality output against a minimal operational footprint and helping bring advanced AI-driven imaging to consumer and industrial applications alike.