Learned Lightweight Smartphone ISP with Unpaired Data (2505.10420v1)

Published 15 May 2025 in cs.CV and cs.AI

Abstract: The Image Signal Processor (ISP) is a fundamental component in modern smartphone cameras responsible for conversion of RAW sensor image data to RGB images with a strong focus on perceptual quality. Recent work highlights the potential of deep learning approaches and their ability to capture details with a quality increasingly close to that of professional cameras. A difficult and costly step when developing a learned ISP is the acquisition of pixel-wise aligned paired data that maps the raw captured by a smartphone camera sensor to high-quality reference images. In this work, we address this challenge by proposing a novel training method for a learnable ISP that eliminates the need for direct correspondences between raw images and ground-truth data with matching content. Our unpaired approach employs a multi-term loss function guided by adversarial training with multiple discriminators processing feature maps from pre-trained networks to maintain content structure while learning color and texture characteristics from the target RGB dataset. Using lightweight neural network architectures suitable for mobile devices as backbones, we evaluated our method on the Zurich RAW to RGB and Fujifilm UltraISP datasets. Compared to paired training methods, our unpaired learning strategy shows strong potential and achieves high fidelity across multiple evaluation metrics. The code and pre-trained models are available at https://github.com/AndreiiArhire/Learned-Lightweight-Smartphone-ISP-with-Unpaired-Data .

Summary

The paper "Learned Lightweight Smartphone ISP with Unpaired Data," by Andrei Arhire and Radu Timofte, presents a method for training Image Signal Processors (ISPs) for smartphone cameras on unpaired data. Training a learned ISP normally requires pixel-wise aligned pairs that map RAW sensor data to high-quality reference images, typically taken with a professional camera, and collecting such pairs is difficult and costly. The authors instead train neural ISPs that are small enough to run on edge devices using RAW and RGB images that do not share content.

Key Contributions

Unpaired Training Approach: The method removes the need for paired training data, which is hard to acquire because the RAW and reference images must be aligned pixel by pixel. The ISP model is trained without any direct correspondence between RAW images and ground-truth RGB images. Training uses a multi-term loss guided by adversarial learning: multiple discriminators process feature maps from pre-trained networks, so that content structure is preserved while color and texture characteristics are learned from the target RGB dataset.
Lightweight Architecture: The authors use lightweight neural network backbones suitable for mobile devices. One of them, the Efficient ISP architecture, consists of three convolutional layers followed by a pixel-shuffle layer, which allows fast inference on mobile GPUs while maintaining high fidelity.
Evaluation: The method was evaluated on the Zurich RAW to RGB and Fujifilm UltraISP datasets. Although it uses no paired data, the unpaired strategy achieves high fidelity as measured by PSNR, SSIM, MS-SSIM, and LPIPS.
Practical Implications: The approach makes it feasible to deploy learned ISP models on standard smartphones and helps narrow the perceptual quality gap between smartphone cameras and professional DSLRs. Because no aligned pairs are needed, adapting to a new camera domain does not require a dedicated paired data collection for each device.
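The pixel-shuffle layer named under Lightweight Architecture trades channels for spatial resolution: a tensor with C·r·r channels at H×W becomes C channels at (H·r)×(W·r). The following is a minimal, dependency-free sketch of that rearrangement only, not the authors' implementation. It assumes the channel ordering used by PyTorch's nn.PixelShuffle; the function name pixel_shuffle, the nested-list layout, and the upscale factor in the usage lines are illustrative choices that do not come from the paper.

```python
def pixel_shuffle(x, r):
    """Depth-to-space: map a [C*r*r][H][W] nested list to [C][H*r][W*r].

    Assumed convention (same as PyTorch's nn.PixelShuffle):
    out[c][i*r + di][j*r + dj] = x[c*r*r + di*r + dj][i][j]
    """
    c_in = len(x)
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    h, w = len(x[0]), len(x[0][0])

    # Allocate the upscaled output, filled with zeros.
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(h):
            for j in range(w):
                for di in range(r):
                    for dj in range(r):
                        src = c * r * r + di * r + dj
                        out[c][i * r + di][j * r + dj] = x[src][i][j]
    return out


# Hypothetical usage: four 1x1 channels become one 2x2 channel.
features = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(features, 2))  # [[[1, 2], [3, 4]]]
```

Because the upscaling is a pure memory rearrangement, all learned computation happens in the convolutions at the lower resolution, which is one reason this pattern is common in mobile-oriented networks.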

Numerical and Comparative Results

The unpaired training method produces numerical results comparable to those of methods trained on paired data. In particular, its PSNR and SSIM scores are close to those achieved with paired training, which indicates that the adversarial and texture-based losses are effective at enhancing perceptual quality. The LPIPS scores (lower is better) further indicate improved perceptual realism when multiple discriminators process feature maps from pre-trained networks.
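Of the metrics mentioned, PSNR is the simplest to reproduce, since it is derived directly from the mean squared error between the predicted and reference images. The sketch below is illustrative and is not the paper's evaluation code. It assumes images flattened to equal-length sequences of pixel values in an 8-bit range; the function name psnr and the max_value default are assumptions made for this example.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    PSNR = 10 * log10(max_value^2 / MSE); higher means closer to the target.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical usage with tiny "images".
print(psnr([0, 0], [255, 255]))  # 0.0 (maximum possible error)
print(psnr([12, 40], [12, 40]))  # inf (identical inputs)
```

SSIM, MS-SSIM, and LPIPS need windowed statistics or a pre-trained network, so they are usually taken from an existing library rather than written by hand.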

Implications and Future Directions

The approach matters most for ISP development in resource-constrained settings. By eliminating the need for paired data, it offers manufacturers and developers a cost-effective way to improve smartphone camera output. It is particularly useful for producing ISP models that can be adapted to different sensor configurations and image domains without extensive retraining requirements.

Looking ahead, future research may focus on integrating NILUT (neural implicit 3D lookup tables) to further improve color accuracy and tone mapping. Adaptive hyperparameter selection could also improve the stability and fidelity of models trained on unpaired data, with the aim of closing the remaining gap in quality metrics relative to paired training.

In conclusion, the paper demonstrates an effective unpaired training method for lightweight ISPs and opens a path for further work in mobile photography and computational imaging.

GitHub

AndreiiArhire/Learned-Lightweight-Smartphone-ISP-with-Unpaired-Data: [CVPRW 2025] Learned Lightweight Smartphone ISP with Unpaired Data (PyTorch)