Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution (2505.00334v2)

Published 1 May 2025 in cs.CV and cs.LG

Abstract: Image Super-Resolution is a fundamental problem in computer vision with broad applications ranging from medical imaging to satellite analysis. The ability to reconstruct high-resolution images from low-resolution inputs is crucial for enhancing downstream tasks such as object detection and segmentation. While deep learning has significantly advanced SR, achieving high-quality reconstructions with fine-grained details and realistic textures remains challenging, particularly at high upscaling factors. Recent approaches leveraging diffusion models have demonstrated promising results, yet they often struggle to balance perceptual quality with structural fidelity. In this work, we introduce ResQu, a novel SR framework that integrates a quaternion wavelet preprocessing framework with latent diffusion models, incorporating a new quaternion wavelet- and time-aware encoder. Unlike prior methods that simply apply wavelet transforms within diffusion models, our approach enhances the conditioning process by exploiting quaternion wavelet embeddings, which are dynamically integrated at different stages of denoising. Furthermore, we also leverage the generative priors of foundation models such as Stable Diffusion. Extensive experiments on domain-specific datasets demonstrate that our method achieves outstanding SR results, in many cases outperforming existing approaches in perceptual quality and standard evaluation metrics. The code will be available after the revision process.

Summary

Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution

The paper introduces "ResQu," a framework for image super-resolution (SR) that integrates quaternion wavelet preprocessing with latent diffusion models. It combines the Quaternion Wavelet Transform (QWT) with the generative priors of foundation models such as Stable Diffusion (SD), and reports improvements in reconstruction fidelity, textural realism, and computational efficiency.

Methodology

ResQu combines quaternion wavelet embeddings with time-aware conditioning to improve the denoising process during super-resolution. The Quaternion Wavelet Transform decomposes an image into multi-component subbands that capture both the low- and high-frequency features needed for high-quality SR. Integrating these embeddings dynamically at different stages of denoising helps preserve fine-grained image details and structural fidelity.
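To make the idea of splitting an image into low- and high-frequency subbands concrete, here is a minimal sketch. It is not the Quaternion Wavelet Transform itself: it uses a single-level, real-valued Haar transform as a simplified stand-in, and the function names (haar_dwt2, haar_idwt2) and the random 8x8 test image are assumptions made for illustration only.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (real-valued stand-in, not the QWT)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0           # low-frequency (smooth) content
    col_diff = (a - b + c - d) / 2.0         # differences across columns
    row_diff = (a + b - c - d) / 2.0         # differences across rows
    diag = (a - b - c + d) / 2.0             # diagonal differences
    return approx, col_diff, row_diff, diag

def haar_idwt2(approx, col_diff, row_diff, diag):
    """Invert haar_dwt2 and return the full-resolution image."""
    h, w = approx.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (approx + col_diff + row_diff + diag) / 2.0
    out[0::2, 1::2] = (approx - col_diff + row_diff - diag) / 2.0
    out[1::2, 0::2] = (approx + col_diff - row_diff - diag) / 2.0
    out[1::2, 1::2] = (approx - col_diff - row_diff + diag) / 2.0
    return out

# Hypothetical input: a random 8x8 grayscale image with values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((8, 8))
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

Each of the four subbands has half the height and width of the input, and the transform is invertible, so no information is lost. A quaternion wavelet transform goes further by producing several phase-shifted versions of such subbands and grouping them as quaternion components; that step is omitted here.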

The paper employs a latent diffusion model that builds on the generative priors of Stable Diffusion. A new quaternion wavelet- and time-aware encoder fuses the quaternion wavelet features with the diffusion timestep. The encoder adjusts its influence on the model's output according to the signal-to-noise ratio (SNR) at each step, balancing structural fidelity against textural realism.
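The SNR-dependent weighting can be illustrated with a small sketch. Several things here are assumptions rather than details from the paper: the linear beta schedule and its endpoints, the 1000-step horizon, and above all the gate 1 / (1 + SNR), which is a hypothetical rule chosen only to show how a conditioning weight might follow the noise level. The paper's actual encoder, and whether its influence grows or shrinks with SNR, are not specified in this summary.

```python
import numpy as np

def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal retention under an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def snr(ab):
    """Signal-to-noise ratio of the forward process: alpha_bar / (1 - alpha_bar)."""
    return ab / (1.0 - ab)

def conditioning_gate(snr_t):
    """Hypothetical gate in (0, 1): more weight on conditioning when SNR is low."""
    return 1.0 / (1.0 + snr_t)

def gated_features(cond_feat, snr_t):
    """Scale conditioning features by the hypothetical gate."""
    return conditioning_gate(snr_t) * cond_feat

ab = alpha_bar()
snr_all = snr(ab)                  # one SNR value per timestep
gate_all = conditioning_gate(snr_all)

# Example: scale a hypothetical feature vector at an early and a late timestep.
features = np.ones(4)
early = gated_features(features, snr_all[10])
late = gated_features(features, snr_all[990])
```

Under these assumptions the SNR falls monotonically as the timestep increases, so the gate rises from near 0 at almost clean steps to near 1 at very noisy steps. Swapping in a different gate only requires changing conditioning_gate.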

Experimental Validation

Experiments on benchmark datasets, including DIV2K, RealSR, and DRealSR, show ResQu outperforming existing methods in many cases on fidelity metrics such as PSNR and SSIM, with notable improvements in perceptual metrics such as LPIPS and FID. Qualitative comparisons likewise show that ResQu retains intricate details and realistic textures in the generated images.
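For readers unfamiliar with the fidelity metrics, PSNR is the simplest to state: it is 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the reconstruction. The sketch below is a generic implementation, not the paper's evaluation code; the function name psnr, the max_value default of 1.0, and the toy arrays are assumptions for illustration.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images: no error to measure
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: every pixel is off by 0.1 on a [0, 1] scale, so MSE = 0.01.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
score = psnr(reference, estimate)
```

With a uniform error of 0.1 the score is 20 dB. Higher PSNR means lower pixel-wise error, which is why it is usually reported alongside perceptual metrics such as LPIPS that track visual similarity more closely.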

ResQu performs well on traditional SR benchmarks and is also robust in zero-shot settings such as the ShipSpotting dataset, indicating that it adapts to unseen, domain-specific contexts without retraining. This robustness suggests that ResQu remains competitive across diverse real-world scenarios.

Contributions and Implications

ResQu's integration of quaternion wavelet conditioning into diffusion models is a notable step for generative SR architectures. The framework could extend beyond traditional image domains to fields that require precise detail recovery, such as medical imaging and satellite analysis.

The paper also suggests paths for future research, such as exploring further interactions between quaternion-based preprocessing and other generative approaches, including text-to-image diffusion models, to broaden the framework's application scope.

Conclusion

ResQu offers a compelling foundation for improving image super-resolution through quaternion wavelets. The approach reports strong results and is relevant to real-world applications that require high-quality detail restoration. Researchers and practitioners may find the integration of wavelet transforms with diffusion models, as exemplified by ResQu, a promising direction for future work in image processing.
