- The paper introduces LIIF, a continuous image representation method that predicts RGB values from coordinate queries, achieving state-of-the-art super-resolution performance.
- It employs feature unfolding, local ensemble, and cell decoding to enrich local latent codes and ensure smooth, artifact-free transitions across scales.
- Empirical results on DIV2K and standard benchmarks show that LIIF matches state-of-the-art methods at trained scales and extrapolates to roughly ×30 magnification with high fidelity.
Learning Continuous Image Representation with Local Implicit Image Function
This paper introduces the Local Implicit Image Function (LIIF), a novel approach to representing images in a continuous manner. Traditional methods store images as discrete 2D arrays of pixels, which limits resolution adaptability and fidelity. Inspired by advances in implicit neural representations for 3D reconstruction, LIIF represents an image through a function that maps coordinates to RGB values, allowing the image to be rendered at arbitrary resolution.
Methodology
LIIF associates each image with a 2D feature map of latent codes distributed evenly across the spatial dimensions. Given a query coordinate, LIIF predicts its RGB value from the nearby latent codes and the query's position relative to them. This continuous formulation supports rendering at arbitrary resolution, providing a seamless bridge between discrete and continuous representations.
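The core query step can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the feature map is random rather than produced by an encoder (the paper pairs LIIF with encoders such as EDSR), and the decoder here is a toy single-layer map standing in for the paper's MLP. The key idea shown is looking up the nearest latent code and decoding from it together with the relative coordinate.

```python
import numpy as np

def liif_query(feat, coord, decoder):
    """Predict an RGB value at a continuous coordinate in [-1, 1]^2.

    feat: (H, W, C) grid of latent codes (here random; in LIIF, encoder output).
    coord: (2,) query coordinate (y, x).
    decoder: maps concat(latent code, relative coord) -> RGB; a toy stand-in
             for the decoding MLP f_theta.
    """
    H, W, C = feat.shape
    # Centers of the latent codes in [-1, 1] (half-cell offsets).
    ys = -1 + (2 * np.arange(H) + 1) / H
    xs = -1 + (2 * np.arange(W) + 1) / W
    iy = np.argmin(np.abs(ys - coord[0]))   # nearest latent code index (y)
    ix = np.argmin(np.abs(xs - coord[1]))   # nearest latent code index (x)
    z = feat[iy, ix]                        # nearest latent code z*
    rel = coord - np.array([ys[iy], xs[ix]])  # relative coordinate x_q - v*
    return decoder(np.concatenate([z, rel]))

# Toy decoder: a fixed random linear map to 3 RGB channels, squashed by tanh.
rng = np.random.default_rng(0)
C = 64
W1 = rng.standard_normal((3, C + 2)) * 0.1
decoder = lambda v: np.tanh(W1 @ v)

feat = rng.standard_normal((8, 8, C))       # stand-in 8x8 feature map
rgb = liif_query(feat, np.array([0.25, -0.6]), decoder)
```

Because the function is defined over continuous coordinates, the same `feat` can be queried on any sampling grid, which is what enables rendering at arbitrary resolution.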
The LIIF architecture incorporates several key components:
- Feature Unfolding: To enhance local latent code information, a feature unfolding process concatenates neighboring latent codes, enriching the representation.
- Local Ensemble: This technique addresses prediction discontinuity at latent-code boundaries by querying the overlapping neighboring latent codes and merging their predictions with weights based on proximity, ensuring smooth transitions and reducing artifacts.
- Cell Decoding: By considering the query pixel's size, the decoding function integrates additional contextual information, enhancing fidelity regardless of resolution.
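The local ensemble and cell decoding components above can be sketched together in a few lines. As before this is an illustrative numpy version with toy stand-ins: the feature map and decoder weights are random, the coordinate convention ([0, 1] with latent centers at half-cell offsets) and the bilinear-style weighting are one reasonable reading of the idea, and the `cell` argument simply appends the query pixel's height and width to the decoder input.

```python
import numpy as np

def local_ensemble_query(feat, coord, decoder, cell):
    """Merge predictions from the four surrounding latent codes.

    Each neighbor's weight is the area of the rectangle opposite it
    (bilinear-style), so predictions vary smoothly as the query moves.
    cell: (2,) query pixel size, appended for cell decoding."""
    H, W, C = feat.shape
    # Continuous position in latent-grid units (code centers at i + 0.5).
    fy = coord[0] * H - 0.5
    fx = coord[1] * W - 0.5
    y0, x0 = int(np.floor(fy)), int(np.floor(fx))
    out, wsum = np.zeros(3), 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            iy = np.clip(y0 + dy, 0, H - 1)   # clamp at image borders
            ix = np.clip(x0 + dx, 0, W - 1)
            # Bilinear weight: area of the rectangle opposite this neighbor.
            w = (1 - abs(fy - (y0 + dy))) * (1 - abs(fx - (x0 + dx)))
            rel = np.array([fy - (y0 + dy), fx - (x0 + dx)])
            out += w * decoder(np.concatenate([feat[iy, ix], rel, cell]))
            wsum += w
    return out / wsum

# Toy stand-ins for encoder output and decoding MLP.
rng = np.random.default_rng(0)
C = 64
W1 = rng.standard_normal((3, C + 4)) * 0.1   # latent + rel coord + cell
decoder = lambda v: np.tanh(W1 @ v)
feat = rng.standard_normal((8, 8, C))
cell = np.array([1 / 64, 1 / 64])            # target pixel height/width
rgb = local_ensemble_query(feat, np.array([0.3, 0.7]), decoder, cell)
```

At a latent-code center, all weight falls on that code's own prediction; between centers, the four predictions blend continuously, which is what removes the block-boundary artifacts.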
The LIIF framework is trained with a self-supervised super-resolution task: ground-truth crops are down-sampled at random scales (up to ×4) to form the low-resolution inputs, and the model learns to predict the pixels of the original crop from query coordinates. At test time it generalizes to resolutions up to 30 times higher than the input, scales it was never exposed to during training.
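The data side of this training task can be sketched as below. This is a simplified numpy version: the scale range and sample count are illustrative rather than the paper's exact hyper-parameters, and block averaging stands in for a proper down-sampling filter (the paper uses bicubic resizing).

```python
import numpy as np

def make_training_pair(hr_crop, rng):
    """Build one self-supervised training sample from an HR crop.

    Draw a random down-sampling scale s, form the LR input by resizing
    the crop by 1/s (naive block averaging for brevity), and sample
    (coordinate, RGB) pairs from the original crop as supervision."""
    s = int(rng.integers(1, 5))                 # random scale in {1..4}
    H, W, _ = hr_crop.shape
    h, w = H // s, W // s
    lr = hr_crop[:h * s, :w * s].reshape(h, s, w, s, 3).mean(axis=(1, 3))
    # Sample query coordinates (HR pixel centers, mapped into [0, 1]).
    n = 16
    iy = rng.integers(0, H, n)
    ix = rng.integers(0, W, n)
    coords = np.stack([(iy + 0.5) / H, (ix + 0.5) / W], axis=1)
    targets = hr_crop[iy, ix]                   # ground-truth RGB values
    return lr, coords, targets

rng = np.random.default_rng(42)
hr = rng.random((48, 48, 3))                    # stand-in HR crop
lr, coords, targets = make_training_pair(hr, rng)
```

Because supervision is expressed as (coordinate, RGB) pairs rather than a fixed output grid, the same trained model can later be queried on a much denser grid than any seen in training.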
Empirical Evaluation
The authors conduct extensive experiments using the DIV2K dataset and standard benchmarking sets like Set5, Set14, B100, and Urban100. LIIF demonstrates competitive performance at in-distribution scales (e.g., ×2 to ×4) and exhibits superior quality when extrapolating to out-of-distribution scales up to ×30. Compared to methods such as MetaSR, LIIF consistently yields higher fidelity results, emphasizing its capacity for arbitrary high-resolution generalization.
Qualitative assessments highlight LIIF's ability to maintain fine details at massive up-scaling factors, a challenge for many existing techniques. Further ablation studies underline the contributions of cell decoding and other architectural features, offering insights into optimizing image fidelity across varying resolutions.
Implications and Future Directions
The introduction of LIIF has practical implications, particularly for applications that require high-fidelity image scaling without predefined output resolutions, such as medical imaging and large-format displays. Theoretical implications extend to bridging discrete and continuous image representations, potentially influencing the development of more adaptable neural network architectures.
Looking ahead, there is potential to explore enhanced decoding functions and extend LIIF to broader image-to-image translation tasks. Continuous representation offers a promising pathway for adapting existing computer vision models, catering to diverse image resolutions and qualities, ultimately driving further innovations in image processing and analysis methodologies.