- The paper introduces LIIF, a continuous image representation method that predicts RGB values from coordinate queries, achieving state-of-the-art super-resolution performance.
- It employs feature unfolding, local ensemble, and cell decoding to enrich local latent codes and ensure smooth, artifact-free transitions across scales.
- Empirical results on DIV2K and standard benchmarks show that LIIF matches state-of-the-art methods at trained scales and extrapolates to roughly ×30 magnification with high fidelity.
Learning Continuous Image Representation with Local Implicit Image Function
This paper introduces the Local Implicit Image Function (LIIF), a novel approach to representing images in a continuous manner. Traditional methods store images as discrete 2D arrays of pixels, which limits resolution adaptability and fidelity. Inspired by advances in implicit neural representations for 3D reconstruction, LIIF represents an image through a function that maps coordinates to RGB values, allowing the image to be rendered at arbitrary resolution.
Methodology
LIIF associates each image with a 2D feature map of latent codes distributed evenly across the spatial dimensions. Given a query coordinate, LIIF predicts its RGB value from the nearby latent codes and the query's position relative to them. This continuous formulation supports rendering at arbitrary resolution, providing a seamless bridge between discrete and continuous representations.
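The core query step can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the feature map is random rather than produced by an encoder (the paper pairs LIIF with encoders such as EDSR), and the decoder here is a toy single-layer map standing in for the paper's MLP. The key idea shown is looking up the nearest latent code and decoding from it together with the relative coordinate.

```python
import numpy as np

def liif_query(feat, coord, decoder):
    """Predict an RGB value at a continuous coordinate in [-1, 1]^2.

    feat: (H, W, C) grid of latent codes (here random; in LIIF, encoder output).
    coord: (2,) query coordinate (y, x).
    decoder: maps concat(latent code, relative coord) -> RGB; a toy stand-in
             for the decoding MLP f_theta.
    """
    H, W, C = feat.shape
    # Centers of the latent codes in [-1, 1] (half-cell offsets).
    ys = -1 + (2 * np.arange(H) + 1) / H
    xs = -1 + (2 * np.arange(W) + 1) / W
    iy = np.argmin(np.abs(ys - coord[0]))   # nearest latent code index (y)
    ix = np.argmin(np.abs(xs - coord[1]))   # nearest latent code index (x)
    z = feat[iy, ix]                        # nearest latent code z*
    rel = coord - np.array([ys[iy], xs[ix]])  # relative coordinate x_q - v*
    return decoder(np.concatenate([z, rel]))

# Toy decoder: a fixed random linear map to 3 RGB channels, squashed by tanh.
rng = np.random.default_rng(0)
C = 64
W1 = rng.standard_normal((3, C + 2)) * 0.1
decoder = lambda v: np.tanh(W1 @ v)

feat = rng.standard_normal((8, 8, C))       # stand-in 8x8 feature map
rgb = liif_query(feat, np.array([0.25, -0.6]), decoder)
```

Because the function is defined over continuous coordinates, the same `feat` can be queried on any sampling grid, which is what enables rendering at arbitrary resolution.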
The LIIF architecture incorporates several key components:
- Feature Unfolding: To enhance local latent code information, a feature unfolding process concatenates neighboring latent codes, enriching the representation.
- Local Ensemble: This technique addresses prediction discontinuity at latent-code boundaries by querying the overlapping neighboring latent codes and merging their predictions with weights based on proximity, ensuring smooth transitions and reducing artifacts.
- Cell Decoding: By considering the query pixel's size, the decoding function integrates additional contextual information, enhancing fidelity regardless of resolution.
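The local ensemble and cell decoding components above can be sketched together in a few lines. As before this is an illustrative numpy version with toy stand-ins: the feature map and decoder weights are random, the coordinate convention ([0, 1] with latent centers at half-cell offsets) and the bilinear-style weighting are one reasonable reading of the idea, and the `cell` argument simply appends the query pixel's height and width to the decoder input.

```python
import numpy as np

def local_ensemble_query(feat, coord, decoder, cell):
    """Merge predictions from the four surrounding latent codes.

    Each neighbor's weight is the area of the rectangle opposite it
    (bilinear-style), so predictions vary smoothly as the query moves.
    cell: (2,) query pixel size, appended for cell decoding."""
    H, W, C = feat.shape
    # Continuous position in latent-grid units (code centers at i + 0.5).
    fy = coord[0] * H - 0.5
    fx = coord[1] * W - 0.5
    y0, x0 = int(np.floor(fy)), int(np.floor(fx))
    out, wsum = np.zeros(3), 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            iy = np.clip(y0 + dy, 0, H - 1)   # clamp at image borders
            ix = np.clip(x0 + dx, 0, W - 1)
            # Bilinear weight: area of the rectangle opposite this neighbor.
            w = (1 - abs(fy - (y0 + dy))) * (1 - abs(fx - (x0 + dx)))
            rel = np.array([fy - (y0 + dy), fx - (x0 + dx)])
            out += w * decoder(np.concatenate([feat[iy, ix], rel, cell]))
            wsum += w
    return out / wsum

# Toy stand-ins for encoder output and decoding MLP.
rng = np.random.default_rng(0)
C = 64
W1 = rng.standard_normal((3, C + 4)) * 0.1   # latent + rel coord + cell
decoder = lambda v: np.tanh(W1 @ v)
feat = rng.standard_normal((8, 8, C))
cell = np.array([1 / 64, 1 / 64])            # target pixel height/width
rgb = local_ensemble_query(feat, np.array([0.3, 0.7]), decoder, cell)
```

At a latent-code center, all weight falls on that code's own prediction; between centers, the four predictions blend continuously, which is what removes the block-boundary artifacts.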
The LIIF framework is trained with a self-supervised super-resolution task: ground-truth crops are down-sampled at random scales (up to ×4) to form the low-resolution inputs, and the model learns to predict the pixels of the original crop from query coordinates. At test time it generalizes to resolutions up to 30 times higher than the input, scales it was never exposed to during training.
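The data side of this training task can be sketched as below. This is a simplified numpy version: the scale range and sample count are illustrative rather than the paper's exact hyper-parameters, and block averaging stands in for a proper down-sampling filter (the paper uses bicubic resizing).

```python
import numpy as np

def make_training_pair(hr_crop, rng):
    """Build one self-supervised training sample from an HR crop.

    Draw a random down-sampling scale s, form the LR input by resizing
    the crop by 1/s (naive block averaging for brevity), and sample
    (coordinate, RGB) pairs from the original crop as supervision."""
    s = int(rng.integers(1, 5))                 # random scale in {1..4}
    H, W, _ = hr_crop.shape
    h, w = H // s, W // s
    lr = hr_crop[:h * s, :w * s].reshape(h, s, w, s, 3).mean(axis=(1, 3))
    # Sample query coordinates (HR pixel centers, mapped into [0, 1]).
    n = 16
    iy = rng.integers(0, H, n)
    ix = rng.integers(0, W, n)
    coords = np.stack([(iy + 0.5) / H, (ix + 0.5) / W], axis=1)
    targets = hr_crop[iy, ix]                   # ground-truth RGB values
    return lr, coords, targets

rng = np.random.default_rng(42)
hr = rng.random((48, 48, 3))                    # stand-in HR crop
lr, coords, targets = make_training_pair(hr, rng)
```

Because supervision is expressed as (coordinate, RGB) pairs rather than a fixed output grid, the same trained model can later be queried on a much denser grid than any seen in training.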
Empirical Evaluation
The authors conduct extensive experiments using the DIV2K dataset and standard benchmarking sets like Set5, Set14, B100, and Urban100. LIIF demonstrates competitive performance at in-distribution scales (e.g., ×2 to ×4) and exhibits superior quality when extrapolating to out-of-distribution scales up to ×30. Compared to methods such as MetaSR, LIIF consistently yields higher fidelity results, emphasizing its capacity for arbitrary high-resolution generalization.
Qualitative assessments highlight LIIF's ability to maintain fine details at massive up-scaling factors, a challenge for many existing techniques. Further ablation studies underline the contributions of cell decoding and other architectural features, offering insights into optimizing image fidelity across varying resolutions.
Implications and Future Directions
The introduction of LIIF has practical implications, particularly for applications that require high-fidelity image scaling without predefined output resolutions, such as medical imaging and large-format displays. Theoretical implications extend to bridging discrete and continuous image representations, potentially influencing the development of more adaptable neural network architectures.
Looking ahead, there is potential to explore enhanced decoding functions and extend LIIF to broader image-to-image translation tasks. Continuous representation offers a promising pathway for adapting existing computer vision models, catering to diverse image resolutions and qualities, ultimately driving further innovations in image processing and analysis methodologies.