Coordinates Encoding Module (OPE)
- Coordinates Encoding Module is a method that maps spatial coordinates to high-dimensional feature vectors using truncated, orthogonal Fourier bases, ensuring precise reconstruction of band-limited image patches.
- It integrates local latent codes from a convolutional encoder with analytic OPE vectors, enabling efficient, resolution-agnostic upsampling without additional trainable parameters.
- This parameter-free, interpretable approach achieves performance comparable to state-of-the-art methods, offering significant computational and memory efficiency for continuous image synthesis.
A Coordinates Encoding Module (CEM), as instantiated by Orthogonal Position Encoding (OPE) in the OPE-SR framework, provides an analytical, parameter-free mapping from continuous spatial coordinates to high-dimensional feature vectors for continuous image reconstruction, particularly in arbitrary-scale image super-resolution tasks. OPE leverages orthogonal 2D basis functions—truncated real-valued Fourier bases—allowing for precise, lossless reconstruction of band-limited image patches. The upsampling module operates without trainable weights, instead combining local encoder-produced latent codes linearly with coordinate-dependent OPE vectors, enabling efficient, resolution-agnostic image synthesis (Song et al., 2023).
1. Mathematical Foundation of Orthogonal Position Encoding
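Orthogonal Position Encoding constructs a Fourier-type orthonormal basis on the domain $[-1, 1]$. For a maximum frequency $n$, the one-dimensional positional encoding is defined as
$$X(x) = \left[1,\ \sqrt{2}\cos(\pi x),\ \sqrt{2}\sin(\pi x),\ \ldots,\ \sqrt{2}\cos(n\pi x),\ \sqrt{2}\sin(n\pi x)\right] \in \mathbb{R}^{2n+1}.$$
The two-dimensional basis functions for an image patch are derived from the outer product of the one-dimensional encodings:
$$P_{ij}(x, y) = X_i(x)\,X_j(y),$$
where $i, j \in \{0, \ldots, 2n\}$. These basis functions are orthonormal with respect to the inner product:
$$\langle f, g \rangle = \frac{1}{4}\int_{-1}^{1}\int_{-1}^{1} f(x, y)\,g(x, y)\,dx\,dy$$
and
$$\langle P_{ij}, P_{kl} \rangle = \delta_{ik}\,\delta_{jl},$$
ensuring the basis spans band-limited 2D functions up to frequency $n$ per axis (Song et al., 2023).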
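The orthonormality claim can be checked numerically. The sketch below is illustrative only: it assumes the one-dimensional encoding has entries $1$, $\sqrt{2}\cos(k\pi x)$, $\sqrt{2}\sin(k\pi x)$ for $k = 1, \ldots, n$ on $[-1, 1]$ and that the inner product carries a factor of $1/2$ per axis. The helper names `ope_1d` and `gram_1d` are hypothetical and do not come from the OPE-SR code.

```python
import math

def ope_1d(t, n):
    """Assumed 1D encoding: 1, then sqrt(2)cos(k*pi*t), sqrt(2)sin(k*pi*t) for k = 1..n."""
    feats = [1.0]
    for k in range(1, n + 1):
        feats.append(math.sqrt(2.0) * math.cos(k * math.pi * t))
        feats.append(math.sqrt(2.0) * math.sin(k * math.pi * t))
    return feats

def gram_1d(n, samples=256):
    """Approximate (1/2) * integral over [-1, 1] of X_i(t) X_j(t) dt with the midpoint rule."""
    size = 2 * n + 1
    gram = [[0.0] * size for _ in range(size)]
    for s in range(samples):
        t = -1.0 + (2.0 * s + 1.0) / samples  # midpoint of the s-th subinterval
        feats = ope_1d(t, n)
        for i in range(size):
            for j in range(size):
                # (1/2) * (2 / samples) = 1 / samples per quadrature node
                gram[i][j] += feats[i] * feats[j] / samples
    return gram

gram = gram_1d(3)
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(7) for j in range(7))
print(max_dev)  # deviation from the identity matrix, expected at round-off level
```

Because each 2D basis function is a product of one function of $x$ and one of $y$, the 2D Gram matrix is the Kronecker product of this 1D Gram matrix with itself, so 1D orthonormality carries over to the 2D basis.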
2. Mapping Coordinates to OPE Feature Vectors
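For each continuous spatial coordinate $(x, y) \in [-1, 1]^2$, the OPE mapping proceeds in the following steps:
- Compute the one-dimensional encodings $X(x)$ and $X(y)$, each of length $2n+1$ for maximum frequency $n$.
- Form the outer product $X(x)^{\top} X(y) \in \mathbb{R}^{(2n+1) \times (2n+1)}$.
- Flatten into the OPE vector $P(x, y) = \mathrm{flat}\left(X(x)^{\top} X(y)\right) \in \mathbb{R}^{(2n+1)^2}$.
Each entry of $P(x, y)$ is a product of a constant, cosine, or sine term in $x$ with one in $y$, so the vector encodes both low- and high-frequency spatial structure. Because the mapping is analytic, any $(x, y)$ can be queried at arbitrary resolution, providing a consistent interface irrespective of output image size.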
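The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: the function names `ope_1d` and `ope_vector` are hypothetical, the $\sqrt{2}$ scaling of the sine and cosine terms is an assumption, and the row-major flattening order is an arbitrary choice that only needs to match whatever layout the latent codes use.

```python
import math

def ope_1d(t, n):
    """Assumed 1D encoding of length 2n + 1: constant term, then cos/sin pairs."""
    feats = [1.0]
    for k in range(1, n + 1):
        feats.append(math.sqrt(2.0) * math.cos(k * math.pi * t))
        feats.append(math.sqrt(2.0) * math.sin(k * math.pi * t))
    return feats

def ope_vector(x, y, n):
    """Flattened outer product of the x and y encodings, length (2n + 1)^2."""
    fx = ope_1d(x, n)
    fy = ope_1d(y, n)
    # Row-major flattening: the x index varies slowest, the y index fastest.
    return [a * b for a in fx for b in fy]

vec = ope_vector(0.25, -0.5, n=3)
print(len(vec))  # 49 entries for n = 3
```

The same function can be called for any real-valued coordinate in $[-1, 1]^2$, which is what makes the encoding independent of the output grid.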
3. Structure and Workflow of the OPE-Upscale (Coordinates Encoding) Module
The OPE-Upscale Module operates as follows:
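- Encoder: A conventional convolutional backbone (e.g., EDSR-base, RDN) processes the input LR image of size $h \times w$, yielding a feature map $Z \in \mathbb{R}^{h \times w \times C}$, where $C = 3(2n+1)^2$. Each location's feature is interpreted as the latent code for a local patch.
- Upsampling: For each target HR pixel $q$:
  1. Determine its continuous coordinate $x_q$.
  2. Identify the four nearest encoder locations $x_t$, $t \in \{00, 01, 10, 11\}$, with latent codes $z_t$.
  3. Compute relative patch coordinates for each neighbor, using
  $$\hat{x}_t = \frac{x_q - x_t}{s},$$
  where $s$ is the half-extent of the patch represented by each latent code (applied per axis), so that $\hat{x}_t \in [-1, 1]^2$.
  4. Generate the OPE vector $P(\hat{x}_t)$.
  5. For each color channel $c$, reconstruct $I_{c,t}(x_q) = z_{t,c}\,P(\hat{x}_t)^{\top}$, where $z_{t,c}$ is the block of $(2n+1)^2$ entries of $z_t$ assigned to channel $c$.
  6. Aggregate values with bilinear-style weights proportional to relative areas:
  $$I_c(x_q) = \sum_{t} \frac{S_t}{S}\,I_{c,t}(x_q), \qquad S = \sum_{t} S_t,$$
  where $S_t$ is the area of the rectangle spanned by $x_q$ and the neighbor diagonally opposite $x_t$.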
This architecture achieves spatial smoothness through patch ensembles and enables continuous, arbitrary-scale super-resolution, as each HR pixel is a deterministic function of its continuous coordinate, local latent codes, and known basis functions (Song et al., 2023).
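The upsampling workflow can be sketched as follows. Several details here are assumptions made for illustration rather than facts taken from the paper: pixel centers are placed at $-1 + (2i+1)/\text{size}$ along each axis, relative coordinates are measured in units of one LR cell width (so each latent code covers a patch two cells wide), border neighbors are handled by clamping indices, and the layout `latent[i][j][c]` is a hypothetical container for the per-channel coefficient blocks.

```python
import math

def ope_1d(t, n):
    feats = [1.0]
    for k in range(1, n + 1):
        feats += [math.sqrt(2.0) * math.cos(k * math.pi * t),
                  math.sqrt(2.0) * math.sin(k * math.pi * t)]
    return feats

def ope_vector(x, y, n):
    return [a * b for a in ope_1d(x, n) for b in ope_1d(y, n)]

def axis_neighbors(q, size):
    """Two nearest cell indices along one axis, with relative coordinate and weight."""
    u = (q + 1.0) * size / 2.0 - 0.5        # query position in cell-center index units
    lo = math.floor(u)
    frac = u - lo
    out = []
    for idx, weight in ((lo, 1.0 - frac), (lo + 1, frac)):
        idx = min(max(idx, 0), size - 1)    # clamp at the image border
        out.append((idx, u - idx, weight))  # relative coordinate lies in [-1, 1]
    return out

def upscale(latent, out_h, out_w, n):
    """latent[i][j][c] holds (2n + 1)^2 coefficients; returns out_h x out_w x C values."""
    h, w = len(latent), len(latent[0])
    channels = len(latent[0][0])
    image = []
    for r in range(out_h):
        yq = -1.0 + (2 * r + 1) / out_h     # HR pixel center in [-1, 1]
        row = []
        for col in range(out_w):
            xq = -1.0 + (2 * col + 1) / out_w
            pixel = [0.0] * channels
            for i, ry, wy in axis_neighbors(yq, h):
                for j, rx, wx in axis_neighbors(xq, w):
                    basis = ope_vector(rx, ry, n)
                    for c in range(channels):
                        value = sum(z * p for z, p in zip(latent[i][j][c], basis))
                        pixel[c] += wy * wx * value  # area-proportional blend weight
            row.append(pixel)
        image.append(row)
    return image

# A 2 x 3 latent grid whose codes hold only a constant (DC) coefficient of 0.7.
dc_code = [0.7] + [0.0] * 8
latent = [[[dc_code] for _ in range(3)] for _ in range(2)]
hr = upscale(latent, out_h=5, out_w=7, n=1)
```

The product `wy * wx` equals the area of the rectangle between the query and the diagonally opposite neighbor, divided by the cell area, so the four weights always sum to one. The output size is a free argument, which is why no retraining or learned decoder is involved when the scale changes.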
4. Parameter-Free, Resolution-Agnostic Upsampling
The Coordinates Encoding Module introduces no trainable parameters in the upsampling or decoding stage. All trainable weights are confined to the encoder. Once local latent codes are extracted, every output pixel is computed analytically via basis evaluation and linear operations. There is no learned neural decoder (such as the MLPs typical of implicit neural representations), which is what makes the upsampling "parameter-free." Arbitrary output resolutions are possible because coordinates may be sampled at any density, unconstrained by training-time resolution or grid structure (Song et al., 2023).
5. Orthogonality, Completeness, and Reconstruction Guarantees
The orthonormality of the OPE basis functions guarantees that, for functions band-limited to frequency $n$ per axis, the truncated reconstruction
$$f(x, y) = \sum_{i=0}^{2n}\sum_{j=0}^{2n} z_{ij}\,P_{ij}(x, y)$$
with $z_{ij} = \langle f, P_{ij} \rangle$ is exact within this subspace. In practice, the encoder's latent code stores an approximate block of these coefficients over each grid cell. Upon querying, the OPE basis is evaluated locally at arbitrary spatial locations, extracting the appropriate mixture of coefficients for continuous patch reconstruction. For sufficiently large $n$ (typically a small value such as $4$), local details are preserved up to the Nyquist frequency of the encoder's output. Increasing $n$ beyond this range can recover finer structure if encoded, but may introduce ringing artefacts if higher frequencies are not captured by the backbone (Song et al., 2023).
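The exactness guarantee can be illustrated with a small numerical experiment. The sketch below assumes the basis scaling and the $\tfrac{1}{4}$-normalized inner product described earlier; the function names and the test signal `band_limited` are hypothetical choices made for the example. Coefficients are obtained by a midpoint-rule projection, which is exact for trigonometric polynomials of this degree.

```python
import math

def ope_1d(t, n):
    feats = [1.0]
    for k in range(1, n + 1):
        feats += [math.sqrt(2.0) * math.cos(k * math.pi * t),
                  math.sqrt(2.0) * math.sin(k * math.pi * t)]
    return feats

def ope_vector(x, y, n):
    return [a * b for a in ope_1d(x, n) for b in ope_1d(y, n)]

def project(f, n, samples=32):
    """Coefficients (1/4) * double integral of f * P_k over [-1, 1]^2, midpoint rule."""
    size = (2 * n + 1) ** 2
    coeffs = [0.0] * size
    for a in range(samples):
        x = -1.0 + (2 * a + 1) / samples
        for b in range(samples):
            y = -1.0 + (2 * b + 1) / samples
            fv = f(x, y)
            basis = ope_vector(x, y, n)
            for k in range(size):
                coeffs[k] += fv * basis[k] / samples ** 2
    return coeffs

def reconstruct(coeffs, x, y, n):
    """Evaluate the truncated expansion at an arbitrary coordinate."""
    return sum(z * p for z, p in zip(coeffs, ope_vector(x, y, n)))

def band_limited(x, y):
    # Highest frequency is 2 along x and 1 along y.
    return 0.3 + math.cos(2 * math.pi * x) * math.sin(math.pi * y)

coeffs_n2 = project(band_limited, n=2)
coeffs_n1 = project(band_limited, n=1)
err_n2 = abs(reconstruct(coeffs_n2, 0.123, -0.77, 2) - band_limited(0.123, -0.77))
err_n1 = abs(reconstruct(coeffs_n1, 0.0, 0.5, 1) - band_limited(0.0, 0.5))
```

With $n = 2$ the signal lies inside the spanned subspace and `err_n2` stays at round-off level even at an off-grid query point. With $n = 1$ the frequency-2 term is truncated, only the constant $0.3$ survives, and `err_n1` is close to $1$.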
6. Practical Implications, Efficiency, and Interpretability
The OPE-based CEM replaces the traditional MLP decoder commonly used in coordinate-based implicit representations with an entirely analytical, interpretable mechanism. The analytic, orthogonal design assures flip-consistent and efficient decoding, resulting in high computational and memory efficiency relative to state-of-the-art (SOTA) SR models. In experimental evaluations, the OPE-Upscale module achieves results comparable to SOTA methods, while offering a concise SR framework with practical advantages in efficiency and resource consumption (Song et al., 2023). This suggests a broader utility of orthogonal, parameter-free coordinate encoding in continuous signal reconstruction tasks beyond image super-resolution.