Coordinates Encoding Module (OPE)

Updated 10 March 2026
  • The Coordinates Encoding Module (CEM) is a method that maps spatial coordinates to high-dimensional feature vectors using truncated, orthogonal Fourier bases, allowing exact reconstruction of band-limited image patches.
  • It integrates local latent codes from a convolutional encoder with analytic OPE vectors, enabling efficient, resolution-agnostic upsampling without additional trainable parameters.
  • This parameter-free, interpretable approach achieves performance comparable to state-of-the-art methods, offering significant computational and memory efficiency for continuous image synthesis.

A Coordinates Encoding Module (CEM), as instantiated by Orthogonal Position Encoding (OPE) in the OPE-SR framework, provides an analytical, parameter-free mapping from continuous spatial coordinates to high-dimensional feature vectors for continuous image reconstruction, particularly in arbitrary-scale image super-resolution tasks. OPE leverages orthogonal 2D basis functions—truncated real-valued Fourier bases—allowing for precise, lossless reconstruction of band-limited image patches. The upsampling module operates without trainable weights, instead combining local encoder-produced latent codes linearly with coordinate-dependent OPE vectors, enabling efficient, resolution-agnostic image synthesis (Song et al., 2023).

1. Mathematical Foundation of Orthogonal Position Encoding

Orthogonal Position Encoding constructs a Fourier-type orthonormal basis on the domain $[-1,1]^2$. For a maximum frequency $n$, the one-dimensional positional encoding is defined as

$$\gamma(x) = [\,1,\ \sqrt{2}\cos(\pi x),\ \sqrt{2}\sin(\pi x),\ \sqrt{2}\cos(2\pi x),\ \sqrt{2}\sin(2\pi x),\ \ldots,\ \sqrt{2}\cos(n\pi x),\ \sqrt{2}\sin(n\pi x)\,] \in \mathbb{R}^{2n+1}$$

The two-dimensional basis functions for an image patch $f(x,y)$ are derived from the outer product of the one-dimensional encodings:

$$e_{i,j}(x,y) = \gamma_i(x)\gamma_j(y)$$

where $i,j=0,\ldots,2n$. These basis functions are orthonormal with respect to the $L^{2}([-1,1]^2)$ inner product:

$$\langle g,h \rangle = \frac{1}{4} \int_{-1}^{1} \int_{-1}^{1} g(x,y)\, h(x,y)\ dx\,dy$$

and

$$\langle e_{i_1, j_1}, e_{i_2, j_2} \rangle = \delta_{i_1,i_2}\,\delta_{j_1,j_2}$$

ensuring the basis spans band-limited 2D functions up to frequency $n$ per axis (Song et al., 2023).
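The orthonormality claim can be checked numerically. The sketch below is an illustration, not code from the paper: the function names `gamma` and `gram_1d` and the midpoint quadrature are assumptions made here, and plain Python is used only to keep the example dependency-free. It approximates the one-dimensional Gram matrix under the normalised inner product $\frac{1}{2}\int_{-1}^{1}\gamma_i(x)\gamma_j(x)\,dx$. Because $e_{i,j}$ is a product of one-dimensional functions, the two-dimensional inner product factorises into two such integrals, so a one-dimensional identity Gram matrix implies the two-dimensional result.

```python
import math

def gamma(x, n):
    """1D encoding [1, sqrt(2)cos(pi x), sqrt(2)sin(pi x), ..., sqrt(2)sin(n pi x)]."""
    out = [1.0]
    for k in range(1, n + 1):
        out.append(math.sqrt(2.0) * math.cos(k * math.pi * x))
        out.append(math.sqrt(2.0) * math.sin(k * math.pi * x))
    return out

def gram_1d(n, samples=256):
    """Midpoint-rule estimate of (1/2) * integral over [-1, 1] of gamma_i * gamma_j."""
    size = 2 * n + 1
    g = [[0.0] * size for _ in range(size)]
    step = 2.0 / samples
    for s in range(samples):
        v = gamma(-1.0 + (s + 0.5) * step, n)
        for i in range(size):
            for j in range(size):
                g[i][j] += 0.5 * step * v[i] * v[j]
    return g

n = 3
G = gram_1d(n)
size = 2 * n + 1
# Largest deviation of the Gram matrix from the identity.
max_err = max(abs(G[i][j] - (1.0 if i == j else 0.0))
              for i in range(size) for j in range(size))
print(f"max deviation from identity: {max_err:.2e}")
```

The uniform midpoint rule is a convenient choice here because it integrates trigonometric polynomials over a full period essentially exactly, so the deviation should sit at floating-point round-off level.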

2. Mapping Coordinates to OPE Feature Vectors

For each continuous spatial coordinate $(x, y) \in [-1,1]^2$, the OPE mapping proceeds in the following steps:

  • Compute $\gamma(x), \gamma(y) \in \mathbb{R}^{2n+1}$
  • Form the outer product $M = \gamma(x)^T \gamma(y)\in\mathbb{R}^{(2n+1)\times(2n+1)}$
  • Flatten $M$ into the OPE vector $P(x,y) \in \mathbb{R}^{(2n+1)^2}$

Each entry of $P(x, y)$ is a product of cosines/sines at different frequencies, encoding both low- and high-frequency spatial structure. This direct analytic mapping ensures that any $(x, y)$ can be queried at arbitrary resolution, providing a consistent interface irrespective of output image size.
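The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `gamma` and `ope_vector` are hypothetical, and the row-major flattening order is a choice made here, since the text does not specify one. Any fixed order works as long as the latent codes use the same one.

```python
import math

def gamma(x, n):
    """1D encoding with 2n + 1 entries, as defined in Section 1."""
    out = [1.0]
    for k in range(1, n + 1):
        out.append(math.sqrt(2.0) * math.cos(k * math.pi * x))
        out.append(math.sqrt(2.0) * math.sin(k * math.pi * x))
    return out

def ope_vector(x, y, n):
    """Flattened outer product of gamma(x) and gamma(y).

    Entry i * (2n + 1) + j holds gamma_i(x) * gamma_j(y) (row-major, assumed).
    """
    gx, gy = gamma(x, n), gamma(y, n)
    return [a * b for a in gx for b in gy]

n = 3
P = ope_vector(0.25, -0.5, n)
print(len(P))  # (2n + 1)^2 entries
```

The first entry is always 1 (the constant term), which is why a latent code with only its first coefficient set reproduces a flat patch.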

3. Structure and Workflow of the OPE-Upscale (Coordinates Encoding) Module

The OPE-Upscale Module operates as follows:

  • Encoder: A conventional convolutional backbone (e.g., EDSR-base, RDN) processes the input LR image, yielding a feature map $F\in \mathbb{R}^{h\times w\times C}$, where $C=3\cdot(2n+1)^2$. Each location's feature is interpreted as the latent code $z_{i,j}$ for a local patch.
  • Upsampling: For each target HR pixel $(x_q, y_q)$:

    1. Determine its continuous coordinate.
    2. Identify the four nearest encoder locations $(i_t, j_t),\ t=1,\ldots,4$.
    3. Compute relative patch coordinates $(x'_q, y'_q)\in[-1,1]^2$ for each neighbor, using

    $$x'_q = \frac{x_q - x_c(i_t)}{\Delta_x},\quad y'_q = \frac{y_q - y_c(j_t)}{\Delta_y}$$

    where $x_c(i_t)$ and $y_c(j_t)$ are the center coordinates of neighbor $t$'s patch and $\Delta_x = 2/h,\ \Delta_y = 2/w$.
    4. Generate $P_t = \text{flatten}(\gamma(x'_q)^T \gamma(y'_q))$.
    5. For each color channel $c\in\{R,G,B\}$, reconstruct $I_{t, c} = z_{t,c} \cdot P_t$.
    6. Aggregate values with bilinear-style weights $w_t$ proportional to relative areas:

    $$I(q) = \Big[\sum_t w_t I_{t, R},\ \sum_t w_t I_{t, G},\ \sum_t w_t I_{t, B}\Big]$$

This architecture achieves spatial smoothness through patch ensembles and enables continuous, arbitrary-scale super-resolution, as each HR pixel is a deterministic function of $(x, y)$, local latent codes, and known basis functions (Song et al., 2023).
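The upsampling steps for a single query pixel can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by the text: the nested-list layout `latent[i][j][c]`, patch centers at $x_c(i) = -1 + (2i+1)/h$ and $y_c(j) = -1 + (2j+1)/w$, clamping of neighbor indices at the image border, and weights $(1-|x'_q|)(1-|y'_q|)$ normalised to sum to one as a stand-in for the "relative areas". The name `query_pixel` is hypothetical.

```python
import math

def gamma(x, n):
    out = [1.0]
    for k in range(1, n + 1):
        out.append(math.sqrt(2.0) * math.cos(k * math.pi * x))
        out.append(math.sqrt(2.0) * math.sin(k * math.pi * x))
    return out

def ope_vector(x, y, n):
    return [a * b for a in gamma(x, n) for b in gamma(y, n)]

def query_pixel(latent, xq, yq, n):
    """Return [R, G, B] at (xq, yq) in [-1, 1]^2.

    latent[i][j][c] is a list of (2n + 1)^2 coefficients (assumed layout).
    """
    h, w = len(latent), len(latent[0])
    dx, dy = 2.0 / h, 2.0 / w
    # Lower neighbor index along each axis; may fall outside the grid.
    i0 = math.floor((xq + 1.0) / dx - 0.5)
    j0 = math.floor((yq + 1.0) / dy - 0.5)
    rgb, total = [0.0, 0.0, 0.0], 0.0
    for i in (i0, i0 + 1):
        for j in (j0, j0 + 1):
            ic = min(max(i, 0), h - 1)  # border clamping (assumption)
            jc = min(max(j, 0), w - 1)
            # Step 3: coordinates relative to this neighbor's patch center.
            xr = (xq - (-1.0 + (2 * ic + 1) / h)) / dx
            yr = (yq - (-1.0 + (2 * jc + 1) / w)) / dy
            p = ope_vector(xr, yr, n)                  # step 4
            wt = (1.0 - abs(xr)) * (1.0 - abs(yr))     # bilinear-style weight
            total += wt
            for c in range(3):                         # step 5: z . P per channel
                rgb[c] += wt * sum(z * b for z, b in zip(latent[ic][jc][c], p))
    return [v / total for v in rgb]                    # step 6: weighted blend

# Sanity check: codes that only set the constant coefficient give a flat image.
n, size = 3, 49
flat = [[v] + [0.0] * (size - 1) for v in (0.2, 0.5, 0.8)]
latent = [[flat for _ in range(2)] for _ in range(2)]
pixel = query_pixel(latent, 0.3, -0.7, n)
```

Away from the border, the two weights along each axis are $u$ and $1-u$ for a fractional offset $u$, which matches ordinary bilinear interpolation between neighboring patch predictions.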

4. Parameter-Free, Resolution-Agnostic Upsampling

The Coordinates Encoding Module introduces no trainable parameters in the upsampling or decoding stage. All weights are confined to the encoder $E_\theta$. Once local latent codes are extracted, every output pixel is computed analytically via basis evaluation and linear operations. There is no learned neural decoder (e.g., MLPs typical of implicit neural representations), making the upsampling "parameter-free." Arbitrary output resolutions are possible as $(x, y)$ coordinates may be sampled at any density, unconstrained by training-time resolution or grid structure (Song et al., 2023).
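To make the resolution-agnostic claim concrete, the sketch below generates query coordinates for an arbitrary scale factor. The pixel-center convention and the rounding of the output size are assumptions for illustration, and the helper names `pixel_centers` and `hr_coordinates` are hypothetical. The same code path serves integer and non-integer scales because only the sampling density changes.

```python
def pixel_centers(size):
    """Centers of `size` equal cells tiling [-1, 1] (assumed convention)."""
    return [-1.0 + (2 * k + 1) / size for k in range(size)]

def hr_coordinates(h, w, scale):
    """Per-axis query coordinates for a round(h*scale) x round(w*scale) output."""
    out_h, out_w = round(h * scale), round(w * scale)
    return pixel_centers(out_h), pixel_centers(out_w)

xs2, ys2 = hr_coordinates(48, 48, 2)      # integer scale
xs37, ys37 = hr_coordinates(48, 48, 3.7)  # non-integer scale, same code path
print(len(xs2), len(xs37))
```

Each pair of coordinates drawn from these lists can then be decoded independently, so the output size is chosen at query time rather than fixed by the model.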

5. Orthogonality, Completeness, and Reconstruction Guarantees

The orthonormality of the OPE basis functions guarantees that, for functions band-limited to frequency $n$ per axis, the truncated reconstruction

$$f(x,y) = \sum_{i=0}^{2n} \sum_{j=0}^{2n} Z_{i,j}\, \gamma_i(x)\gamma_j(y)$$

with $Z_{i,j} = \langle f, \gamma_i(x)\gamma_j(y) \rangle$ is exact within this subspace. In practice, the encoder's latent code $z_{i,j}$ stores an approximate block of these coefficients over each grid cell. Upon querying, the OPE basis is evaluated locally at arbitrary spatial locations, extracting the appropriate mixture of coefficients for continuous patch reconstruction. For sufficiently large $n$ (typically $n=3$ or $4$), local details are preserved up to the Nyquist frequency of the encoder's output. Increasing $n$ beyond this range can recover finer structure if encoded, but may introduce ringing artefacts if higher frequencies are not captured by the backbone (Song et al., 2023).
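The exactness statement can be illustrated with a small round trip: project a band-limited test function onto the basis, then evaluate the truncated sum at a point that is not on the sampling grid. The test function, the midpoint quadrature, and the names `project` and `reconstruct` are assumptions chosen for this sketch. In the actual framework the coefficients come from the encoder rather than from an explicit projection.

```python
import math

def gamma(x, n):
    out = [1.0]
    for k in range(1, n + 1):
        out.append(math.sqrt(2.0) * math.cos(k * math.pi * x))
        out.append(math.sqrt(2.0) * math.sin(k * math.pi * x))
    return out

def project(f, n, samples=32):
    """Estimate Z[i][j] = <f, gamma_i(x) gamma_j(y)> with the midpoint rule."""
    size = 2 * n + 1
    Z = [[0.0] * size for _ in range(size)]
    step = 2.0 / samples
    pts = [-1.0 + (s + 0.5) * step for s in range(samples)]
    enc = [gamma(p, n) for p in pts]  # encodings reused for both axes
    for a, x in enumerate(pts):
        for b, y in enumerate(pts):
            val = 0.25 * step * step * f(x, y)  # 1/4 from the inner product
            for i in range(size):
                for j in range(size):
                    Z[i][j] += val * enc[a][i] * enc[b][j]
    return Z

def reconstruct(Z, x, y, n):
    """Evaluate the truncated double sum at an arbitrary (x, y)."""
    gx, gy = gamma(x, n), gamma(y, n)
    return sum(Z[i][j] * gx[i] * gy[j]
               for i in range(len(gx)) for j in range(len(gy)))

def f(x, y):
    # Band-limited to frequency 3 per axis, so it lies in the span for n = 3.
    return (0.5 + math.cos(math.pi * x) * math.sin(2 * math.pi * y)
            - 0.3 * math.sin(3 * math.pi * x))

n = 3
Z = project(f, n)
err = abs(reconstruct(Z, 0.123, -0.456, n) - f(0.123, -0.456))
print(f"reconstruction error: {err:.2e}")
```

If `f` contained a frequency above $n$, the same procedure would return only its projection onto the subspace, and the error at off-grid points would no longer vanish.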

6. Practical Implications, Efficiency, and Interpretability

The OPE-based CEM replaces the MLP decoder commonly used in coordinate-based implicit representations with an entirely analytical, interpretable mechanism. The analytic, orthogonal design ensures flip-consistent decoding and yields high computational and memory efficiency relative to state-of-the-art (SOTA) SR models. In experimental evaluations, the OPE-Upscale module achieves results comparable to SOTA methods while offering a concise SR framework with practical advantages in efficiency and resource consumption (Song et al., 2023). This suggests that orthogonal, parameter-free coordinate encoding may be useful for continuous signal reconstruction tasks beyond image super-resolution.
