Feature-Adaptive Implicit Neural Representations
- FA-INR is a neural architecture that integrates data-driven feature adaptivity into implicit neural representations through dynamic activation modulation and memory-based feature retrieval.
- It employs techniques such as Incode-style adaptive sine activations and cross-attention with Mixture-of-Experts routing to leverage local data complexity for optimal reconstruction fidelity.
- Empirical results demonstrate state-of-the-art performance in PSNR, SSIM, and IoU across audio, imaging, and scientific simulations, albeit with increased computational overhead.
Feature-Adaptive Implicit Neural Representation (FA-INR) refers to a class of neural architectures in which the implicit mapping of coordinates (and potentially auxiliary parameters) to signal values is dynamically conditioned on intermediate or global features, enabling model capacity to adapt flexibly to local data complexity. FA-INR methods depart from conventional implicit neural representations (INRs)—where parameters and activation functions are fixed—by introducing explicit mechanisms for feature-driven modulation of either neural activations or feature retrieval, often resulting in superior reconstruction fidelity and parameter efficiency across modalities such as audio, imaging, and scientific simulation.
1. Definitional Scope and Taxonomy
FA-INR encompasses architectures that, during inference, modulate either their intermediate activations or their internal feature representations based on data-adaptive context. According to the taxonomy of INRs (Essakine et al., 6 Nov 2024), feature adaptivity is realized either by predicting the parameters of activation functions using internal statistics (“Incode” style) or by using additional adaptive mechanisms such as cross-attention over feature memory banks governed by Mixture-of-Experts (MoE) routing (Li et al., 7 Jun 2025). Distinct from basic INRs employing static positional encodings or fixed nonlinearities, FA-INR approaches incorporate feature-driven conditioning at key model stages, either intra-layer (activation modulation) or as part of hierarchical representation retrieval (memory attention + routing).
2. Core Methodologies and Architectures
2.1 Activation Modulation: Incode-style Feature Adaptivity
In the “Incode” approach (Essakine et al., 6 Nov 2024), each layer’s activation function is not static but receives parameters from a learned, feature-processing network (the “harmoniser”), whose input is a summary of the previous layer’s activations. For an input coordinate $x$ and hidden activations $y_{i-1}$, layer $i$ computes
$$f_i = \mathrm{Pool}(y_{i-1}), \qquad (a_i, b_i, c_i, d_i) = H_i(f_i), \qquad y_i = a_i \sin\!\big(b_i\,\omega_0\,(W_i y_{i-1} + \beta_i) + c_i\big) + d_i,$$
where $W_i$ and $\beta_i$ are the layer’s weight matrix and bias, and $f_i$ is a pooled summary (e.g., a global average) of $y_{i-1}$. The harmoniser $H_i$ is typically a compact MLP, outputting per-layer, per-sample sine amplitude $a_i$, frequency scale $b_i$, phase $c_i$, and bias $d_i$. All components are jointly trained to minimise a suitable reconstruction loss (e.g., mean squared error).
2.2 Memory-Augmented Adaptive Representations
In scientific surrogate modeling (Li et al., 7 Jun 2025), FA-INR augments the standard INR mapping with a learnable key–value memory bank and cross-attention feature retrieval, optionally routed via a coordinate-driven MoE:
- Spatial Encoder: a coordinate $x$ is mapped to a query $q = E(x)$ via an MLP.
- Memory Query: $q \in \mathbb{R}^{d}$ indexes the memory bank.
- Memory Bank: keys $K \in \mathbb{R}^{m \times d}$ and values $V \in \mathbb{R}^{m \times d}$; both learned.
- Parameter Conditioning: simulation parameters $p$ are embedded, merged with the values (via elementwise product), and used to adapt $V$ by a residual adapter MLP.
- Cross-Attention: $z = \mathrm{softmax}\!\big(qK^{\top}/\sqrt{d}\big)\,V$.
- Mixture-of-Experts Routing: a low-resolution spatial grid and a gating MLP produce expert probabilities; for each $x$, the Top-2 experts are selected and their outputs aggregated.
This design allows the system to allocate model capacity adaptively, focusing on spatial regions or parameter subspaces that exhibit higher local complexity.
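The following is a minimal PyTorch sketch of this retrieval path for a single expert, assuming 3D coordinates, a $d$-dimensional memory, and an 8-dimensional simulation-parameter vector; the module name, layer sizes, and initialization here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MemoryRetrieval(nn.Module):
    """Single-expert key-value memory with parameter-conditioned values."""
    def __init__(self, d_model=64, n_mem=1024, param_dim=8):
        super().__init__()
        # Spatial encoder (sine activation): coordinate -> query vector.
        self.enc1 = nn.Linear(3, d_model)
        self.enc2 = nn.Linear(d_model, d_model)
        # Learnable key/value memory bank.
        self.keys = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        # Simulation-parameter embedding and residual value adapter.
        self.param_embed = nn.Linear(param_dim, d_model)
        self.adapter = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                     nn.Linear(d_model, d_model))

    def forward(self, x, p):
        # x: (B, 3) coordinates; p: (B, param_dim) simulation parameters.
        q = self.enc2(torch.sin(self.enc1(x)))                # query, (B, d)
        # Embed p, merge with the values by elementwise product,
        # then adapt through a residual MLP.
        v = self.values.unsqueeze(0) * self.param_embed(p).unsqueeze(1)
        v = v + self.adapter(v)                               # (B, m, d)
        # Scaled dot-product cross-attention over the memory bank.
        attn = torch.softmax(q @ self.keys.T / q.shape[-1] ** 0.5, dim=-1)
        return torch.einsum('bm,bmd->bd', attn, v)            # retrieved feature z
```

A full FA-INR instantiates several such experts and routes coordinates among them, as described in Section 3.2.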
3. Algorithmic Descriptions and Key Formulas
3.1 Incode: Feature-Adaptive Sine Activation
For each hidden layer $i$:

```python
f_i = Pool(y_prev)                 # pooled summary of previous activations
a_i, b_i, c_i, d_i = H_i(f_i)      # harmoniser predicts activation parameters
u_i = W_i @ y_prev + beta_i        # affine pre-activation
y_i = a_i * sin(b_i * omega0 * u_i + c_i) + d_i   # adaptive sine activation
```
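To make this pseudocode concrete, here is a minimal runnable PyTorch sketch of one such layer; the mean-pooling over sampled coordinates, the harmoniser width, and the near-identity parameterization of $(a_i, b_i)$ are assumptions rather than reference settings.

```python
import torch
import torch.nn as nn

class AdaptiveSineLayer(nn.Module):
    """Sine layer whose amplitude/frequency/phase/bias are predicted
    per sample by a harmoniser MLP from pooled previous activations."""
    def __init__(self, in_dim, out_dim, omega0=30.0, h_dim=32):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # W_i, beta_i
        self.omega0 = omega0
        # Harmoniser H_i: pooled summary -> (a, b, c, d).
        self.harmoniser = nn.Sequential(
            nn.Linear(in_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 4))

    def forward(self, y_prev):
        # y_prev: (B, N, in_dim) -- N sampled coordinates per signal.
        f = y_prev.mean(dim=1)                     # global-average pool, (B, in_dim)
        a, b, c, d = self.harmoniser(f).unsqueeze(1).chunk(4, dim=-1)
        a, b = 1 + a, torch.exp(b)                 # start near the identity activation
        u = self.linear(y_prev)                    # (B, N, out_dim)
        return a * torch.sin(b * self.omega0 * u + c) + d

# Example: first layer of a 2D image INR over 4 signals, 1024 coords each.
layer = AdaptiveSineLayer(in_dim=2, out_dim=256)
y = layer(torch.rand(4, 1024, 2) * 2 - 1)          # -> (4, 1024, 256)
```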
3.2 FA-INR with Cross-Attention and MoE
Given a coordinate $x$, simulation parameters $p$, and a set of $E$ experts, the feature retrieval steps are:
- Encode $x$ into a query $q$ (as above).
- Embed $p$ and adapt the memory values $V$ (value adapter) via a 2-layer MLP.
- For each of the Top-2 experts selected by the gating network, perform projected cross-attention to obtain a feature $z_e$ (sketched below).
- Aggregate the feature vectors: $z = \sum_{e} g_e\, z_e$, with gating weights $g_e$.
- Decode $z$ to the output value $\hat{v}$ with a 3-layer ReLU MLP.
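Below is a hedged sketch of the Top-2 routing step, reusing the MemoryRetrieval module from the sketch in Section 2.2; reading the "low-resolution spatial grid" as a learnable feature grid queried by trilinear interpolation is an assumption on our part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoERetrieval(nn.Module):
    """Route each coordinate to its Top-2 memory experts and blend outputs."""
    def __init__(self, n_experts=10, grid_res=16, g_dim=16, d_model=64):
        super().__init__()
        self.d_model = d_model
        self.experts = nn.ModuleList(
            MemoryRetrieval(d_model=d_model) for _ in range(n_experts))
        # Low-resolution gating grid plus MLP -> expert logits.
        self.grid = nn.Parameter(
            torch.zeros(1, g_dim, grid_res, grid_res, grid_res))
        self.gate = nn.Sequential(nn.Linear(g_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_experts))

    def forward(self, x, p):
        # x in [-1, 1]^3: sample gating features by trilinear interpolation.
        g = F.grid_sample(self.grid, x.view(1, -1, 1, 1, 3),
                          align_corners=True)           # (1, g_dim, B, 1, 1)
        g = g.view(self.grid.shape[1], -1).T            # (B, g_dim)
        probs = torch.softmax(self.gate(g), dim=-1)     # expert probabilities
        top_p, top_idx = probs.topk(2, dim=-1)          # Top-2 routing
        top_p = top_p / top_p.sum(dim=-1, keepdim=True) # renormalize gates
        z = x.new_zeros(x.shape[0], self.d_model)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e            # coords routed to expert e
                if mask.any():
                    z[mask] += top_p[mask, slot, None] * expert(x[mask], p[mask])
        return z   # aggregated feature, fed to the 3-layer ReLU decoder
```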
4. Empirical Performance and Trade-offs
4.1 Incode (Feature-Adaptive Sine Activation Network)
- 1D Audio Reconstruction: Second-lowest error; sharper reconstructions versus non-adaptive baselines at equal iteration counts.
- 2D CT Reconstruction: Best PSNR and SSIM at all projection counts (e.g., $20$ projections: $24.38$ dB PSNR, $0.651$ SSIM; $300$ projections: $34.76$ dB, $0.953$); surpassed only by Fr at very low sampling rates.
- Image Denoising: Highest PSNR ($29.63$ dB) but longest runtime ($2,370$ s); alternatives trade slight PSNR for significant speedup.
- Super-Resolution: Leading at moderate upsampling factors (PSNR $29.56$ dB, SSIM $0.896$, LPIPS $0.176$); near-best at adjacent scales; outperformed by Fr/Finer at extreme scales.
- 3D Occupancy IoU: Near-best performance ($0.99564$), only Finer is marginally higher.
Trade-offs: Significant computational overhead due to harmoniser networks; elevated risk of overfitting from per-layer activation freedom; lack of thorough ablations on harmoniser depth or pooling.
4.2 FA-INR with Cross-Attention and MoE (Scientific Surrogates)
- MPAS-Ocean dataset ($1.10$M params, $10$ experts): PSNR $51.92$ dB, SSIM $0.9934$, MD $0.1536$; outperforms grid-based and MLP-based methods by wide margins in both accuracy and parameter efficiency.
- Scaling: Increasing the number of experts improves PSNR, with diminishing returns beyond a moderate expert count. Model size remains near $1$M parameters, versus $5$–$40$M for grid-based models.
- Ablations: The memory bank + MoE architecture yields a clear PSNR gain over rigid grids/planes at equal parameter count; Top-2 routing outperforms dense or concatenative aggregation.
- Efficiency: FA-INR trains more slowly than grid/plane INRs but with much greater data efficiency, and is far faster than running the original scientific simulators.
5. Comparative Analysis and Best Practices
FA-INR architectures provide superior adaptivity compared to traditional INR variants reliant on fixed grids, positional encodings, or static activation functions. Key implementation practices for state-of-the-art performance include:
| Component | Best Practices | Typical Settings |
|---|---|---|
| Memory bank size | Size the bank to the data's local complexity | $1024$ entries |
| Encoder/Decoder | Encoder: 1–4 layers, sine activation; decoder: 3 layers, ReLU | – |
| Routing | Top-2 MoE routing over a low-resolution gating grid | – |
| Optimizer | Adam, with separate initial learning rates for grid models and MLP baselines | – |
Rigid structural assumptions (such as feature grids or planes) are supplanted by data-adaptive interpolation and routing, yielding a new Pareto frontier of accuracy versus parameter cost. Explicit parameter-conditioning adapters are crucial: ablating them reduces PSNR markedly.
6. Limitations and Future Research Directions
Despite state-of-the-art results, FA-INR architectures entail several limitations:
- Computational cost: Harmoniser networks and cross-attention introduce overhead, increasing training and inference latency; Incode and memory-augmented MoE FA-INRs are among the slowest of current techniques.
- Overfitting: Greater flexibility (adaptive sine parameters or adaptive memory routing) introduces additional degrees of freedom, raising overfitting risk, particularly on small or noisy tasks.
- Ablation uncertainty: There is no consensus on optimal harmoniser or gating grid architecture; further ablations are required to explore trade-offs among expressivity, overhead, and regularization.
Proposed improvement directions include:
- Streamlining harmonisers (parameter sharing, lightweight attention mechanisms);
- Hybrid models integrating positional encodings and adaptive fusion;
- Regularization of adaptive parameters (e.g., penalties on sine activation parameters or attention outputs; a minimal sketch follows this list);
- Dynamic bandwidth scheduling for scale-adaptive allocation of model capacity.
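As one concrete reading of the regularization direction above, the sketch below penalizes deviations of the harmoniser outputs from the identity activation $(a, b, c, d) = (1, 1, 0, 0)$; the penalty form and weight are assumptions, not a published recipe.

```python
import torch

def regularized_incode_loss(pred, target, a, b, c, d, lam=1e-4):
    """MSE reconstruction plus an L2 pull of the adaptive sine
    parameters toward the identity activation (1, 1, 0, 0)."""
    recon = torch.mean((pred - target) ** 2)
    penalty = ((a - 1) ** 2).mean() + ((b - 1) ** 2).mean() \
              + (c ** 2).mean() + (d ** 2).mean()
    return recon + lam * penalty
```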
A plausible implication is that future FA-INR architectures will combine the compactness of memory-augmented cross-attention, the spectral flexibility of explicit activation modulation, and scalable, regularized routing strategies to further advance the resolution/efficiency trade-off in implicit neural representations.