Soft-Position Embedding in Coordinate-MLPs
- Soft-position embedding is a learnable, smooth positional encoding that maps input coordinates using super-Gaussian functions with instance-specific scale parameters.
- It employs a graph-Laplacian smoothness prior to balance fidelity to local details with overall generalization, outperforming traditional random Fourier features.
- A two-stage optimization decouples the embedding hyperparameter fitting from MLP training, ensuring improved training stability and high-fidelity signal reconstruction.
Soft-position embedding is a learnable, smooth positional encoding scheme tailored for coordinate-MLPs, in which each input coordinate is mapped to a high-dimensional space via a parameterized, instance-specific transformation. This framework enables each coordinate to have its own local embedding bandwidth, governed by per-coordinate scale parameters. These scales are optimized using a graph-Laplacian smoothness prior, carefully balancing fidelity to complex local detail with generalization, and providing high stability in both training and inference. The methodology outperforms classical position encodings such as random Fourier features (RFF), particularly in high-fidelity signal and image regression and neural rendering tasks (Ramasinghe et al., 2021).
1. Super-Gaussian Instance-specific Embedding Construction
Soft-position embedding projects each coordinate $\mathbf{x} \in \mathbb{R}^n$ into a $d$-dimensional vector via super-Gaussian radial basis functions, with a learnable, instance-specific scale parameter $\sigma(\mathbf{x})$. The embedding is defined as $\Psi(\mathbf{x}) = [\psi_1(\mathbf{x}), \ldots, \psi_d(\mathbf{x})]^\top$, with each component

$$\psi_j(\mathbf{x}) = \exp\!\left(-\left|\frac{\mathbf{w}_j^\top \mathbf{x} - b_j}{\sigma(\mathbf{x})}\right|^{p}\right),$$

where $\mathbf{w}_j$ is a projection vector, the $b_j$ are fixed offsets, $p$ is a fixed exponent, and $\sigma(\mathbf{x})$ is the coordinate-dependent (learnable) scale. This construction enables localized control over the embedding bandwidth, adapting the smoothness and expressivity to each coordinate instance.
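As a concrete sketch, the component formula can be written in a few lines of numpy; the dimensions, the exponent value `p=4`, and a single scalar `sigma` per query coordinate are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def soft_position_embed(x, W, b, sigma, p=4):
    """Super-Gaussian embedding of a single coordinate x in R^n.

    W     : (d, n) array stacking the projection vectors w_j
    b     : (d,)   fixed offsets b_j
    sigma : instance-specific scale for this coordinate (learnable)
    p     : fixed super-Gaussian exponent (p > 2 flattens each bump's top)
    """
    z = W @ x - b                           # projected, shifted coordinate, shape (d,)
    return np.exp(-np.abs(z / sigma) ** p)  # each component lies in (0, 1]

rng = np.random.default_rng(0)
n, d = 2, 16
W = rng.normal(size=(d, n))
b = np.linspace(-1.0, 1.0, d)
emb = soft_position_embed(np.array([0.3, -0.5]), W, b, sigma=0.5)  # shape (16,)
```

Smaller `sigma` narrows each bump, raising the embedding's effective bandwidth at that coordinate; larger `sigma` smooths it.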
2. Joint Objective: Data Fit and Graph-Laplacian Smoothness
The embedding is trained alongside an MLP $f_\theta$ using a combined surrogate loss composed of a conventional data-fitting term and a graph-Laplacian regularizer. For input–output pairs $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$:
- Define $g_i = \|J_{f}(\mathbf{x}_i)\|_F$ as the Frobenius norm of the input Jacobian of $f_\theta$ at $\mathbf{x}_i$, and collect these into $\mathbf{g} = (g_1, \ldots, g_N)$.
- The loss function is $\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda\,\mathcal{L}_{\text{lap}}$, where
$\mathcal{L}_{\text{data}} = \frac{1}{N} \sum_{i=1}^{N} \| f_\theta(\Psi(\mathbf{x}_i)) - y_i \|_2^2$
is the empirical data loss, and
$\mathcal{L}_{\text{lap}} = \mathbf{g}^\top L\,\mathbf{g} + \mu\,\mathcal{R}(\sigma)$
is the Laplacian regularizer with an “anti-collapse” term $\mathcal{R}(\sigma)$ that prevents the learned scales from shrinking to a degenerate solution. Here, $L = D - A$ is the graph Laplacian over embedded coordinates and $A$ the adjacency matrix. Hyperparameters $\lambda$ and $\mu$ govern the trade-off and collapse prevention.
This formulation encourages local smoothness of target Jacobians across the learned manifold, effectively matching manifold volume elements to the model’s functional complexity at each location.
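Since the regularizer operates on Jacobian Frobenius norms, a small helper to estimate them is useful. The finite-difference version below is a sketch (in practice automatic differentiation would be used), and the function `f` is a stand-in for the trained network.

```python
import numpy as np

def jacobian_fro_norm(f, x, eps=1e-4):
    """Frobenius norm of the input Jacobian of f at x, via central differences."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    J = np.zeros((y0.size, x.size))
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        J[:, k] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * eps)
    return np.linalg.norm(J, "fro")

# For the linear map f(x) = (2*x0, 3*x1) the Jacobian is diag(2, 3),
# so its Frobenius norm is sqrt(4 + 9) = sqrt(13).
g_val = jacobian_fro_norm(lambda x: np.array([2 * x[0], 3 * x[1]]), np.array([0.1, 0.2]))
```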
3. Graph Laplacian Construction and Smoothness Enforcement
The graph structure is imposed over the set of embedded coordinates $\{\Psi(\mathbf{x}_i)\}$ by defining an adjacency matrix $A$:

$$A_{ij} = \exp\!\left(-\frac{\|\Psi(\mathbf{x}_i) - \Psi(\mathbf{x}_j)\|_2^2}{2\tau^2}\right),$$

where the bandwidth $\tau$ controls smoothing. The (unnormalized) Laplacian is $L = D - A$, for $D$ the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$.
Minimization of $\mathbf{g}^\top L\,\mathbf{g}$ enforces that embedded points close on the positional manifold exhibit similar local functional complexity, as measured by Jacobian norms. The continuous analogue aligns the embedding’s metric determinant with the network’s Jacobian norm, regulating the positional manifold’s distortion.
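A minimal numpy sketch of this construction, using a Gaussian kernel with a single bandwidth `tau` (an assumption; the exact kernel parameters may differ):

```python
import numpy as np

def graph_laplacian(E, tau=1.0):
    """Unnormalized Laplacian L = D - A over embedded points E of shape (N, d),
    with adjacency A_ij = exp(-||E_i - E_j||^2 / (2 tau^2)) and zero diagonal."""
    sq = ((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)  # pairwise sq. distances
    A = np.exp(-sq / (2.0 * tau ** 2))
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))
    return D - A

def smoothness(g, L):
    """Quadratic form g^T L g = 0.5 * sum_ij A_ij (g_i - g_j)^2."""
    return g @ L @ g

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))        # five embedded coordinates
L = graph_laplacian(E, tau=0.8)
flat = smoothness(np.ones(5), L)   # a constant g is maximally smooth
```

A constant `g` incurs zero penalty, while differences between strongly connected neighbors are penalized quadratically.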
4. Two-Stage Optimization: Decoupling Embedding and Network Fitting
The optimization proceeds in two decoupled stages:
- Stage I: Embedding hyperparameters ($\sigma$) are fit by minimizing $\mathcal{L}_{\text{lap}}$ using gradient descent, with gradients propagating through $\Psi$ into the Laplacian and adjacency matrices. To circumvent per-instance re-optimization at scale, the mapping from input gradient norms to $\sigma$ is approximated by a low-degree polynomial:
$\sigma(\mathbf{x}) \approx \sum_{k=0}^{K} a_k\, g(\mathbf{x})^{k},$
where the coefficients $a_k$ are learned via least squares on a small calibration set and can be reused for new functions without re-search.
- Stage II: With $\sigma$ fixed, $\Psi$ becomes a deterministic embedding. The MLP $f_\theta$ is then trained by standard stochastic optimization (e.g., Adam) to minimize data error, using fixed embedding parameters.
This staged decoupling prevents overfitting observed in joint end-to-end optimization. The procedure yields robust generalization by separating embedding smoothness induction from function approximation.
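The Stage I polynomial can be fit with an ordinary least-squares routine. The calibration values below and the degree-3 choice are hypothetical placeholders for illustration.

```python
import numpy as np

# Hypothetical calibration set: local gradient magnitudes g_i paired with the
# scales sigma_i found by Stage I optimization (larger gradients -> smaller scales).
g_cal = np.array([0.1, 0.5, 1.0, 2.0, 3.0])
sigma_cal = np.array([0.9, 0.6, 0.4, 0.25, 0.2])

coeffs = np.polyfit(g_cal, sigma_cal, deg=3)  # least-squares polynomial fit
sigma_new = np.polyval(coeffs, 1.5)           # reuse on a new function's gradient norm
```

Once fitted, the coefficients map any gradient-norm estimate directly to a scale, so new signals need no per-instance search.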
5. Comparative Evaluation Against Random Fourier Features
Quantitative and qualitative evaluation demonstrates significant improvements of soft-position embedding over RFF:
| Task | Test PSNR (RFF) | Test PSNR (Soft-Embedding) | SSIM (RFF) | SSIM (Soft-Embedding) |
|---|---|---|---|---|
| 1D Signals | ~26 dB | ~31 dB | – | – |
| 2D Images | – | +2–3 dB relative gain | – | – |
| 3D NeRF Scenes | – | +1 dB relative gain | .947 | .981 |
- RFF requires per-instance frequency tuning; the soft-position approach attains superior results with universal coefficients.
- RFF fails under permutation/undersampling of its frequencies, while adaptive per-instance scaling confers robustness.
- Shallow MLPs exhibit higher signal fidelity with the smooth embeddings, highlighting the ease of function approximation on smooth positional manifolds.
6. Gradient Stability and Intermediate Integration
Empirical results reveal that, when back-propagating through the embedding $\Psi$, the super-Gaussian manifold generates stable, low-variance gradients even under coordinate permutations or deep stacking. In contrast, RFF induces noisy, ill-conditioned gradient fields. Consequently, the embedding can be reliably inserted as an intermediate layer (e.g., at a U-Net bottleneck) without risk of gradient explosion or collapse. This stability is critical for scalable integration of positional encodings in deep architectures.
7. High-Level Implementation and Practical Considerations
Implementation follows a two-stage recipe:
- Polynomial Fitting for $\sigma$
- Select a calibration subset and estimate gradient magnitudes $g_i$.
- Optimize for the scales $\sigma_i$ via gradient descent on the Laplacian loss.
- Perform a least-squares fit for the polynomial coefficients mapping $g \mapsto \sigma$.
- Training the MLP with Fixed Embedding
- Compute $g_i$ and derive $\sigma_i$ for all training points.
- Generate $\Psi(\mathbf{x}_i)$ for each input using the fitted $\sigma$.
- Train $f_\theta$ to minimize the data loss over the embedded inputs via Adam or SGD.
Optional iterative updates of $\sigma$ alongside network training may further refine the embedding, although the two-stage methodology generally suffices for strong performance.
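The recipe can be exercised end to end on a 1D toy signal. In this sketch a grid search stands in for the Stage I gradient descent, a linear least-squares readout stands in for the MLP, and one global scale stands in for per-coordinate scales; all three are simplifying assumptions for illustration only.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 64)
y = np.sin(4 * np.pi * x)                       # toy target signal

centers = np.linspace(-1.0, 1.0, 32)            # fixed offsets b_j
def embed(x, sigma, p=4):
    """Super-Gaussian features, one bump per center."""
    return np.exp(-np.abs((x[:, None] - centers[None, :]) / sigma) ** p)

# Stage I: choose sigma by minimizing the Laplacian surrogate g^T L g.
g = np.abs(np.gradient(y, x))                   # local gradient magnitudes
def surrogate(sigma, tau=0.5):
    E = embed(x, sigma)
    sq = ((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / (2.0 * tau ** 2))
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A
    return g @ L @ g

grid = np.linspace(0.05, 1.0, 20)
sigma_star = grid[np.argmin([surrogate(s) for s in grid])]

# Stage II: with sigma fixed, fit the readout on the now-deterministic embedding.
E = embed(x, sigma_star)
w, *_ = np.linalg.lstsq(E, y, rcond=None)
mse = np.mean((E @ w - y) ** 2)
```

The decoupling is visible in the code: the surrogate never touches the readout weights, and the readout fit never changes `sigma_star`.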
Summary
Soft-position embedding employs a super-Gaussian coordinate-to-feature mapping with per-instance bandwidth, learned under a graph-Laplacian smoothness prior. This architecture optimally balances local detail and global smoothness, exceeds RFF in fidelity and robustness, and is suitable for integration within deep neural networks without laborious hyperparameter search or gradient instability (Ramasinghe et al., 2021).