- The paper introduces Fourier feature mappings to overcome spectral bias in coordinate-based MLPs, enabling them to capture high-frequency details.
- It applies a sinusoidal mapping to the input coordinates that, analyzed through NTK theory, makes the resulting kernel stationary (shift-invariant) with a tunable bandwidth, improving convergence on high-frequency components.
- Empirical results demonstrate significant performance gains in applications like image regression, 3D reconstruction, and neural rendering.
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
The paper "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains" by Matthew Tancik et al. presents a novel method to improve the capabilities of coordinate-based multilayer perceptrons (MLPs) in capturing high-frequency components in low-dimensional regression tasks. The authors address a critical limitation of MLPs, known as spectral bias, which restricts their ability to learn high-frequency information.
Introduction and Background
Coordinate-based MLPs have recently shown great promise in representing continuous functions for computer vision and graphics tasks. These models parameterize scenes or objects using low-dimensional input coordinates (e.g., points in R^3) and yield outputs such as density, color, or occupancy. However, standard MLPs are inherently biased towards learning low-frequency functions, a problem substantiated both theoretically and experimentally by the authors.
Drawing on neural tangent kernel (NTK) theory, the paper explains why standard MLPs struggle to learn high frequencies: the eigenvalue spectrum of a standard MLP's NTK decays rapidly, and the components of the target function aligned with small eigenvalues, which correspond to high frequencies, converge exponentially more slowly under gradient descent.
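This argument can be summarized by the approximate training dynamics below, a standard result in the NTK literature paraphrased here: K is the NTK Gram matrix on the training inputs with eigendecomposition K = QΛQ⊤, y the vector of training targets, and η the learning rate. The error along the i-th eigenvector decays at a rate governed by its eigenvalue λi, so directions with small eigenvalues are learned very slowly.

```latex
% Approximate training dynamics under the NTK linearization (paraphrased from the
% NTK literature; K = Q \Lambda Q^\top is the NTK Gram matrix on the training inputs).
\hat{\mathbf{y}}^{(t)} \;\approx\; \bigl(\mathbf{I} - e^{-\eta \mathbf{K} t}\bigr)\,\mathbf{y}
\qquad\Longrightarrow\qquad
\bigl|\,\mathbf{q}_i^\top\bigl(\hat{\mathbf{y}}^{(t)} - \mathbf{y}\bigr)\bigr|
\;\approx\; e^{-\eta \lambda_i t}\,\bigl|\,\mathbf{q}_i^\top \mathbf{y}\bigr| .
```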
Fourier Feature Mapping
To overcome this spectral bias, the authors propose transforming the input coordinates with a Fourier feature mapping before feeding them into the MLP. Specifically, input coordinates v are mapped to a higher-dimensional space using a series of sinusoidal functions:
γ(v) = [cos(2π b1⊤v), sin(2π b1⊤v), …, cos(2π bm⊤v), sin(2π bm⊤v)]⊤
Here, the bj are frequency vectors; in the paper's best-performing variant they are sampled once from an isotropic Gaussian distribution, whose standard deviation (the frequency scale) is the key hyperparameter, and then held fixed during training.
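A minimal NumPy sketch of this mapping, assuming the Gaussian variant described in the paper (frequency vectors bj drawn once from N(0, σ²I) and kept fixed); the function and parameter names are illustrative, not taken from the authors' code:

```python
import numpy as np

def fourier_feature_mapping(v, num_frequencies=256, sigma=10.0, seed=0):
    """Map coordinates v of shape (N, d) to gamma(v) of shape (N, 2 * num_frequencies)."""
    rng = np.random.default_rng(seed)
    d = v.shape[-1]
    # Rows of B are the frequency vectors b_j ~ N(0, sigma^2 I); sampled once, then fixed.
    B = sigma * rng.standard_normal((num_frequencies, d))
    proj = 2.0 * np.pi * v @ B.T  # 2*pi*b_j^T v for every j
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1), B

# Example: map 2D pixel coordinates in [0, 1]^2 before feeding them to an MLP.
coords = np.random.default_rng(1).uniform(size=(4, 2))
gamma, B = fourier_feature_mapping(coords, num_frequencies=8, sigma=10.0)
print(gamma.shape)  # (4, 16)
```

The matrix B is returned so the same frequencies can be reused for every training and test coordinate.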
This mapping makes the composed NTK stationary (shift-invariant) over the input domain, so it acts like a convolution kernel with a tunable bandwidth, which is well suited to the dense, roughly uniform sampling of training points typical of low-dimensional regression problems.
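To see why, note that the dot product of two mapped points depends only on their difference (via the identity cos a cos b + sin a sin b = cos(a − b)); since an MLP's NTK depends on its inputs only through such dot products (and ‖γ(v)‖² = m is constant), the composed kernel is a function of v1 − v2 alone. A short derivation using the mapping defined above:

```latex
% Shift invariance of the induced kernel (derived from the mapping gamma above).
\gamma(\mathbf{v}_1)^\top \gamma(\mathbf{v}_2)
= \sum_{j=1}^{m}\Bigl[\cos\bigl(2\pi\mathbf{b}_j^\top\mathbf{v}_1\bigr)\cos\bigl(2\pi\mathbf{b}_j^\top\mathbf{v}_2\bigr)
 + \sin\bigl(2\pi\mathbf{b}_j^\top\mathbf{v}_1\bigr)\sin\bigl(2\pi\mathbf{b}_j^\top\mathbf{v}_2\bigr)\Bigr]
= \sum_{j=1}^{m}\cos\bigl(2\pi\mathbf{b}_j^\top(\mathbf{v}_1-\mathbf{v}_2)\bigr).
```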
Experimental Validation and Theoretical Insights
The authors validate the Fourier feature approach through experiments ranging from simple one-dimensional regression to practical graphics tasks, demonstrating significant improvements in 2D image regression, 3D shape representation, CT and MRI reconstruction, and inverse rendering for novel view synthesis.
Specifically, experiments show that random Fourier feature mappings markedly outperform unmapped inputs and basic single-frequency mappings, and match or exceed axis-aligned positional encodings, provided an appropriate frequency scale is selected. Both the analysis and the experiments indicate that the scale (bandwidth) of the sampled frequencies bj matters far more than the exact shape of their distribution: too small a scale underfits and over-smooths, while too large a scale overfits and produces noisy artifacts.
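As an illustration of that trade-off, the following hypothetical sketch (not the authors' code; it uses PyTorch and a synthetic 64×64 target in place of a real image) trains a small MLP at a few frequency scales σ and compares training error against error on held-out pixels — a low σ should underfit, while a very high σ fits the training pixels but generalizes poorly:

```python
import torch
import torch.nn as nn

def fourier_features(v, B):
    """gamma(v) for coordinates v (N, d) and fixed frequency matrix B (m, d)."""
    proj = 2 * torch.pi * v @ B.T
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

# Synthetic high-frequency target on a 64x64 grid (stand-in for a real image).
side = 64
xs = torch.linspace(0, 1, side)
coords = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1).reshape(-1, 2)
target = (torch.sin(40 * coords[:, :1]) * torch.cos(30 * coords[:, 1:2]) + 1) / 2

# Hold out half of the pixels to expose overfitting at large scales.
idx = torch.randperm(coords.shape[0])
tr, te = idx[: idx.numel() // 2], idx[idx.numel() // 2 :]

for sigma in [1.0, 10.0, 100.0]:           # candidate frequency scales
    B = sigma * torch.randn(128, 2)          # b_j ~ N(0, sigma^2 I), fixed during training
    model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(500):                      # short training loop for illustration
        opt.zero_grad()
        loss = ((model(fourier_features(coords[tr], B)) - target[tr]) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        test_mse = ((model(fourier_features(coords[te], B)) - target[te]) ** 2).mean()
    print(f"sigma={sigma:6.1f}  train MSE={loss.item():.4f}  held-out MSE={test_mse.item():.4f}")
```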
Beyond the analytical evaluations, the empirical results demonstrate that models trained with this mapping converge efficiently to the high-frequency components of the target function that standard MLPs fail to capture.
Implications and Future Work
The implications of these findings are substantial for the fields of computer vision and graphics. Fourier features allow for compact and computationally efficient representations of complex high-frequency details in continuous domains, thereby paving the way for more accurate and expressive models in visual and geometric tasks.
Future research avenues may include investigating the impact of different sampling distributions for the frequency vectors bj, optimizing the Fourier feature scales dynamically during training, and extending the approach to other neural network architectures and higher-dimensional domains.
In practical scenarios, incorporating Fourier feature mappings can dramatically improve the performance of applications requiring precise high-frequency detail management, such as neural rendering, texture synthesis, and volumetric data reconstruction. The method is simple to implement and integrates well with existing frameworks, making it a pragmatic enhancement for a variety of downstream tasks.
Conclusion
The paper offers a robust solution to a fundamental limitation of coordinate-based MLPs, demonstrating that Fourier feature mappings effectively allow these networks to learn high-frequency functions in low-dimensional domains. The approach leverages NTK theory to deliver the combined benefits of tunable spectral bandwidth and shift invariance, making it a highly valuable tool in the arsenal of modern computer vision and graphics research.