
Sinusoidal Representation Networks

Updated 25 December 2025
  • Sinusoidal Representation Networks (SIRENs) are neural networks that employ sine activation functions to efficiently model high-frequency structures in various signals.
  • Their specialized architecture and initialization methods reduce spectral bias and ensure robust convergence across applications like image and audio representation.
  • Variants such as SineKAN, SASNet, and STAF extend SIRENs by incorporating adaptive frequency scaling and local capacity adjustments to enhance modeling in complex, high-dimensional tasks.

Sinusoidal Representation Networks (SIRENs) are a class of neural networks that use continuous periodic activation functions—typically the sine function—instead of nonlinearities such as ReLU or tanh. Initially introduced for implicit neural representations (INRs), SIRENs excel at modeling high-frequency structure in signals and yield favorable properties for representing images, videos, audio, geometric signals, and physical fields. Their architecture, initialization, and theoretical underpinnings differ markedly from standard MLPs, giving rise to unique training behaviors, inductive biases, and applications in scientific and engineering domains.

1. Definition, Architecture, and Initialization

The canonical SIREN is a feed-forward multilayer perceptron defined as follows:

  • Each hidden layer computes

$y^{(l)} = \sin(W^{(l-1)} y^{(l-1)} + b^{(l-1)})$

with $y^{(0)} = x$, the input vector.

  • The first hidden layer often incorporates a frequency scaling:

$y^{(1)} = \sin(\omega_0 (W^{(0)} x + b^{(0)}))$

where $\omega_0$ regulates the base frequency encoded at the input. Subsequent layers typically use $\omega = 1$.

  • Output is produced via a final linear layer.

Initialization is crucial for stable and expressive training; a code sketch combining the architecture and initialization follows this list:

  • Input weights: $W^{(0)}_{jk} \sim \mathcal{U}(-\frac{1}{n}, +\frac{1}{n})$; $b^{(0)} \sim \mathcal{U}(-\frac{\pi}{\omega_0}, +\frac{\pi}{\omega_0})$.
  • Hidden weights: $W^{(\ell)}_{jk} \sim \mathcal{U}(-\sqrt{\frac{6}{n}}, +\sqrt{\frac{6}{n}})$ for layer width $n$.
  • The base frequency $\omega_0$ is chosen proportional to the maximum signal frequency (e.g., $\omega_0 \approx f_{\mathrm{Nyq}}/8$ for Nyquist frequency $f_{\mathrm{Nyq}}$) (Belbute-Peres et al., 2022).
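
As a concrete illustration, the following is a minimal PyTorch sketch of this architecture and initialization scheme; the layer sizes, depth, and the choice $\omega_0 = 30$ are illustrative assumptions rather than values prescribed by the cited works.

```python
# Minimal SIREN sketch: first layer scaled by omega_0, hidden layers with omega = 1,
# uniform initialization as described above, and a final linear output layer.
import math
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    def __init__(self, in_features, out_features, omega_0=1.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # Input layer: W ~ U(-1/n, 1/n), b ~ U(-pi/omega_0, pi/omega_0)
                bound = 1.0 / in_features
                self.linear.weight.uniform_(-bound, bound)
                self.linear.bias.uniform_(-math.pi / omega_0, math.pi / omega_0)
            else:
                # Hidden layers: W ~ U(-sqrt(6/n), sqrt(6/n))
                bound = math.sqrt(6.0 / in_features)
                self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class Siren(nn.Module):
    def __init__(self, in_features=2, hidden=256, depth=3, out_features=1, omega_0=30.0):
        super().__init__()
        layers = [SirenLayer(in_features, hidden, omega_0=omega_0, is_first=True)]
        layers += [SirenLayer(hidden, hidden) for _ in range(depth - 1)]
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, out_features)  # final linear output layer

    def forward(self, coords):
        return self.head(self.net(coords))

# Example: map 2D pixel coordinates (normalized to [-1, 1]) to grayscale intensity.
model = Siren(in_features=2, out_features=1)
coords = torch.rand(1024, 2) * 2 - 1
values = model(coords)  # shape (1024, 1)
```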

Variants exist, including:

  • Fixed or learnable frequency scaling per layer.
  • Multi-term trainable activations (as in STAF) with independent amplitude, frequency, and phase per term (Morsali et al., 2 Feb 2025).
  • Hybrid initializations using frozen frequency dictionaries or learned bases (Novello et al., 30 Jul 2024).

2. Spectral Properties and Neural Tangent Kernel Analysis

A defining feature of SIRENs is their ability to synthesize and fit high-frequency content—circumventing the low-frequency “spectral bias” of ReLU- and tanh-based networks. Theoretical results show:

  • Hidden-layer outputs expand into infinite harmonic sums with frequencies given by all integer linear combinations of the input frequencies (Novello, 2022, Novello et al., 30 Jul 2024).
  • The amplitude of each harmonic decays super-exponentially with its order, with explicit Bessel-function-based bounds:

$|\alpha_k(a)| < \prod_{i=1}^n \frac{(|a_i|/2)^{|k_i|}}{|k_i|!}$

controlling high-order contributions (Novello, 2022); a one-dimensional numerical check of this bound is sketched at the end of this section.

  • In the infinite-width limit, the neural tangent kernel (NTK) of a single-layer SIREN closely approximates a Gaussian (for simple sinusoidal networks, SSNs) or sinc (for the classic SIREN) low-pass filter, with bandwidth set by $\omega$ (Belbute-Peres et al., 2022). In deeper networks, NTKs approach Gaussian behavior, empirically functioning as low-pass filters with bandwidth $\sim \omega$.

This spectral view endows SIRENs with:

  • Rapid convergence for functions with broad spectral support.
  • Diagnosable behavior in terms of frequency response—allowing controlled tuning of expressivity and overfitting by initializing and bounding input frequencies and weight scales (Belbute-Peres et al., 2022, Novello et al., 30 Jul 2024).
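
The super-exponential decay bound above can be checked numerically in the one-dimensional case, where the harmonics of $\sin(a \sin x)$ are Bessel coefficients via the Jacobi-Anger expansion; the amplitude $a = 1.5$ and the use of SciPy are illustrative assumptions.

```python
# One-dimensional check of the harmonic decay bound:
# sin(a sin x) = 2 * sum over odd k of J_k(a) sin(k x), and |J_k(a)| <= (|a|/2)^k / k!,
# matching the super-exponential decay in harmonic order quoted above.
import math
from scipy.special import jv  # Bessel function of the first kind, J_k

a = 1.5  # example input amplitude (illustrative)
for k in range(1, 10, 2):  # odd harmonics only
    harmonic = abs(jv(k, a))
    bound = (abs(a) / 2.0) ** k / math.factorial(k)
    assert harmonic <= bound + 1e-12
    print(f"k={k}: |J_k(a)| = {harmonic:.3e}   bound = {bound:.3e}")
```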

3. Model Variants and Theoretical Generalization

Numerous SIREN variants have emerged to further control spectral bias, improve convergence, and adapt expressivity:

  • SineKAN: Embedding SIREN activations within Kolmogorov–Arnold superposition networks, with inner and outer function classes as sums of sinusoids of learnable amplitudes and frequencies. This structure admits a constructive universal approximation theorem for multivariate continuous functions, outperforming fixed-frequency Fourier representations and MLPs with sigmoidal activation in parameter efficiency (Gleyzer et al., 1 Aug 2025).
  • SASNet: Enhances vanilla SIREN by integrating a frozen frequency embedding dictionary and spatially adaptive masks learned through a separate network. This controls frequency leakage and localizes capacity, leading to superior PSNR, SSIM, and convergence stability, especially on signals requiring spatially localized high-frequency fitting (Feng et al., 12 Mar 2025).
  • STAF: Replaces the fixed sine activation in each layer by a sum over trainable sinusoidal basis functions. STAF learns amplitude, frequency, and phase jointly with the network, yielding broader effective bandwidth and greatly improved convergence rates and final accuracy across image, audio, and radiance field tasks (Morsali et al., 2 Feb 2025); a minimal sketch of such an activation follows this list.
  • TUNER: Addresses the generation and amplification of high-order harmonics by initializing input frequencies on integer grids matched to the desired period and spectral support, then bounding hidden-layer weights (either hard or learnable clamps) to prevent excessive high-frequency growth. This yields robust, stable convergence, particularly in high-dimensional or ill-conditioned settings (Novello et al., 30 Jul 2024).
  • SPDER: Modifies the SIREN activation by multiplying the sine by a sublinear "damping" function (e.g., $\sqrt{|x|}$). This preserves the periodic coding of position while guaranteeing that absolute coordinate information is retained layer by layer, further reducing spectral bias and boosting representation fidelity (Shah et al., 2023).
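
As referenced in the STAF entry above, the following is a hedged sketch of a multi-term trainable sinusoidal activation with per-term amplitude, frequency, and phase; the number of terms and the initial frequency scale are assumptions for illustration, not values prescribed by the STAF paper.

```python
# Sketch of a STAF-style activation: a trainable sum of sinusoids applied elementwise.
import torch
import torch.nn as nn

class TrainableSineActivation(nn.Module):
    def __init__(self, num_terms=4, init_freq_scale=30.0):
        super().__init__()
        # One amplitude, frequency, and phase per sinusoidal term (all trainable).
        self.amplitude = nn.Parameter(torch.ones(num_terms) / num_terms)
        self.frequency = nn.Parameter(torch.rand(num_terms) * init_freq_scale)
        self.phase = nn.Parameter(torch.zeros(num_terms))

    def forward(self, x):
        # x: (..., features); broadcast each term over a trailing dimension, then sum.
        x = x.unsqueeze(-1)  # (..., features, 1)
        terms = self.amplitude * torch.sin(self.frequency * x + self.phase)
        return terms.sum(dim=-1)  # back to (..., features)

# Drop-in use after a linear layer:
act = TrainableSineActivation()
h = act(nn.Linear(2, 64)(torch.rand(8, 2)))  # shape (8, 64)
```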

4. Practical Recipe and Hyperparameter Guidelines

Guidelines for constructing effective SIRENs are consistent across architectures:

  • Scale the first layer by a base frequency $\omega_0$ chosen from the target signal's spectral content (e.g., $\omega_0 \approx f_{\mathrm{Nyq}}/8$), rather than treating it as an arbitrary constant.
  • Use the uniform initialization of Section 1 ($\mathcal{U}(-1/n, 1/n)$ input weights, $\mathcal{U}(-\sqrt{6/n}, \sqrt{6/n})$ hidden weights) so pre-activations remain well distributed at the start of training.
  • Bound or regularize hidden-layer weights when fitting high-dimensional or ill-conditioned signals, to prevent uncontrolled amplification of high-order harmonics.
  • Produce outputs with a final linear layer, and handle coordinate normalization and invariances (e.g., translations) at the data level.

A small sketch of the $\omega_0$ guideline follows.
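
The snippet below illustrates choosing $\omega_0$ as a fraction of the sampling grid's Nyquist frequency; the assumption that coordinates are normalized to $[-1, 1]$ and the angular-frequency convention are illustrative choices, not fixed by the cited works.

```python
# Pick omega_0 as a fraction (here 1/8) of the Nyquist angular frequency of the grid.
import math

def suggest_omega_0(num_samples: int, domain_length: float = 2.0,
                    fraction: float = 1.0 / 8.0) -> float:
    """Return omega_0 ~ fraction * Nyquist angular frequency of the sampling grid."""
    nyquist_hz = num_samples / (2.0 * domain_length)  # cycles per unit length
    nyquist_angular = 2.0 * math.pi * nyquist_hz      # radians per unit length
    return fraction * nyquist_angular

# Example: a 256-pixel image axis mapped to [-1, 1].
print(suggest_omega_0(256))  # roughly 50
```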

5. Empirical Benchmarks and Application Domains

SIRENs and their variants have been validated across a wide spectrum of applications, demonstrating superiority over ReLU, tanh, and even advanced positional-encoding approaches.

| Application | SIREN/Variant | Key Results & Metrics |
|---|---|---|
| Image Representation | SIREN, SASNet, STAF | STAF: PSNR 104.6 dB (Celtic image); SASNet: PSNR 35.5 dB (DIV2K); SIREN outperforms ReLU, tanh (Morsali et al., 2 Feb 2025, Feng et al., 12 Mar 2025) |
| Video, Audio, Shape Fitting | SIREN, SPDER | SPDER achieves $10^3$ to $10^5\times$ lower MSE and converges ~10× faster vs. SIREN (Shah et al., 2023) |
| Medical Imaging Compression | SIREN | PSNR 36.4 dB, SSIM 0.98; compresses 4D dMRI ≈10× vs. DEFLATE, outperforms JPEG2000 (Mancini et al., 2022) |
| PINN/Scientific PDEs | SIREN, SSN | Tuning $\omega$ reduces parameter-estimation error by up to $2\times$ vs. tanh-NN (Belbute-Peres et al., 2022) |
| Control Systems | SIREN (G&CNET) | Lower control error and faster convergence than ReLU/Softplus G&CNETs in drone/spacecraft tasks (Origer et al., 28 May 2024) |
| Geographic Encoding | SIREN + Spherical Harmonics | Matches/outperforms double Fourier/SH baselines; robust at poles (Rußwurm et al., 2023) |
| Time-series Modeling | SineKAN | Outperforms truncated Fourier and MLPs on rapidly oscillatory/non-smooth 1D and 2D benchmarks (Gleyzer et al., 1 Aug 2025) |
| Multimodal Neuroscience | SIREN | Improved prediction of fMRI time series from EEG (r = 0.47 avg.) (Li et al., 2023) |

6. Theoretical Insights and Spectral Bias Mitigation

SIRENs natively characterize their expressivity in the Fourier domain. Key theoretical findings include:

  • Compositions of sinusoidal layers expand into harmonic sums whose frequencies are integer combinations of the input frequencies, with amplitudes decaying super-exponentially in harmonic order (Novello, 2022).
  • In the infinite-width limit, the NTK acts as a low-pass filter whose bandwidth is set by $\omega$, so spectral bias can be tuned directly through the choice of $\omega_0$ and the weight scales (Belbute-Peres et al., 2022).
  • Bounding input frequencies and hidden-layer weights (as in TUNER) limits the generation of spurious high-order harmonics, mitigating both spectral bias and overfitting (Novello et al., 30 Jul 2024).

A minimal sketch of such a weight-bounding step follows.
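
The sketch below applies a hard clamp to hidden-layer weights after each optimizer update of the earlier SIREN sketch; the bound value and the parameter-name filtering are illustrative assumptions, not the exact TUNER procedure.

```python
# Clamp hidden-layer weights in place to bound high-order harmonic growth.
import torch

def clamp_hidden_weights(model, bound: float = 0.1):
    # Skip the first layer ("net.0") and the final linear head, whose roles differ.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "net" in name and "net.0" not in name and name.endswith("weight"):
                param.clamp_(-bound, bound)

# Usage inside a training loop (model/optimizer defined as in the earlier sketch):
# loss.backward(); optimizer.step(); clamp_hidden_weights(model, bound=0.1)
```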

7. Limitations, Extensions, and Outlook

Despite their strengths, SIRENs face challenges:

  • Training instability and overfitting when $\omega_0$ or network depth is too large, mitigated by spectral bounding or adaptive masking (Belbute-Peres et al., 2022, Novello et al., 30 Jul 2024, Feng et al., 12 Mar 2025).
  • Higher per-activation compute cost compared to ReLU/Softplus (due to sine evaluations), but this is offset by superior parameter efficiency and faster convergence for most target functions (Gleyzer et al., 1 Aug 2025).
  • Lack of built-in invariances—coordinate transformations (e.g., translations) must be handled at the data level.
  • Extensions to high-dimensional or non-grid data (e.g., NeRFs, spatiotemporal fields) benefit from hybrid positional encoding, learned frequency bases, or domain-specific architectures (Morsali et al., 2 Feb 2025, Rußwurm et al., 2023).

Future directions include learnable adaptive spectra, hybrid basis models (e.g., spherical harmonics plus SIRENs), dynamic resource allocation (pruning frequency bases), and application to scientific domains where continuous, differentiable approaches to signal, field, or PDE modeling are required.


SIRENs provide a rigorous, expressive, and theoretically well-characterized class of function approximators suitable for high-fidelity, continuous modeling in scientific computing, computer vision, geometric learning, and beyond (Sitzmann et al., 2020, Belbute-Peres et al., 2022, Novello, 2022, Novello et al., 30 Jul 2024, Morsali et al., 2 Feb 2025, Feng et al., 12 Mar 2025, Gleyzer et al., 1 Aug 2025, Mancini et al., 2022, Shah et al., 2023, Paz et al., 3 Feb 2024, Rußwurm et al., 2023, Origer et al., 28 May 2024, Li et al., 2023).
