
Sinusoidal Representation Networks

Updated 25 December 2025
  • Sinusoidal Representation Networks (SIRENs) are neural networks that employ sine activation functions to efficiently model high-frequency structures in various signals.
  • Their specialized architecture and initialization methods reduce spectral bias and ensure robust convergence across applications like image and audio representation.
  • Variants such as SineKAN, SASNet, and STAF extend SIRENs by incorporating adaptive frequency scaling and local capacity adjustments to enhance modeling in complex, high-dimensional tasks.

Sinusoidal Representation Networks (SIRENs) are a class of neural networks that use continuous periodic activation functions—typically the sine function—instead of nonlinearities such as ReLU or tanh. Initially introduced for implicit neural representations (INRs), SIRENs excel at modeling high-frequency structure in signals and yield favorable properties for representing images, videos, audio, geometric signals, and physical fields. Their architecture, initialization, and theoretical underpinnings differ markedly from standard MLPs, giving rise to unique training behaviors, inductive biases, and applications in scientific and engineering domains.

1. Definition, Architecture, and Initialization

The canonical SIREN is a feed-forward multilayer perceptron defined as follows:

  • Each hidden layer computes

$y^{(l)} = \sin(W^{(l-1)} y^{(l-1)} + b^{(l-1)})$

with $y^{(0)} = x$, the input vector.

  • The first hidden layer often incorporates a frequency scaling:

$y^{(1)} = \sin(\omega_0 (W^{(0)} x + b^{(0)}))$

where $\omega_0$ regulates the base frequency encoded at the input. Subsequent layers typically use $\omega = 1$.

  • Output is produced via a final linear layer.

Initialization is crucial for stable and expressive training; a code sketch combining the architecture and initialization follows this list:

  • Input weights: $W^{(0)}_{jk} \sim \mathcal{U}(-\frac{1}{n}, +\frac{1}{n})$; $b^{(0)} \sim \mathcal{U}(-\frac{\pi}{\omega_0}, +\frac{\pi}{\omega_0})$.
  • Hidden weights: $W^{(\ell)}_{jk} \sim \mathcal{U}(-\sqrt{\frac{6}{n}}, +\sqrt{\frac{6}{n}})$ for layer width $n$.
  • The base frequency $\omega_0$ is chosen proportional to the maximum signal frequency (e.g., $\omega_0 \approx f_{\mathrm{Nyq}}/8$ for Nyquist frequency $f_{\mathrm{Nyq}}$) (Belbute-Peres et al., 2022).
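
As a concrete illustration, the following is a minimal PyTorch sketch of this architecture and initialization scheme; the layer sizes, depth, and the choice $\omega_0 = 30$ are illustrative assumptions rather than values prescribed by the cited works.

```python
# Minimal SIREN sketch: first layer scaled by omega_0, hidden layers with omega = 1,
# uniform initialization as described above, and a final linear output layer.
import math
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    def __init__(self, in_features, out_features, omega_0=1.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # Input layer: W ~ U(-1/n, 1/n), b ~ U(-pi/omega_0, pi/omega_0)
                bound = 1.0 / in_features
                self.linear.weight.uniform_(-bound, bound)
                self.linear.bias.uniform_(-math.pi / omega_0, math.pi / omega_0)
            else:
                # Hidden layers: W ~ U(-sqrt(6/n), sqrt(6/n))
                bound = math.sqrt(6.0 / in_features)
                self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class Siren(nn.Module):
    def __init__(self, in_features=2, hidden=256, depth=3, out_features=1, omega_0=30.0):
        super().__init__()
        layers = [SirenLayer(in_features, hidden, omega_0=omega_0, is_first=True)]
        layers += [SirenLayer(hidden, hidden) for _ in range(depth - 1)]
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, out_features)  # final linear output layer

    def forward(self, coords):
        return self.head(self.net(coords))

# Example: map 2D pixel coordinates (normalized to [-1, 1]) to grayscale intensity.
model = Siren(in_features=2, out_features=1)
coords = torch.rand(1024, 2) * 2 - 1
values = model(coords)  # shape (1024, 1)
```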

Variants exist, including:

  • Fixed or learnable frequency scaling per layer.
  • Multi-term trainable activations (as in STAF) with independent amplitude, frequency, and phase per term (Morsali et al., 2 Feb 2025).
  • Hybrid initializations using frozen frequency dictionaries or learned bases (Novello et al., 30 Jul 2024).

2. Spectral Properties and Neural Tangent Kernel Analysis

A defining feature of SIRENs is their ability to synthesize and fit high-frequency content—circumventing the low-frequency “spectral bias” of ReLU- and tanh-based networks. Theoretical results show:

  • Hidden-layer outputs expand into infinite harmonic sums with frequencies given by all integer linear combinations of the input frequencies (Novello, 2022, Novello et al., 30 Jul 2024).
  • The amplitude of each harmonic decays super-exponentially with its order, with explicit Bessel-function-based bounds:

$|\alpha_k(a)| < \prod_{i=1}^n \frac{(|a_i|/2)^{|k_i|}}{|k_i|!}$

controlling high-order contributions (Novello, 2022); a one-dimensional numerical check of this bound is sketched at the end of this section.

  • In the infinite-width limit, the neural tangent kernel (NTK) of a single-layer SIREN closely approximates a Gaussian (for simple sinusoidal networks, SSNs) or sinc (for the classic SIREN) low-pass filter, with bandwidth set by $\omega$ (Belbute-Peres et al., 2022). In deeper networks, NTKs approach Gaussian behavior, empirically functioning as low-pass filters with bandwidth $\sim \omega$.

This spectral view endows SIRENs with:

  • Rapid convergence for functions with broad spectral support.
  • Diagnosable behavior in terms of frequency response—allowing controlled tuning of expressivity and overfitting by initializing and bounding input frequencies and weight scales (Belbute-Peres et al., 2022, Novello et al., 30 Jul 2024).
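
The super-exponential decay bound above can be checked numerically in the one-dimensional case, where the harmonics of $\sin(a \sin x)$ are Bessel coefficients via the Jacobi-Anger expansion; the amplitude $a = 1.5$ and the use of SciPy are illustrative assumptions.

```python
# One-dimensional check of the harmonic decay bound:
# sin(a sin x) = 2 * sum over odd k of J_k(a) sin(k x), and |J_k(a)| <= (|a|/2)^k / k!,
# matching the super-exponential decay in harmonic order quoted above.
import math
from scipy.special import jv  # Bessel function of the first kind, J_k

a = 1.5  # example input amplitude (illustrative)
for k in range(1, 10, 2):  # odd harmonics only
    harmonic = abs(jv(k, a))
    bound = (abs(a) / 2.0) ** k / math.factorial(k)
    assert harmonic <= bound + 1e-12
    print(f"k={k}: |J_k(a)| = {harmonic:.3e}   bound = {bound:.3e}")
```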

3. Model Variants and Theoretical Generalization

Numerous SIREN variants have emerged to further control spectral bias, improve convergence, and adapt expressivity:

  • SineKAN: Embedding SIREN activations within Kolmogorov–Arnold superposition networks, with inner and outer function classes as sums of sinusoids of learnable amplitudes and frequencies. This structure admits a constructive universal approximation theorem for multivariate continuous functions, outperforming fixed-frequency Fourier representations and MLPs with sigmoidal activation in parameter efficiency (Gleyzer et al., 1 Aug 2025).
  • SASNet: Enhances vanilla SIREN by integrating a frozen frequency embedding dictionary and spatially adaptive masks learned through a separate network. This controls frequency leakage and localizes capacity, leading to superior PSNR, SSIM, and convergence stability, especially on signals requiring spatially localized high-frequency fitting (Feng et al., 12 Mar 2025).
  • STAF: Replaces the fixed sine activation in each layer by a sum over trainable sinusoidal basis functions. STAF learns amplitude, frequency, and phase jointly with the network, yielding broader effective bandwidth and greatly improved convergence rates and final accuracy across image, audio, and radiance field tasks (Morsali et al., 2 Feb 2025); a minimal sketch of such an activation follows this list.
  • TUNER: Addresses the generation and amplification of high-order harmonics by initializing input frequencies on integer grids matched to the desired period and spectral support, then bounding hidden-layer weights (either hard or learnable clamps) to prevent excessive high-frequency growth. This yields robust, stable convergence, particularly in high-dimensional or ill-conditioned settings (Novello et al., 30 Jul 2024).
  • SPDER: Modifies the SIREN activation by multiplying the sine by a sublinear "damping" function (e.g., $\sqrt{|x|}$). This preserves the periodic coding of position while guaranteeing that absolute coordinate information is retained layer by layer, further reducing spectral bias and boosting representation fidelity (Shah et al., 2023).
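
As referenced in the STAF entry above, the following is a hedged sketch of a multi-term trainable sinusoidal activation with per-term amplitude, frequency, and phase; the number of terms and the initial frequency scale are assumptions for illustration, not values prescribed by the STAF paper.

```python
# Sketch of a STAF-style activation: a trainable sum of sinusoids applied elementwise.
import torch
import torch.nn as nn

class TrainableSineActivation(nn.Module):
    def __init__(self, num_terms=4, init_freq_scale=30.0):
        super().__init__()
        # One amplitude, frequency, and phase per sinusoidal term (all trainable).
        self.amplitude = nn.Parameter(torch.ones(num_terms) / num_terms)
        self.frequency = nn.Parameter(torch.rand(num_terms) * init_freq_scale)
        self.phase = nn.Parameter(torch.zeros(num_terms))

    def forward(self, x):
        # x: (..., features); broadcast each term over a trailing dimension, then sum.
        x = x.unsqueeze(-1)  # (..., features, 1)
        terms = self.amplitude * torch.sin(self.frequency * x + self.phase)
        return terms.sum(dim=-1)  # back to (..., features)

# Drop-in use after a linear layer:
act = TrainableSineActivation()
h = act(nn.Linear(2, 64)(torch.rand(8, 2)))  # shape (8, 64)
```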

4. Practical Recipe and Hyperparameter Guidelines

Guidelines for constructing effective SIRENs are consistent across architectures:

  • Scale the first layer by a base frequency $\omega_0$ chosen from the target signal's spectral content (e.g., $\omega_0 \approx f_{\mathrm{Nyq}}/8$), rather than treating it as an arbitrary constant.
  • Use the uniform initialization of Section 1 ($\mathcal{U}(-1/n, 1/n)$ input weights, $\mathcal{U}(-\sqrt{6/n}, \sqrt{6/n})$ hidden weights) so pre-activations remain well distributed at the start of training.
  • Bound or regularize hidden-layer weights when fitting high-dimensional or ill-conditioned signals, to prevent uncontrolled amplification of high-order harmonics.
  • Produce outputs with a final linear layer, and handle coordinate normalization and invariances (e.g., translations) at the data level.

A small sketch of the $\omega_0$ guideline follows.
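
The snippet below illustrates choosing $\omega_0$ as a fraction of the sampling grid's Nyquist frequency; the assumption that coordinates are normalized to $[-1, 1]$ and the angular-frequency convention are illustrative choices, not fixed by the cited works.

```python
# Pick omega_0 as a fraction (here 1/8) of the Nyquist angular frequency of the grid.
import math

def suggest_omega_0(num_samples: int, domain_length: float = 2.0,
                    fraction: float = 1.0 / 8.0) -> float:
    """Return omega_0 ~ fraction * Nyquist angular frequency of the sampling grid."""
    nyquist_hz = num_samples / (2.0 * domain_length)  # cycles per unit length
    nyquist_angular = 2.0 * math.pi * nyquist_hz      # radians per unit length
    return fraction * nyquist_angular

# Example: a 256-pixel image axis mapped to [-1, 1].
print(suggest_omega_0(256))  # roughly 50
```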

5. Empirical Benchmarks and Application Domains

SIRENs and their variants have been validated across a wide spectrum of applications, demonstrating superiority over ReLU, tanh, and even advanced positional-encoding approaches.

| Application | SIREN/Variant | Key Results & Metrics |
|---|---|---|
| Image Representation | SIREN, SASNet, STAF | STAF: PSNR 104.6 dB (Celtic image); SASNet: PSNR 35.5 dB (DIV2K); SIREN outperforms ReLU, tanh (Morsali et al., 2 Feb 2025, Feng et al., 12 Mar 2025) |
| Video, Audio, Shape Fitting | SIREN, SPDER | SPDER achieves $10^3$ to $10^5\times$ lower MSE and converges ~10× faster vs. SIREN (Shah et al., 2023) |
| Medical Imaging Compression | SIREN | PSNR 36.4 dB, SSIM 0.98; compresses 4D dMRI ≈10× vs. DEFLATE, outperforms JPEG2000 (Mancini et al., 2022) |
| PINN/Scientific PDEs | SIREN, SSN | Tuning $\omega$ reduces parameter-estimation error by up to $2\times$ vs. tanh-NN (Belbute-Peres et al., 2022) |
| Control Systems | SIREN (G&CNET) | Lower control error and faster convergence than ReLU/Softplus G&CNETs in drone/spacecraft tasks (Origer et al., 28 May 2024) |
| Geographic Encoding | SIREN + Spherical Harmonics | Matches/outperforms double Fourier/SH baselines; robust at poles (Rußwurm et al., 2023) |
| Time-series Modeling | SineKAN | Outperforms truncated Fourier and MLPs on rapidly oscillatory/non-smooth 1D and 2D benchmarks (Gleyzer et al., 1 Aug 2025) |
| Multimodal Neuroscience | SIREN | Improved prediction of fMRI time series from EEG (r = 0.47 avg.) (Li et al., 2023) |

6. Theoretical Insights and Spectral Bias Mitigation

SIRENs natively characterize their expressivity in the Fourier domain. Key theoretical findings include:

  • Compositions of sinusoidal layers expand into harmonic sums whose frequencies are integer combinations of the input frequencies, with amplitudes decaying super-exponentially in harmonic order (Novello, 2022).
  • In the infinite-width limit, the NTK acts as a low-pass filter whose bandwidth is set by $\omega$, so spectral bias can be tuned directly through the choice of $\omega_0$ and the weight scales (Belbute-Peres et al., 2022).
  • Bounding input frequencies and hidden-layer weights (as in TUNER) limits the generation of spurious high-order harmonics, mitigating both spectral bias and overfitting (Novello et al., 30 Jul 2024).

A minimal sketch of such a weight-bounding step follows.
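
The sketch below applies a hard clamp to hidden-layer weights after each optimizer update of the earlier SIREN sketch; the bound value and the parameter-name filtering are illustrative assumptions, not the exact TUNER procedure.

```python
# Clamp hidden-layer weights in place to bound high-order harmonic growth.
import torch

def clamp_hidden_weights(model, bound: float = 0.1):
    # Skip the first layer ("net.0") and the final linear head, whose roles differ.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "net" in name and "net.0" not in name and name.endswith("weight"):
                param.clamp_(-bound, bound)

# Usage inside a training loop (model/optimizer defined as in the earlier sketch):
# loss.backward(); optimizer.step(); clamp_hidden_weights(model, bound=0.1)
```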

7. Limitations, Extensions, and Outlook

Despite their strengths, SIRENs face challenges:

  • Training instability and overfitting when $\omega_0$ or network depth is too large, mitigated by spectral bounding or adaptive masking (Belbute-Peres et al., 2022, Novello et al., 30 Jul 2024, Feng et al., 12 Mar 2025).
  • Higher per-activation compute cost compared to ReLU/Softplus (due to sine evaluations), but this is offset by superior parameter efficiency and faster convergence for most target functions (Gleyzer et al., 1 Aug 2025).
  • Lack of built-in invariances—coordinate transformations (e.g., translations) must be handled at the data level.
  • Extensions to high-dimensional or non-grid data (e.g., NeRFs, spatiotemporal fields) benefit from hybrid positional encoding, learned frequency bases, or domain-specific architectures (Morsali et al., 2 Feb 2025, Rußwurm et al., 2023).

Future directions include learnable adaptive spectra, hybrid basis models (e.g., spherical harmonics plus SIRENs), dynamic resource allocation (pruning frequency bases), and application to scientific domains where continuous, differentiable approaches to signal, field, or PDE modeling are required.


SIRENs provide a rigorous, expressive, and theoretically well-characterized class of function approximators suitable for high-fidelity, continuous modeling in scientific computing, computer vision, geometric learning, and beyond (Sitzmann et al., 2020, Belbute-Peres et al., 2022, Novello, 2022, Novello et al., 30 Jul 2024, Morsali et al., 2 Feb 2025, Feng et al., 12 Mar 2025, Gleyzer et al., 1 Aug 2025, Mancini et al., 2022, Shah et al., 2023, Paz et al., 3 Feb 2024, Rußwurm et al., 2023, Origer et al., 28 May 2024, Li et al., 2023).
