
Phase-Aware Loss Functions

Updated 1 February 2026
  • Phase-Aware Loss Functions are loss objectives that penalize both magnitude and phase differences in complex-valued representations, such as the STFT, to produce perceptually accurate audio outputs.
  • They employ methods like direct complex differences, phase-weighted log spectral errors, and consistency-preserving strategies to ensure valid spectrogram reconstructions.
  • Empirical studies show that incorporating phase-aware losses improves perceptual metrics like PESQ and SI-SDR, benefiting tasks such as speech enhancement and phase reconstruction.

A phase-aware loss function is any objective that penalizes discrepancies in both magnitude and phase when comparing complex-valued representations, such as the Short-Time Fourier Transform (STFT) of signals, during training of neural models. These losses are crucial in machine hearing, especially speech enhancement and phase reconstruction tasks, where magnitude-only losses are insufficient for producing perceptually plausible outputs. Phase-aware losses explicitly account for phase information, thereby improving perceptual quality and reducing artifacts such as musical noise. They include direct complex-domain distances and newer STFT-consistency criteria, and have also inspired broader meta-learning developments in dynamically adaptive objective functions.

1. Mathematical Formulation of Phase-Aware Losses

Classical phase-aware losses penalize deviation in the complex STFT, rather than only amplitude. The following table catalogs key phase-sensitive losses, with their mathematical forms and a brief description:

| Loss Function | Formula | Phase Sensitivity Mechanism |
|---|---|---|
| Complex MSE (cMSE) | $\mathcal{L}_\text{cMSE} = \langle\,|\widehat S(k,n)-S(k,n)|^2\rangle_{k,n}$ | $\widehat S$ and $S$ compared in the complex domain; penalizes both magnitude and phase errors |
| Complex MAE (cMAE) | $\mathcal{L}_\text{cMAE} = \langle\,|\widehat S(k,n)-S(k,n)|\rangle_{k,n}$ | $\ell_1$ norm in the complex domain; directly incorporates the phase difference |
| Complex Compressed MSE | $\mathcal{L}_\text{cComp} = \langle\,|\widehat A^c e^{j\varphi_{\widehat S}} - A^c e^{j\varphi_S}|^2\rangle$ | Uses compressed magnitudes, but the phase penalty is retained |
| Phase-aware Log Spectral Distance (PLSD) | see Eq. (8); e.g., includes a $(2-\cos\Delta\varphi)$ scaling of the log-magnitude error | Phase misalignment amplifies the log-magnitude penalty |
| SDR (complex) | $\mathcal{L}_\text{SDR} = -\log_{10} \frac{\langle|S|^2\rangle}{\langle|\widehat S-S|^2\rangle}$ | Magnitude and phase errors both inflate the error energy $\langle|\widehat S-S|^2\rangle$, shrinking the ratio |
| Complex Coherence (cCorr) | $\mathcal{L}_\text{cCorr} = -\frac{\Re\,\langle\widehat S S^*\rangle}{\sqrt{\langle|\widehat S|^2\rangle\langle|S|^2\rangle}}$ | Measures phase alignment via the real part of the normalized inner product |
| Consistency-Preserving | $\mathcal{L}_{EC}(\mathbf{H})= \sum_{m,n} \left| \sum_q e^{j2\pi \frac{qR}{N}n} (\alpha_q^{(R)} * \mathbf{H})_{m-q, n} \right|^2$ | Ensures $\mathbf{H}$ is a consistent STFT, indirectly constraining phase |

Magnitude-only counterparts operate solely on $|\widehat S|$ and $|S|$, hence ignore phase. Mixtures are often formed as $\mathcal{L}_\text{mix} = (1-\beta)\mathcal{L}_\text{mag} + \beta\, \mathcal{L}_\text{complex}$, with $\beta$ tuned on validation data.
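As a concrete illustration, the magnitude-only, complex, and mixed objectives above can be sketched in NumPy (a minimal reference implementation, not the training code from the cited works):

```python
import numpy as np

def phase_aware_losses(S_hat, S, beta=0.3):
    """Magnitude-only, complex, and mixed STFT losses.

    S_hat, S: complex STFT arrays of shape (freq_bins, frames).
    beta: mixing weight for the phase-aware (complex) term.
    """
    # Magnitude-only MSE: ignores phase entirely.
    L_mag = np.mean((np.abs(S_hat) - np.abs(S)) ** 2)
    # Complex MSE (cMSE): penalizes magnitude *and* phase errors.
    L_cmse = np.mean(np.abs(S_hat - S) ** 2)
    # Complex MAE (cMAE): l1 distance in the complex domain.
    L_cmae = np.mean(np.abs(S_hat - S))
    # Linear mixture: L_mix = (1 - beta) * L_mag + beta * L_complex.
    L_mix = (1 - beta) * L_mag + beta * L_cmse
    return L_mag, L_cmse, L_cmae, L_mix
```

Note that an estimate with perfect magnitude but rotated phase incurs zero magnitude-only loss yet a nonzero complex loss, which is exactly the gap phase-aware terms close.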

2. Mechanisms of Phase Incorporation

Each loss incorporates phase through distinct mechanisms:

  • Direct Complex Differences (cMSE, cMAE, cComp): These directly penalize the phase shift $\Delta\varphi$ between estimated and target STFT bins. For cComp, compressed magnitude reduces the impact of large-amplitude bins while still enforcing phase alignment.
  • Phase-Weighted Log Spectral (PLSD, wPLSD): These scale the log-magnitude error by a phase dissimilarity factor, typically $2-\cos\Delta\varphi$, highlighting bins with high phase error.
  • Correlation/Coherence (cCorr): Maximizes real-part inner product, inherently driving phase coherence.
  • Consistency-based Losses: Rather than matching a specific target phase, these ensure that the network’s output is a physically realizable STFT—any global or local phase solution is admissible if the output is STFT-consistent.
  • Linear Mixtures: Mixtures combine a magnitude-only and a phase-sensitive term, allowing a trade-off via the parameter $\beta$.

A notable property of the consistency loss is invariance to global phase shifts; phase ambiguity (e.g., $\pm x[n]$) does not affect the objective, circumventing problems with phase wrapping and time-shift sensitivity (Ku et al., 2024).
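The phase-weighted log-spectral mechanism can be sketched as follows. The exact form of Eq. (8) is not reproduced in this article, so applying the $(2-\cos\Delta\varphi)$ factor to a plain squared log-magnitude error is an assumption made for illustration:

```python
import numpy as np

def weighted_plsd(S_hat, S, eps=1e-8):
    """Sketch of a phase-weighted log-spectral distance.

    The (2 - cos(dphi)) weighting equals 1 when phases agree and
    grows to 3 at a phase error of pi, amplifying the magnitude
    penalty on bins with poor phase alignment.
    """
    log_err = (np.log(np.abs(S_hat) + eps) - np.log(np.abs(S) + eps)) ** 2
    dphi = np.angle(S_hat) - np.angle(S)
    weight = 2.0 - np.cos(dphi)  # phase dissimilarity factor
    return np.mean(weight * log_err)
```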

3. Experimental Setup and Architectures

Studies of phase-aware objectives typically adopt the following structure for empirical evaluation (Braun et al., 2020):

  • Input Features: STFT, usually with a 512-point FFT, 32 ms window, 16 ms hop; 257 one-sided frequency bins.
  • Neural Network (e.g., NSNet2):
    • Feed-forward embedding + 2 causal GRU layers + feed-forward output layers.
    • Real-valued sigmoid output for a gain $G(k,n)$ per time-frequency bin, multiplied with the input spectrogram.
    • ≈2.8M parameters, real-time operation (no look-ahead).
  • Loss Optimization: AdamW optimizer, learning rate $1\times10^{-4}$. Phase-mixing weights and compression exponents tuned by grid search on validation PESQ.
  • Output Application: Enhanced spectrogram built as $\widehat S(k,n) = G(k,n)\, X(k,n)$, optionally with estimated or unchanged phase.
  • Consistency Loss Integration: For consistency-preserving approaches (Ku et al., 2024), the loss is added as a differentiable term to the overall training objective, with STFT and its inverse handled via FFT libraries.

Phase-aware loss parameters such as the mixture weight $\beta$, log-weight exponent, and compression exponent are selected for perceptual metrics on a held-out set.
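The masking pipeline described above can be sketched with SciPy's STFT routines; here `gain_fn` is an assumed placeholder standing in for the network (e.g., NSNet2), not a reproduction of it:

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(x, gain_fn, fs=16000):
    """Mask-based enhancement: 512-point STFT (32 ms window, 16 ms hop
    at 16 kHz), a per-bin real gain in [0, 1] predicted from the
    magnitude spectrogram, and reconstruction with the noisy phase
    left unchanged.
    """
    _, _, X = stft(x, fs=fs, nperseg=512, noverlap=256)  # X: (257, frames)
    G = np.clip(gain_fn(np.abs(X)), 0.0, 1.0)            # sigmoid-like gains
    S_hat = G * X                                        # phase of X reused
    _, s_hat = istft(S_hat, fs=fs, nperseg=512, noverlap=256)
    return s_hat
```

With an all-ones gain the pipeline reduces to an STFT round trip, which reconstructs the input (the Hann window at 50% overlap satisfies the COLA condition).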

4. Quantitative Assessment and Empirical Comparison

The impact of phase-aware losses is evaluated primarily with perceptual metrics:

  • PESQ (Perceptual Evaluation of Speech Quality)
  • SI-SDR (Scale-Invariant Signal-to-Distortion Ratio)

A representative summary from (Braun et al., 2020), evaluated on CHiME-2, is provided below:

| Loss (mag. / complex) | mag. PESQ (SI-SDR) | complex PESQ (SI-SDR) | mixed PESQ (SI-SDR) |
|---|---|---|---|
| noisy | — (—) | 2.29 (1.92) | — |
| magMSE / cMSE | 3.16 (9.57) | 3.10 (9.58) | 3.17 (9.58) |
| magMAE / cMAE | 3.25 (9.73) | 3.08 (9.68) | 3.25 (9.75) |
| LSD / PLSD | 3.04 (8.59) | 3.03 (8.31) | — |
| wLSD / wPLSD | 3.19 (9.12) | 3.21 (8.88) | — |
| Comp / cComp | 3.25 (9.45) | 2.88 (9.21) | 3.31 (9.42) |
| SNR / SDR | 3.15 (9.54) | 3.11 (9.62) | 3.19 (9.66) |
| Corr / cCorr | 3.16 (9.56) | 3.11 (9.60) | 3.16 (9.58) |

Key findings:

  • Adding any phase-aware term ($\beta>0$) consistently yields PESQ gains, even when the network is not explicitly enhancing phase.
  • The highest SI-SDR improvements (i.e., phase-sensitive enhancement) are achieved with linear-domain complex MAE and SDR losses.
  • Mixing compressed-magnitude with compressed-complex objectives ($\beta\sim0.3$) attains the highest PESQ (3.31).
  • Phase-weighted log-spectral distances (wPLSD) are marginally effective, but pure log domain metrics (PLSD) offer no strong advantage.
  • Heuristic perceptual weightings (SDW, AMR) can underperform due to poor generalization across SNR/reverberation conditions.

Consistency loss models, when compared to direct phase-regression alternatives (e.g., cosine L2, anti-wrapping losses), yield superior or equivalent perceptual scores, and more robust outputs in both “cheating” phase-reconstruction and realistic enhancement tasks (Ku et al., 2024).

5. Consistency-Preserving Losses: Theory, Implementation, and Impact

The consistency-preserving loss [Editor's term] enforces that the network's STFT output $\mathbf{H}'$ be a valid spectrogram of some real waveform—i.e., that $\mathbf{H}' = \mathcal{S}\{\mathcal{S}^{-1}(\mathbf{H}')\}$, as formalized by linear constraints in the frequency domain. Its key properties are:

  • It does not require matching the exact ground-truth phase; any solution is admissible as long as it yields a valid (i.e., physically realizable) STFT.
  • It naturally handles global phase-shift indeterminacy: if one solution is feasible, so are its $e^{j\theta}$-rotated versions.
  • Unlike direct phase-matching losses, it is insensitive to phase wrapping and time shifts.
  • Implementation involves fully differentiable operations: fixed coefficient convolutions, magnitude-squared operations, and FFT-based STFT/inverse transforms.
  • When deployed in separation or enhancement systems, it acts as an effective, architecture-agnostic add-on.
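A minimal sketch of such a penalty, written here as a round-trip projection rather than the convolutional $\mathcal{L}_{EC}$ formulation of (Ku et al., 2024); both vanish exactly when $\mathbf{H}$ is a consistent STFT:

```python
import numpy as np
from scipy.signal import stft, istft

def consistency_loss(H, fs=16000, nperseg=512, noverlap=256):
    """Penalize the residual between H and its projection onto the
    set of consistent STFTs, i.e. H - S(S^{-1}(H)).

    The loss is zero iff H is the STFT of some real waveform, and it
    is invariant to which (globally phase-rotated) waveform that is.
    """
    _, h = istft(H, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, _, H_proj = stft(h, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Crop in case the round trip changes the number of frames.
    n = min(H.shape[1], H_proj.shape[1])
    return np.mean(np.abs(H[:, :n] - H_proj[:, :n]) ** 2)
```

In a training objective the same computation would be expressed with a framework's differentiable STFT; the SciPy version above only illustrates the mathematics.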

Empirical results show:

  • In phase reconstruction, the consistency loss attains or surpasses state-of-the-art PESQ, ESTOI, and composite scores.
  • In enhancement, adding the consistency loss provides measurable improvements over direct phase losses and noisy-phase baselines, particularly on challenging corpora (e.g., WSJ0-CHiME3, PESQ improved by ≈+0.7 over noisy input) (Ku et al., 2024).

6. Adaptive and Phase-Aware Loss Function Learning

Online loss-function learning (e.g., AdaLFL) introduces phase-awareness in a meta-learning sense, where the “phase” refers to stages of model training, not signal phase (Raymond et al., 2023). In this paradigm:

  • The loss function itself, parameterized as a neural network $\ell_\phi$, is updated online after each base-model step, rather than in an offline meta-phase.
  • As the base model transitions from initial to terminal training segments, $\phi$ adapts in tandem, shaping error gradients to accelerate convergence early, stabilize mid-training, and regularize late.
  • The online protocol mitigates the “short-horizon bias” of two-phase meta-learning, yielding loss shapes that are locally optimal for every training epoch.
  • Experimentally, such adaptivity delivers lower error rates and test loss than both fixed canonical losses (cross-entropy) and offline meta-learned loss functions.

Pseudocode for the AdaLFL adaptation:

for t in range(total_steps):
    # Inner (base) update: gradient step of theta on the learned loss l_phi
    theta = theta - alpha * grad_theta(l_phi(y_train, f_theta(X_train)))
    # Outer (meta) update: gradient step of phi on the task loss L_T,
    # evaluated on held-out validation data
    phi = phi - eta * grad_phi(L_T(y_val, f_theta(X_val)))

A plausible implication is that further integration with phase-sensitive objectives (in the frequency domain) could enable both phase- and training-phase-adaptive loss function learning.

7. Practical Recommendations and Limitations

Best practices in phase-aware loss design for speech enhancement and similar domains are summarized as follows (Braun et al., 2020):

  • Always include a nonzero phase-aware term ($\beta \approx 0.2$–$0.4$) in the objective for improved perceptual quality, regardless of whether the network outputs explicit phase estimates.
  • For maximal phase-sensitive distortion reduction, employ linear-domain losses (complex MAE, SDR), as these align with STFT statistics and penalize phase deviations proportionally.
  • To maximize overall speech quality (PESQ), use a mixture of compressed-magnitude and compressed-complex losses, adjusting weights on a validation set.
  • Weighting schemes based on perceptual heuristics (e.g., AMR, SDW) are dataset and task dependent, and may not generalize well—validation across noise/reverberation conditions is essential.
  • Consistency-preserving losses represent a robust recent innovation, offering simple implementation and improved generalization by relaxing the need for a single target phase configuration.
  • In online loss function learning, tuning meta-optimizer rates, using validation-split feedback, and employing smooth activation functions prevent overfitting and yield phase-adaptive objectives.

These considerations enable robust, generalizable deployment of phase-aware objectives in real-time and offline deep learning pipelines for speech and broader audio signal processing.

References:

  • Braun et al., 2020
  • Ku et al., 2024
  • Raymond et al., 2023
