Perceptual scaling of spectral flatness

Determine an empirically validated psychophysical scaling function for the spectral flatness audio descriptor, specifying how raw spectral flatness values should be mapped to a perceptual scale for timbre-related applications.

Background

The paper defines an audio feature extractor for timbre remapping that includes loudness (LKFS), spectral centroid, temporal centroid, and spectral flatness. For loudness, spectral centroid, and temporal centroid, the authors adopt or derive psychophysical scaling functions from existing literature to enable meaningful translations in timbre space.

For spectral flatness, the authors report finding no established perceptual scaling in the literature. As a pragmatic choice, they convert spectral flatness to a decibel scale, following guidance in the librosa documentation. A validated perceptual scaling would fill this gap and could improve timbre remapping by aligning changes in this descriptor with perceived timbre differences.
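The decibel conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes spectral flatness is computed per frame as the ratio of the geometric mean to the arithmetic mean of the power spectrum, and the function names `spectral_flatness` and `flatness_to_db`, the floor value `amin`, and the test signals are all hypothetical choices made for this example.

```python
import numpy as np

def spectral_flatness(frame, amin=1e-10):
    """Flatness of one frame: geometric mean / arithmetic mean of its power spectrum.

    The amin floor is an assumption of this sketch; it only guards against log(0).
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    power = np.maximum(power, amin)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean

def flatness_to_db(flatness, amin=1e-10):
    """Map raw flatness in (0, 1] to decibels via 20 * log10(x)."""
    return 20.0 * np.log10(np.maximum(flatness, amin))

# Illustrative signals: white noise (flat spectrum) and a sine tone (peaky spectrum).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2048)
t = np.arange(2048) / 48000.0
tone = np.sin(2 * np.pi * 1000.0 * t)

noise_db = flatness_to_db(spectral_flatness(noise))
tone_db = flatness_to_db(spectral_flatness(tone))
print(f"noise: {noise_db:.1f} dB, tone: {tone_db:.1f} dB")
```

Because raw flatness lies between 0 and 1, the scaled value is at most 0 dB, and noise-like frames land closer to 0 dB than tonal frames. Whether equal steps on this decibel scale correspond to equal perceived differences is exactly the open question posed here.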

References

To our knowledge, there is no literature investigating the perceptual scaling of spectral flatness; however, taking guidance from the Librosa documentation, spectral flatness is converted to a decibel scale: $s_{\text{SF}}(x) = 20\log_{10}(x_{\text{SF}})$.

— Real-time Timbre Remapping with Differentiable DSP (2407.04547 - Shier et al., 2024) in Subsubsection 'Psychophysical Scaling' within 'Audio Features'
