Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 71 tok/s

Gemini 2.5 Pro 46 tok/s Pro

GPT-5 Medium 27 tok/s Pro

GPT-5 High 30 tok/s Pro

GPT-4o 93 tok/s Pro

Kimi K2 207 tok/s Pro

GPT OSS 120B 460 tok/s Pro

Claude Sonnet 4.5 36 tok/s Pro

2000 character limit reached

A Robust Method for Pitch Tracking in the Frequency Following Response using Harmonic Amplitude Summation Filterbank (2506.19253v1)

Published 24 Jun 2025 in cs.SD, eess.AS, and eess.SP

Abstract: The Frequency Following Response (FFR) reflects the brain's neural encoding of auditory stimuli including speech. Because the fundamental frequency (F0), a physical correlate of pitch, is one of the essential features of speech, there has been particular interest in characterizing the FFR at F0, especially when F0 varies over time. The standard method for extracting F0 in FFRs has been the Autocorrelation Function (ACF). This paper investigates harmonic-structure-based F0 estimation algorithms, originally developed for speech and music, and resolves their poor performance when applied to FFRs in two steps. Firstly, given that unlike in speech or music, stimulus F0 of FFRs is already known, we introduce a stimulus-aware filterbank that selectively aggregates amplitudes at F0 and its harmonics while suppressing noise at non-harmonic frequencies. This method, called Harmonic Amplitude Summation (HAS), evaluates F0 candidates only within a range centered around the stimulus F0. Secondly, unlike other pitch tracking methods that select the highest peak, our method chooses the most prominent one, as it better reflects the underlying periodicity of FFRs. To the best of our knowledge, this is the first study to propose an F0 estimation algorithm for FFRs that relies on harmonic structure. Analyzing recorded FFRs from 16 normal hearing subjects to 4 natural speech stimuli with a wide F0 variation from 89 Hz to 452 Hz showed that this method outperformed ACF by reducing the average Root-Mean-Square-Error (RMSE) within each response and stimulus F0 contour pair by 8.8% to 47.4%, depending on the stimulus.