Identify features driving increases or decreases in perceived kawaii after voice frequency manipulation

Identify the specific acoustic or perceptual features of digital voices, in both text-to-speech systems and prerecorded game character voices, that are responsible for increases (kawaii++) or decreases (kawaii--) in perceived kawaiiness when the fundamental frequency (F0) and the first three formant frequencies (F1–F3) are manipulated. Determine which voice attributes, beyond F0 and F1–F3, explain the divergent outcomes observed, and resolve potential confounds introduced by other voice or sound features.

Background

The paper investigates whether and how perceived kawaiiness in computer voices can be manipulated by adjusting the fundamental frequency (F0) and formant frequencies (F1–F3) of both text-to-speech (TTS) and game character voices. The results show that such manipulations can amplify perceived kawaiiness for certain TTS voices, but can also reduce it or yield mixed effects for professionally recorded game character voices.

Despite finding evidence of voice-specific "sweet spots" and potential ceiling effects, the authors could not determine which specific voice characteristics account for why the same manipulation sometimes increases (kawaii++) and other times decreases (kawaii--) perceived kawaiiness. They call for further analysis of additional acoustic or perceptual attributes that may have influenced outcomes, beyond the frequencies directly manipulated.
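One way to begin the follow-up analysis described above is to screen candidate voice attributes against the per-voice change in kawaii ratings. The following is a minimal sketch under stated assumptions, not the authors' method: the record layout, the `delta_kawaii` field (rating after manipulation minus rating before, so positive values correspond to kawaii++ and negative values to kawaii--), the feature names, and all numeric values are hypothetical placeholders rather than data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear signal
    return sxy / sqrt(sxx * syy)


def rank_features(voices, outcome_key="delta_kawaii"):
    """Rank candidate features by |correlation| with the outcome.

    `voices` is a list of dicts, one per voice; every key other than
    `outcome_key` is treated as a candidate feature (an assumption of
    this sketch).
    """
    outcome = [v[outcome_key] for v in voices]
    features = [k for k in voices[0] if k != outcome_key]
    scored = [(f, pearson([v[f] for v in voices], outcome)) for f in features]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical placeholder records, NOT values from the paper.
PLACEHOLDER_VOICES = [
    {"delta_kawaii": 1.0, "candidate_feature_a": 1.0, "candidate_feature_b": 3.0},
    {"delta_kawaii": 2.0, "candidate_feature_a": 2.0, "candidate_feature_b": 1.0},
    {"delta_kawaii": 3.0, "candidate_feature_a": 3.0, "candidate_feature_b": 2.0},
]

if __name__ == "__main__":
    for name, r in rank_features(PLACEHOLDER_VOICES):
        print(f"{name}: r = {r:+.2f}")
```

A screen like this only flags attributes that co-vary with the rating change. It cannot by itself separate a causal feature from a confound, which is why the open question also calls for examining the voices in more detail.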

References

We were not able to identify what specific features of the voices contributed to kawaii++ or kawaii--. Future work will need to explore these voices in more detail or consider other voice or sound features that may have confounded the results.

Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice (2507.06235 - Mandai et al., 20 May 2025) in Section Overall Discussion