
Child Speech Recognition in Human-Robot Interaction: Problem Solved? (2404.17394v2)

Published 26 Apr 2024 in cs.CL, cs.HC, and cs.RO

Abstract: Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. Performance improves even more in highly structured interactions when priming models with specific phrases. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.
