
Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments (2401.03581v1)

Published 7 Jan 2024 in cs.HC, cs.AI, and cs.RO

Abstract: Mindfulness-based therapies have been shown to be effective in improving mental health, and technology-based methods have the potential to expand the accessibility of these therapies. To enable real-time personalized content generation for mindfulness practice in these methods, high-quality computer-synthesized text-to-speech (TTS) voices are needed to provide verbal guidance and respond to user performance and preferences. However, the user-perceived quality of state-of-the-art TTS voices has not yet been evaluated for administering mindfulness meditation, which requires emotional expressiveness. In addition, the effect of physical embodiment and personalization on the user-perceived quality of TTS voices for mindfulness has not yet been studied. To that end, we designed a two-phase human subject study. Phase 1, an online Mechanical Turk between-subject study (N=471) with remote participants, evaluated 3 state-of-the-art TTS voices (feminine, masculine, child-like) against 2 human therapists' voices (feminine, masculine) in 3 physical embodiment settings (no agent, conversational agent, socially assistive robot). Building on findings from Phase 1, Phase 2, an in-person within-subject study (N=94), used a novel framework we developed for personalizing TTS voices based on user preferences and evaluated their user-perceived quality against the best-rated non-personalized voices from Phase 1. We found that the best-rated human voice was perceived as better than all TTS voices; the emotional expressiveness and naturalness of the TTS voices were rated poorly, while users were satisfied with their clarity. Surprisingly, by allowing users to fine-tune TTS voice features, the user-personalized TTS voices performed almost as well as human voices, suggesting user personalization could be a simple and highly effective tool for improving the user-perceived quality of TTS voices.
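
The abstract describes letting users fine-tune TTS voice features but does not specify how the personalization framework is implemented. As an illustration only, the sketch below shows one plausible way to expose user-tunable voice features through SSML-style prosody controls; the parameter set and all names (`VoicePreferences`, `to_ssml`, `personalize`, `synthesize`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only; the paper does not publish its personalization code.
# Assumes an SSML-capable TTS backend; `synthesize` is a hypothetical stand-in
# for any callable that turns SSML markup into audio bytes.

from dataclasses import dataclass


@dataclass
class VoicePreferences:
    """User-adjustable TTS voice features (hypothetical parameter set)."""
    pitch_pct: int = 0      # relative pitch shift, e.g. -20..+20 (%)
    rate_pct: int = 100     # speaking rate, 100 = default (%)
    volume_db: float = 0.0  # loudness offset in dB


def to_ssml(text: str, prefs: VoicePreferences) -> str:
    """Wrap guidance text in SSML prosody tags reflecting the user's preferences."""
    return (
        f'<speak><prosody pitch="{prefs.pitch_pct:+d}%" '
        f'rate="{prefs.rate_pct}%" volume="{prefs.volume_db:+.1f}dB">'
        f"{text}</prosody></speak>"
    )


def personalize(text: str, prefs: VoicePreferences, synthesize) -> bytes:
    """Render personalized meditation audio via any SSML-to-audio callable."""
    return synthesize(to_ssml(text, prefs))
```

In a study setting, the participant would adjust `VoicePreferences` interactively until the voice sounds acceptable, and the chosen settings would then be reused for the full meditation session.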

Authors (7)
  1. Zhonghao Shi (14 papers)
  2. Han Chen (53 papers)
  3. Anna-Maria Velentza (7 papers)
  4. Siqi Liu (94 papers)
  5. Nathaniel Dennler (16 papers)
  6. Allison O'Connell (3 papers)
  7. Maja Matarić (35 papers)
Citations (10)
