Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts (2205.04952v3)
Abstract: How should a robot speak in a formal, quiet, and dark environment, or in a bright, lively, and noisy one? By designing robots to speak in a socially and ambiently appropriate manner, we can improve their perceived awareness and intelligence. We describe a process and results for selecting robot voice styles for perceived social appropriateness and ambiance awareness. Understanding how humans adapt their voices in different acoustic settings can be challenging due to the difficulty of capturing voice in the wild. Our approach comprises three steps: (a) collecting and validating voice data from interactions in virtual Zoom ambiances, (b) exploring and clustering human vocal utterances to identify primary voice styles, and (c) testing robot voice styles in recreated ambiances using projections, lighting, and sound. We focus on food service scenarios as a proof-of-concept setting. We present results using the Pepper robot's voice with different styles, working toward robots that speak in a contextually appropriate and adaptive manner. Our results with N=120 participants provide evidence that the choice of voice style in different ambiances affected perceptions of the robot across several factors, including social appropriateness, comfort, awareness, human-likeness, and competence.
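Step (b) above clusters human vocal utterances into primary voice styles. As a minimal sketch of that idea, the snippet below runs a simple k-means over synthetic prosodic features (mean pitch, intensity, speech rate); the feature values and the two illustrative styles are assumptions for demonstration, not the paper's actual data or feature set.

```python
import numpy as np

# Hypothetical per-utterance prosodic features: [mean pitch (Hz), intensity (dB), rate (syll/s)].
# Two illustrative style populations (values invented for this sketch):
rng = np.random.default_rng(0)
quiet = rng.normal([180, 55, 3.5], [10, 2, 0.3], size=(20, 3))  # soft, slower "formal" style
lively = rng.normal([240, 75, 5.0], [10, 2, 0.3], size=(20, 3))  # raised, faster "lively" style
X = np.vstack([quiet, lively])

def kmeans(X, k=2, iters=50):
    """Plain k-means: assign each utterance to its nearest center, then recompute centers."""
    # Deterministic init: first and last samples (one from each illustrative style here).
    centers = X[[0, -1]].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X)
```

With well-separated feature distributions like these, the two recovered clusters line up with the two seeded styles; in practice one would extract such features from recorded audio and choose the number of clusters empirically.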