Human-Like Embodied AI Interviewer: Employing Android ERICA in Real International Conference (2412.09867v1)

Published 13 Dec 2024 in cs.CL and cs.HC

Abstract: This paper introduces the human-like embodied AI interviewer which integrates android robots equipped with advanced conversational capabilities, including attentive listening, conversational repairs, and user fluency adaptation. Moreover, it can analyze and present results post-interview. We conducted a real-world case study at SIGDIAL 2024 with 42 participants, of whom 69% reported positive experiences. This study demonstrated the system's effectiveness in conducting interviews just like a human and marked the first employment of such a system at an international conference. The demonstration video is available at https://youtu.be/jCuw9g99KuE.

Overview of the Human-Like Embodied AI Interviewer

Human-like embodied AI systems that can sustain complex conversations are an area of significant progress in human-computer interaction. The paper "Human-Like Embodied AI Interviewer: Employing Android ERICA in Real International Conference" presents an AI interviewing system built on the android robot ERICA, with advanced dialogue capabilities such as attentive listening and conversational repair. The work targets a practical bottleneck in qualitative social science research: interviews that capture nuanced human responses are time-consuming and labor-intensive to conduct.

The system adds two further functions that improve interaction quality: user fluency adaptation, and an automated post-interview workflow that uses large language models (LLMs) to analyze and present the results. In a real-world deployment at the SIGDIAL 2024 conference, 69% of the 42 participants reported a positive experience, supporting the system's practical utility and effectiveness.

System Architecture and Features

The architecture of this human-like interview system incorporates several key components:

Speech Processing: A dedicated speech module processes the user's audio in real time, running automatic speech recognition (ASR) and extracting prosodic features such as fundamental frequency (F0) and power. These features support the downstream dialogue modules.
Dialogue Management: Central to the system, the dialogue manager interprets user input and generates responses, using modules that predict when to produce backchannels (short listener responses such as "uh-huh") and that repair conversational breakdowns when necessary.
User Fluency Adaptation: The system adjusts its conversational pace and response times based on the user's speech fluency, which is essential for engaging non-native speakers effectively.
Post-Interview Analysis: Through a chain of LLMs, the system performs data processing, summarization, and presentation creation, automating the analysis phase that follows the interview.
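As a rough illustration of how the speech module's prosodic features could feed fluency adaptation, the Python sketch below computes per-frame power and a crude autocorrelation F0 estimate, then uses the share of silent frames as a stand-in for fluency to lengthen the wait before the robot takes its turn. The paper does not describe this mechanism in this form: `FluencyPolicy`, the pause-ratio heuristic, and every threshold and wait time are hypothetical, illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Frame = Sequence[float]


def frame_power_db(frame: Frame) -> float:
    """Mean power of one frame in dB; the 1e-12 floor avoids log10(0) on silence."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def estimate_f0(frame: Frame, sample_rate: int,
                fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude autocorrelation F0 estimate in Hz; returns 0.0 for silent or unvoiced frames."""
    energy = sum(x * x for x in frame)
    min_lag = int(sample_rate / fmax)                        # shortest period considered
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)   # longest period considered
    if energy == 0.0 or min_lag >= max_lag:
        return 0.0
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Weak periodicity relative to the frame energy is treated as unvoiced.
    if best_lag == 0 or best_corr / energy < 0.3:
        return 0.0
    return sample_rate / best_lag


@dataclass
class FluencyPolicy:
    """Hypothetical policy: more pausing (lower fluency) means waiting longer before replying."""
    base_wait_s: float = 0.6    # illustrative value, not from the paper
    max_wait_s: float = 1.5     # illustrative value, not from the paper
    silence_db: float = -40.0   # frames quieter than this count as pauses

    def pause_ratio(self, frames: List[Frame]) -> float:
        silent = sum(1 for f in frames if frame_power_db(f) < self.silence_db)
        return silent / len(frames)

    def wait_time(self, frames: List[Frame]) -> float:
        # Linear interpolation between the base and maximum wait times.
        ratio = self.pause_ratio(frames)
        return self.base_wait_s + ratio * (self.max_wait_s - self.base_wait_s)


def sine_frame(freq: float, sample_rate: int = 16000, n: int = 512,
               amp: float = 0.5) -> List[float]:
    """Synthetic 'voiced' frame used only for the demo below."""
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    voiced, silence = sine_frame(200.0), [0.0] * 512
    policy = FluencyPolicy()
    print(estimate_f0(voiced, 16000), frame_power_db(voiced))
    print(policy.wait_time([voiced] * 4))            # no pauses: base wait of 0.6 s
    print(policy.wait_time([voiced, silence] * 2))   # half the frames silent: longer wait
```

The linear mapping keeps the policy easy to inspect. A real system could instead combine several cues (speech rate, disfluencies, ASR confidence) or learn response timing from data.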

These elements are implemented on ERICA, an embodied conversational agent (ECA) in the form of an android robot known for its highly human-like appearance, and on TELECO, a teleoperated humanoid platform, so that interactions can be compared across robots with different appearances.
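The post-interview analysis can be pictured as a pipeline in which each LLM call consumes the previous call's output (data processing, then summarization, then presentation creation). The sketch below assumes a generic `LLM` callable that maps a prompt to text; the `Stage` class, the three prompt templates, and the `echo_llm` stub are hypothetical stand-ins, not the authors' prompts or model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed interface: any function mapping a prompt string to a completion string.
# A real deployment would wrap an LLM API behind this signature.
LLM = Callable[[str], str]


@dataclass
class Stage:
    name: str
    template: str  # hypothetical prompt; "{input}" receives the previous stage's output

    def run(self, llm: LLM, text: str) -> str:
        return llm(self.template.format(input=text))


def run_chain(llm: LLM, stages: List[Stage], transcript: str) -> Dict[str, str]:
    """Feed each stage's output into the next and keep every intermediate result."""
    outputs: Dict[str, str] = {}
    current = transcript
    for stage in stages:
        current = stage.run(llm, current)
        outputs[stage.name] = current
    return outputs


# Hypothetical prompts mirroring the three steps named in the text.
POST_INTERVIEW_STAGES = [
    Stage("processing", "Extract each question and answer pair from:\n{input}"),
    Stage("summary", "Summarize the main opinions in these answers:\n{input}"),
    Stage("slides", "Turn this summary into presentation bullet points:\n{input}"),
]


def echo_llm(prompt: str) -> str:
    """Offline stand-in 'model': returns the last line of the prompt, upper-cased."""
    return prompt.splitlines()[-1].upper()


if __name__ == "__main__":
    result = run_chain(echo_llm, POST_INTERVIEW_STAGES,
                       "Q: favourite talk? A: the keynote")
    for name, text in result.items():
        print(name, "->", text)
```

Keeping every intermediate output makes each step of the chain inspectable, which helps when checking whether a summary or slide faithfully reflects the original answers.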

Empirical Evaluation and Implications

The case study evaluation at the SIGDIAL 2024 conference revealed a predominantly positive reception, with participants describing their interactions with ERICA as engaging. The study demonstrated that such systems are technically feasible in real-world settings, while also revealing varied user preferences in robot interaction. Some participants, however, found the questions repetitive and ERICA's human-like appearance unsettling, which illustrates the difficulty of balancing anthropomorphic features with user comfort.

These results have both theoretical and practical implications for human-like AI systems. The findings point to two improvements: making AI-generated questions more adaptive to increase conversational engagement, and fine-tuning robot appearance to suit diverse user preferences.

Future Directions

Future research will aim to make the system's conversations more dynamic through deeper LLM integration, moving beyond the limitations of static question templates. Incorporating multimodal inputs, such as visual and contextual cues, could further improve the system's interactive capabilities and make AI interlocutors more attuned to the subtleties of human conversation.

In conclusion, this paper presents a meaningful step toward AI systems that are not only efficient but also engage users at a human-like level of interaction, offering a promising path for improving interview-based data collection and human-computer interaction.

Authors (6)

Zi Haur Pang
Yahui Fu
Divesh Lala
Mikey Elmers
Koji Inoue
Tatsuya Kawahara
