- The paper demonstrates that gradient boosting classifiers predict turn-taking in VR with AUC scores between 0.71 and 0.78.
- The study harnessed speech patterns, individual traits, and detailed motion tracking data to model turn-taking behavior across diverse group activities.
- The findings support developing adaptive VR systems that enhance natural communication and assist users with sensory disabilities.
Predicting and Understanding Turn-Taking Behavior in Open-Ended Group Activities in Virtual Reality
The paper "Predicting and Understanding Turn-Taking Behavior in Open-Ended Group Activities in Virtual Reality" examines turn-taking dynamics in VR, aiming to explain and forecast speech behavior patterns during open-ended group sessions. Conducted by Wang et al. at Stanford University, the research leverages data from 77 VR sessions spanning 1660 minutes, focusing on social interactions among university students partaking in diverse activities over a four-week period. The paper's significance lies in its potential to inform real-time intervention mechanisms and enhance the naturalness of virtual interactions, benefiting many users, including those with sensory disabilities.
Research Aims and Methodology
The paper sets out to address three primary research questions:
- Can turn-taking behaviors in VR group activities be predicted using features extracted from individual- and group-level differences, speech-related behavior, and motion tracking data?
- How does prediction performance vary when evaluated on groups, activities, and time not seen during training?
- What features strongly influence model performance and prediction of turn-taking behavior?
The methodology involves formulating features based on literature in social dynamics, examining VR tracking data, and deploying machine learning models to predict turn-taking behaviors. Features include speech sequences, individual personalities, group characteristics, and detailed motion tracking data (e.g., head and hand movements, gaze).
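To make the feature-formulation step concrete, here is a minimal sketch of how a window of VR tracking samples might be summarized into a flat feature vector. The field names, window length, and chosen statistics are illustrative assumptions, not the paper's exact feature set.

```python
# Illustrative feature extraction from one observation window of VR tracking
# data. Field names, window size, and statistics are assumptions for this
# sketch, not the paper's exact features.
import numpy as np

def window_features(head_yaw, head_y, hand_y, time_since_last_speech):
    """Summarize one observation window into a flat feature vector."""
    return np.array([
        np.mean(head_yaw), np.std(head_yaw),       # head rotation statistics
        np.mean(head_y), head_y[-1] - head_y[0],   # head height and its drift
        np.mean(hand_y), hand_y[-1] - hand_y[0],   # hand height and its drift
        time_since_last_speech,                    # speech-related feature
    ])

# Example: a 90-sample window (roughly 1 s at a 90 Hz tracking rate).
rng = np.random.default_rng(1)
feats = window_features(rng.normal(size=90), rng.normal(size=90),
                        rng.normal(size=90), 2.4)
print(feats.shape)  # (7,)
```

Vectors like this, one per listener per window, would then be fed to the classifiers alongside individual- and group-level features.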
Results and Model Performance
The paper's results reveal that gradient boosting classifiers achieve the highest predictive performance, with AUC scores of 0.71 to 0.78 across the three tasks: distinguishing turn-taking from continuing speech, predicting the next speaker, and identifying the timing of turn-taking behaviors. Notably, these features remain reliable indicators when predictions are made on unseen groups, activities, and time periods, demonstrating their robustness.
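The evaluation on unseen groups can be sketched with scikit-learn's `GradientBoostingClassifier` and group-wise cross-validation, where whole sessions are held out of training. The synthetic data, feature count, and hyperparameters below are assumptions for illustration, not the paper's setup.

```python
# Sketch of held-out-group evaluation for a turn-taking classifier.
# Data, features, and hyperparameters are illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))          # e.g. time since last speech, head yaw, ...
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
groups = rng.integers(0, 6, size=n)  # one id per VR session/group

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]  # P(turn-taking event)
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"mean AUC over held-out groups: {np.mean(aucs):.3f}")
```

Because `GroupKFold` never splits a group across train and test, each AUC reflects performance on groups the model has never seen, mirroring the paper's generalization analysis.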
Feature Importance and Interpretations
Feature importance analysis highlights several key predictors:
- Speech-Related Features: The time elapsed since the listener’s last speech event significantly influences predictions, reinforcing the importance of recent speech activity in predicting turn-taking.
- Individual- and Group-Level Differences: Listener personality traits, particularly extraversion, as well as group size, are salient predictors. Extroverted individuals and those in smaller groups are more likely to speak next.
- Motion Tracking Data: Egocentric motion data, such as head yaw rotation, head y-axis position, and hand y-axis position, significantly affect predictions. For instance, a listener is more likely to speak next if they move their head and hands upward along the y-axis.
Practical Implications
The implications of this research are manifold:
- Adaptive Assistance: The predictive models can drive virtual agents that exhibit natural turn-taking behaviors, enhancing user experience in social VR environments.
- Real-Time Interventions: The insights from turn-taking predictions can aid in developing systems that minimize communication breakdowns, such as overlapping speech, thereby assisting users with sensory disabilities.
- Future Developments in AI: Further improvements can be made by integrating verbal transcripts or advanced machine learning architectures such as graph neural networks (GNNs) and long short-term memory networks (LSTMs) to capture nuanced social dynamics.
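The real-time intervention idea above can be sketched as a simple decision rule: fire a cue (for example, a visual warning) when the model assigns a high next-speaker probability to a listener while someone else is still talking. The function name and threshold are hypothetical, not from the paper.

```python
# Hypothetical real-time rule for flagging imminent overlapping speech.
# The name, signature, and threshold are assumptions for this sketch.
def overlap_warning(current_speaker_active: bool,
                    listener_speak_prob: float,
                    threshold: float = 0.8) -> bool:
    """Return True when an intervention cue should fire: someone is already
    speaking and the model predicts a listener is about to start."""
    return current_speaker_active and listener_speak_prob >= threshold

print(overlap_warning(True, 0.9))   # likely overlap -> fire cue
print(overlap_warning(False, 0.9))  # floor is open -> no cue needed
```

A system like this could, for instance, render the cue visually for users who cannot rely on audio, which is where the accessibility benefit comes in.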
Conclusions
This paper contributes to the understanding of social dynamics in immersive virtual environments by demonstrating the feasibility and robustness of predicting turn-taking behaviors using detailed VR tracking data and individual and group characteristics. The findings underscore the potential for real-time, adaptive systems that can significantly improve user interactions in VR, making it a valuable resource for researchers and practitioners aiming to enhance VR communication platforms. Future studies should continue exploring predictive modeling across broader demographic samples and other VR platforms to validate and extend these findings.