Towards Learning How to Properly Play UNO with the iCub Robot
(1908.00744v1)
Published 2 Aug 2019 in cs.HC, cs.AI, and cs.RO
Abstract: While interacting with another person, our reactions and behavior are much affected by the emotional changes within the temporal context of the interaction. Our intrinsic affective appraisal comprising perception, self-assessment, and the affective memories with similar social experiences will drive specific, and in most cases addressed as proper, reactions within the interaction. This paper proposes the roadmap for the development of multimodal research which aims to empower a robot with the capability to provide proper social responses in a Human-Robot Interaction (HRI) scenario.
This paper outlines a research roadmap for developing a cognitive framework enabling the iCub humanoid robot to engage in socially appropriate behavior while playing the card game UNO with multiple human participants (Barros et al., 2019). The primary objective is to equip the robot with the capacity to learn and execute contextually relevant social responses, moving beyond simple mimicry or predefined scripts towards adaptive behaviors informed by both game strategy and the affective dynamics of the interaction.
Cognitive Framework Architecture
The proposed solution centers on a hybrid neural cognitive architecture integrated within the iCub platform. The architecture has two interconnected modules: an Adaptive Affective Modeling module, responsible for perceiving and interpreting multimodal social cues, and a Modulating Game Strategies module, responsible for action selection, which covers both game moves and the accompanying affective expressions. In the interaction scenario, the iCub plays UNO against three human players, so the robot must process individual and group affective states, make strategic game decisions, and generate appropriate socio-affective responses in real time. UNO was chosen for three reasons: it is controlled and turn-based, its competitive aspects elicit rich affective cues, and it requires understanding both individual player states and group dynamics.
Adaptive Affective Modeling Module (Perception)
This module aims to overcome limitations in existing emotion recognition systems, particularly their lack of rapid online adaptation to individual expressive styles. It builds upon prior work utilizing hybrid deep/unsupervised networks, specifically Growing When Required (GWR) networks, to construct "affective memories" that learn and store prototypes of individual emotional expressions incrementally.
Key planned enhancements include:
Multimodal Temporal Integration: Employing Gated Recurrent Units (GRUs) to process and fuse asynchronous multimodal data streams (facial expressions, body language, speech prosody) from multiple individuals over time, capturing the temporal evolution of affective states.
Attention Mechanisms: Incorporating local and global attention mechanisms within convolutional channels, potentially augmented by recurrent pooling layers, to selectively focus on salient affective cues and mitigate the dilution effect of neutral expressions during extended interactions.
Group Affect Representation: Developing methods to synthesize individual affective state estimations into a coherent representation of the group's overall emotional tenor. This may involve introducing recurrent gamma connections within the GWR architecture to model the influence of individual affective states on the collective group dynamic.
The overarching goal of this module is to achieve robust, online adaptation to the idiosyncratic affective expressions of each human player while simultaneously modeling the evolving affective context of the group interaction.
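The incremental prototype learning behind an "affective memory" can be illustrated with a heavily simplified Growing When Required update. This is a minimal sketch under stated assumptions, not the authors' model: the class name `TinyGWR`, its parameters, and their default values are hypothetical, and edges, edge ageing, neighbour updates, and the recurrent gamma context are all omitted. A new prototype is inserted only when the best-matching node fits the input poorly and is already well trained (low habituation counter); otherwise the best-matching node is moved toward the input.

```python
import math


class TinyGWR:
    """Simplified Growing-When-Required memory (no edges, ageing, or neighbours)."""

    def __init__(self, activity_threshold=0.8, habituation_threshold=0.3,
                 learning_rate=0.2, habituation_decay=0.5):
        # All default values are illustrative assumptions.
        self.activity_threshold = activity_threshold
        self.habituation_threshold = habituation_threshold
        self.learning_rate = learning_rate
        self.habituation_decay = habituation_decay
        self.prototypes = []   # one weight vector per node
        self.habituation = []  # 1.0 for a fresh node, decays as the node fires

    def update(self, x):
        """Present one feature vector; return the index of the node that absorbed it."""
        x = list(x)
        if not self.prototypes:
            self.prototypes.append(x)
            self.habituation.append(1.0)
            return 0

        # Best-matching unit by Euclidean distance.
        best = min(range(len(self.prototypes)),
                   key=lambda i: math.dist(self.prototypes[i], x))
        activity = math.exp(-math.dist(self.prototypes[best], x))
        h = self.habituation[best]

        # Grow: poor match AND the best node has already been trained a lot.
        if activity < self.activity_threshold and h < self.habituation_threshold:
            new_node = [(w + xi) / 2.0 for w, xi in zip(self.prototypes[best], x)]
            self.prototypes.append(new_node)
            self.habituation.append(1.0)
            return len(self.prototypes) - 1

        # Otherwise adapt the best node toward the input and habituate it.
        self.prototypes[best] = [w + self.learning_rate * h * (xi - w)
                                 for w, xi in zip(self.prototypes[best], x)]
        self.habituation[best] = h * self.habituation_decay
        return best
```

In this sketch, one such memory per player would hold that player's expression prototypes; how the feature vectors are produced (for example by the convolutional and recurrent layers listed above) is outside its scope.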
Modulating Game Strategies Module (Action Selection)
This module focuses on learning appropriate behavioral responses that balance game objectives with social considerations. It leverages an actor-critic Reinforcement Learning (RL) framework.
Actor Network: This network is responsible for policy generation. It maps the current perceived group affective state and the specific affective memory associated with the relevant player(s) to a composite output. This output includes both a discrete game-related action (e.g., play card X, draw card, call UNO) and a continuous or discrete affective behavior to be expressed by the robot (e.g., facial expression parameters, body posture adjustments).
Critic Network: This network evaluates the actions selected by the actor. It assesses the chosen action's impact based on the subsequent observed affective reactions from the human players and the change in the game state. This evaluation, relative to predefined objectives, generates the reward signal used for learning.
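For readers unfamiliar with the actor-critic pattern, the division of labor between the two networks can be sketched with the textbook tabular one-step update. This is an assumption-laden illustration rather than the paper's neural implementation: states and actions are small integers, the function names and learning rates are hypothetical, and the reward is passed in as a plain number. The critic's temporal-difference error measures how much better or worse the outcome was than expected, and the actor shifts its softmax preferences in that direction.

```python
import math


def softmax(preferences):
    """Convert action preferences into a probability distribution."""
    m = max(preferences)
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(values, prefs, state, action, reward, next_state,
                      gamma=0.9, critic_lr=0.1, actor_lr=0.1):
    """One tabular actor-critic update; mutates `values` and `prefs` in place.

    values[s]   -- critic's estimate of the return from state s
    prefs[s][a] -- actor's preference for action a in state s
    """
    # Critic: temporal-difference error for the observed transition.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += critic_lr * td_error

    # Actor: policy-gradient step for a softmax policy.
    probs = softmax(prefs[state])
    for b in range(len(probs)):
        grad_log_pi = (1.0 if b == action else 0.0) - probs[b]
        prefs[state][b] += actor_lr * td_error * grad_log_pi
    return td_error
```

In the proposed framework the state would instead be the perceived group affect plus the relevant affective memory, and the action a composite of game move and affective expression.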
The framework allows for different behavioral objectives to be pursued by tailoring the reward function:
Maximize Game Performance: Reward based primarily on game progress and winning.
Maximize Player Engagement: Reward based on maintaining positive group affect or eliciting specific desired affective responses.
Exhibit Human-like Play: Reward based on a combination of game performance and social appropriateness, potentially learned from human gameplay data or defined heuristics.
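One simple way to picture "tailoring the reward function" is a weighted sum of a game-progress term and a group-affect term, with one weight profile per objective. The sketch below is purely illustrative: the weight values, the names `RewardWeights`, `OBJECTIVES`, and `reward`, and the assumption that both inputs are scalars in [-1, 1] are not taken from the paper, which leaves the exact reward design open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    game: float    # weight on game progress (e.g., cards shed, winning)
    affect: float  # weight on the perceived group affect


# Hypothetical weight profiles for the three objectives listed above.
OBJECTIVES = {
    "game_performance": RewardWeights(game=1.0, affect=0.0),
    "player_engagement": RewardWeights(game=0.0, affect=1.0),
    "human_like": RewardWeights(game=0.5, affect=0.5),
}


def reward(objective, game_progress, group_valence):
    """Blend game progress and group valence according to the chosen objective."""
    w = OBJECTIVES[objective]
    return w.game * game_progress + w.affect * group_valence
```

With these assumed weights, a move that advances the game but upsets the group (`game_progress=1.0`, `group_valence=-0.5`) scores 1.0 under `"game_performance"`, -0.5 under `"player_engagement"`, and 0.25 under `"human_like"`.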
Learning and Adaptation Strategy
A significant challenge is the sample inefficiency of standard RL algorithms in real-time HRI, where each live game yields only a small amount of interaction data. Several strategies are proposed to mitigate this:
Predictive Learning: Investigating the integration of predictive learning mechanisms to enable the system to continually learn and adapt from the limited interaction data available during gameplay.
Replay Memory Augmentation: Creating an extensive replay memory populated with perception-action-reaction sequences extracted from video recordings of humans playing UNO. This pre-existing dataset can be used for offline training and to augment online learning, bootstrapping the learning process and providing a richer set of examples than achievable solely through live interaction.
Exploration-Exploitation Balance: Developing sophisticated exploration strategies that effectively balance leveraging the knowledge encoded in the replay memory (exploitation) with exploring novel actions within the live interaction context (exploration) to facilitate adaptation and avoid overfitting to potentially biased offline data.
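The interplay between offline replay data and live experience can be sketched as a buffer that draws each training batch partly from pre-recorded human gameplay and partly from recent robot interactions. This is a minimal sketch under assumptions: the class `MixedReplay`, the `online_fraction` parameter, and the tuple format of a transition are hypothetical, and the paper does not specify a sampling scheme.

```python
import random


class MixedReplay:
    """Replay memory mixing fixed offline transitions with recent online ones."""

    def __init__(self, offline_transitions, online_capacity=1000, seed=None):
        self.offline = list(offline_transitions)  # e.g., extracted from videos
        self.online = []                          # collected during live play
        self.online_capacity = online_capacity
        self.rng = random.Random(seed)

    def add_online(self, transition):
        """Store a live transition, discarding the oldest when full."""
        self.online.append(transition)
        if len(self.online) > self.online_capacity:
            self.online.pop(0)

    def sample(self, batch_size, online_fraction=0.5):
        """Draw a batch; fall back to offline data while online data is scarce."""
        n_online = min(len(self.online), round(batch_size * online_fraction))
        n_offline = min(len(self.offline), batch_size - n_online)
        batch = (self.rng.sample(self.online, n_online)
                 + self.rng.sample(self.offline, n_offline))
        self.rng.shuffle(batch)
        return batch
```

Raising `online_fraction` over the course of a session would shift learning from the offline prior toward the specific group at the table, which is one possible way to limit overfitting to biased offline data.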
The system is designed for online learning, allowing the robot's affective perception and behavioral strategies to adapt over the course of interactions with specific individuals and groups.
Proposed Evaluation Protocol
The development and validation of the proposed framework involve a multi-stage evaluation process:
Offline Evaluation: Utilizing simulations, pre-recorded datasets (including the human UNO gameplay videos), and the iCub simulator (e.g., Gazebo) for initial development, parameter tuning, and optimization of the models and learning algorithms. Objective metrics such as emotion recognition accuracy, concordance correlation coefficient against human annotations, processing latency, and RL convergence rates will be employed.
Real-World Evaluation: Conducting HRI experiments with the physical iCub robot interacting with groups of human participants over multiple UNO game sessions. This phase assesses the framework's real-time performance, adaptability, and robustness in a realistic setting. Evaluation will combine objective measures (game outcomes, robot response times, physiological data if available) with subjective measures collected via established HRI questionnaires, such as Asch's personality impression scale and the Godspeed questionnaire series, to gauge perceived robot personality, anthropomorphism, animacy, likeability, and perceived intelligence.
This comprehensive evaluation aims to assess the framework's effectiveness in achieving the target behaviors (winning, engagement, human-like play) and its impact on the quality of the human-robot interaction.
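Of the objective metrics named for offline evaluation, the concordance correlation coefficient (CCC) is the least commonly implemented, so a short reference sketch may help. It follows the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population statistics; the function name `concordance_cc` and the example values are illustrative only. Unlike Pearson correlation, CCC penalizes a constant offset between predictions and annotations.

```python
def concordance_cc(predictions, annotations):
    """Concordance correlation coefficient between two equal-length sequences."""
    if len(predictions) != len(annotations) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_a = sum(annotations) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_a = sum((a - mean_a) ** 2 for a in annotations) / n
    cov = sum((p - mean_p) * (a - mean_a)
              for p, a in zip(predictions, annotations)) / n
    denominator = var_p + var_a + (mean_p - mean_a) ** 2
    if denominator == 0.0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2.0 * cov / denominator
```

For example, predicted valence `[1, 2, 3]` against annotated valence `[2, 3, 4]` is perfectly correlated in the Pearson sense, but its CCC is 4/7 (about 0.57) because of the constant offset of 1.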
In conclusion, this paper proposes a detailed research plan integrating advanced techniques in affective computing, multimodal interaction, and reinforcement learning to develop a robot capable of learning socially appropriate behaviors within a complex, dynamic multi-party HRI scenario. The focus is on creating adaptive, context-aware responses that consider both strategic and social dimensions of the interaction, paving the way for more nuanced and acceptable social robots.