- The paper demonstrates a robotic system that learns to play piano using a Sim2Real approach with reinforcement learning.
- It integrates a modified Allegro hand and UFACTORY xArm7 with 3D-printed fingertips to accurately emulate human piano playing.
- Experimental results reveal that the hybrid execution mode achieves the best performance in real-world precision, recall, and F1 scores.
Learning To Play Piano in the Real World
Introduction
The paper "Learning To Play Piano in the Real World" introduces a robotic system that learns to play pieces on a real piano using a Sim2Real approach. A policy is trained in simulation with reinforcement learning and then transferred to a dexterous robotic hand in the real world. The research emphasizes the challenges of achieving human-like precision and coordination in manipulation, both of which are critical for piano playing, and the solutions it adopts to address them.
Sim2Real Transfer and Hardware Configuration
The system uses an Allegro hand mounted on a UFACTORY xArm7. The hand is modified for piano playing, for example by replacing its fingertips with 3D-printed tips that match the dimensions of the piano keys.
Figure 1: In this work, we demonstrate a proof-of-concept for learning to play piano with a real world robot. To achieve this, we employed a multi-finger robot hand and a Sim2Real approach. Experimental results show that the robot can learn to play several simple pieces successfully, after training exclusively in simulation.
Simulation Environment
The MuJoCo physics engine is used to simulate the Allegro hand and a model of an M-Audio Keystation piano. The task is modeled as a partially observable Markov decision process, and policies are trained with DroQ, a variant of the soft actor-critic algorithm.

Figure 2: Simulated hand and piano.
Execution Modes
Three execution modes are introduced to transition learned policies to the real world: Joint Mirroring, Hybrid Execution, and Real World Execution.
Figure 3: The diagram compares the three execution modes: A) In joint mirroring, the whole observation space is obtained from the simulated environment. B) In hybrid execution, only the pressed keys are based on the real world, while everything else is simulated. C) In real world execution, all observations are based on the real world.
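The difference between the three modes comes down to where each part of the observation is read from. The following is a minimal sketch of that selection logic, not the authors' implementation. It assumes a simplified observation with only two fields (joint positions and pressed keys); the names `ExecutionMode`, `Observation`, and `build_observation` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class ExecutionMode(Enum):
    JOINT_MIRRORING = "joint_mirroring"
    HYBRID = "hybrid"
    REAL_WORLD = "real_world"


@dataclass
class Observation:
    # Hypothetical, simplified observation: the real observation space
    # is likely richer than these two fields.
    joint_positions: Sequence[float]
    pressed_keys: Sequence[bool]


def build_observation(
    mode: ExecutionMode, sim: Observation, real: Observation
) -> Observation:
    """Pick the source of each observation field for the given mode."""
    if mode is ExecutionMode.JOINT_MIRRORING:
        # A) Everything comes from the simulated environment.
        return Observation(sim.joint_positions, sim.pressed_keys)
    if mode is ExecutionMode.HYBRID:
        # B) Only the pressed keys come from the real piano.
        return Observation(sim.joint_positions, real.pressed_keys)
    if mode is ExecutionMode.REAL_WORLD:
        # C) Everything comes from the real world.
        return Observation(real.joint_positions, real.pressed_keys)
    raise ValueError(f"unknown execution mode: {mode!r}")


sim_obs = Observation(joint_positions=[0.10, 0.20], pressed_keys=[False, True])
real_obs = Observation(joint_positions=[0.15, 0.25], pressed_keys=[True, True])
hybrid_obs = build_observation(ExecutionMode.HYBRID, sim_obs, real_obs)
```

In this sketch, `hybrid_obs` carries the simulated joint positions together with the real key states, which mirrors the description of mode B in the caption.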
Evaluation Metrics and Experimental Results
The research employs precision, recall, and F1 scores as performance metrics. Experimental results suggest that while a Sim2Real gap exists, the hybrid execution mode shows the most promising results across multiple simple songs.

Figure 4: Comparison of several songs in the real world using hybrid execution.
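For readers unfamiliar with the metrics, the sketch below shows one way precision, recall, and F1 can be computed from key presses. It assumes the target song and the robot's performance are both given as per-time-step lists of binary key states, and it pools counts over all time steps and keys; the paper may aggregate differently. The function name `key_press_metrics` is hypothetical.

```python
from typing import Sequence, Tuple


def key_press_metrics(
    target: Sequence[Sequence[int]], actual: Sequence[Sequence[int]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all time steps and keys.

    target[t][k] is 1 if key k should be pressed at step t, else 0.
    actual[t][k] is 1 if key k was pressed at step t, else 0.
    """
    if len(target) != len(actual):
        raise ValueError("target and actual must have the same number of steps")

    tp = fp = fn = 0
    for target_row, actual_row in zip(target, actual):
        if len(target_row) != len(actual_row):
            raise ValueError("each step must cover the same number of keys")
        for should_press, did_press in zip(target_row, actual_row):
            if did_press and should_press:
                tp += 1  # correct key pressed
            elif did_press:
                fp += 1  # wrong key pressed
            elif should_press:
                fn += 1  # required key missed

    # Guard against division by zero when nothing was pressed or required.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Three time steps, three keys: one missed key and one wrong key.
target_song = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
played = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]
precision, recall, f1 = key_press_metrics(target_song, played)
```

Here precision penalizes wrong keys that were pressed, recall penalizes required keys that were missed, and F1 is the harmonic mean of the two.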
Impact of Domain Randomization
Domain randomization proves critical for robustness. It particularly affects recall in real-world settings, since it lets the agent adapt to unanticipated variations.

Figure 5: The diagram shows the effect of domain randomization (DR) on performance in simulation.
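As a rough illustration of the general technique, domain randomization can be sketched as drawing a fresh set of physical parameters at the start of every training episode. The parameter names and ranges below are purely hypothetical placeholders; this summary does not state which quantities the paper randomizes or over what ranges.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Uniform draw between the lower and upper bound.
        return rng.uniform(self.low, self.high)


# Hypothetical parameters and ranges, chosen only for illustration.
HYPOTHETICAL_RANGES: Dict[str, Range] = {
    "key_spring_stiffness_scale": Range(0.8, 1.2),
    "fingertip_friction_scale": Range(0.7, 1.3),
    "hand_offset_x_m": Range(-0.005, 0.005),
}


def sample_episode_parameters(
    rng: random.Random, ranges: Dict[str, Range] = HYPOTHETICAL_RANGES
) -> Dict[str, float]:
    """Draw one set of randomized parameters for a single episode."""
    return {name: r.sample(rng) for name, r in ranges.items()}


rng = random.Random(0)  # seeded so that a training run is reproducible
episode_params = sample_episode_parameters(rng)
```

A training loop would call `sample_episode_parameters` on each reset and apply the values to the simulator before rolling out the policy, so that the policy never sees exactly the same dynamics twice.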
Conclusion
This research demonstrates a significant step toward Sim2Real transfer for complex manipulation tasks, using piano playing as a benchmark for robotic dexterity. Challenges remain, such as the need for tactile sensors and better generalization across songs, and these point to directions for future research.