Dexterous Robotic Piano Playing at Scale (2511.02504v1)

Published 4 Nov 2025 in cs.RO

Abstract: Endowing robot hands with human-level dexterity has been a long-standing goal in robotics. Bimanual robotic piano playing represents a particularly challenging task: it is high-dimensional, contact-rich, and requires fast, precise control. We present OmniPianist, the first agent capable of performing nearly one thousand music pieces via scalable, human-demonstration-free learning. Our approach is built on three core components. First, we introduce an automatic fingering strategy based on Optimal Transport (OT), allowing the agent to autonomously discover efficient piano-playing strategies from scratch without demonstrations. Second, we conduct large-scale Reinforcement Learning (RL) by training more than 2,000 agents, each specialized in distinct music pieces, and aggregate their experience into a dataset named RP1M++, consisting of over one million trajectories for robotic piano playing. Finally, we employ a Flow Matching Transformer to leverage RP1M++ through large-scale imitation learning, resulting in the OmniPianist agent capable of performing a wide range of musical pieces. Extensive experiments and ablation studies highlight the effectiveness and scalability of our approach, advancing dexterous robotic piano playing at scale.

Summary

The paper presents OmniPianist, a novel RL-based agent that uses optimal transport for autonomous fingering in piano playing.
It leverages the RP1M++ dataset and a Flow Matching Transformer to achieve robust performance and generalization across music pieces.
The approach advances robotic dexterity by automating key finger movements and improving training efficiency for complex musical tasks.

Dexterous Robotic Piano Playing at Scale

Introduction

The paper "Dexterous Robotic Piano Playing at Scale" (2511.02504) introduces an innovative approach to robotic piano playing that addresses the challenges of dexterity and scalability in robotics. It presents OmniPianist, an agent capable of performing a vast repertoire of music pieces without human demonstrations by leveraging a novel Reinforcement Learning (RL) framework. This work is a significant step forward in enabling robots to achieve human-level dexterity in complex tasks.

Core Components of the Method

The research introduces three key innovations:

Automatic Fingering Strategy: Utilizing an Optimal Transport (OT) formulation, the proposed method eliminates the dependency on human-annotated fingering. This allows the robotic agent to autonomously discover efficient piano-playing strategies (Figure 1).
Figure 1: Overview of the proposed RL-based agent using OT formulation for autonomous piano playing.
Large-Scale Dataset - RP1M++: The collection and utilization of a comprehensive dataset, RP1M++, comprising over one million trajectories from specialist RL agents trained on distinct music pieces, ensures extensive coverage and diversity necessary for robust performance (Figure 2).

Figure 2: Dataset statistics showing diversity and coverage in RP1M++.

Flow Matching Transformer: Employed for imitation learning, this model processes the RP1M++ data to produce a skilled agent capable of diverse performances with strong generalization to new songs (Figure 3).
Figure 3: Flow Matching Transformer architecture used in the OmniPianist.

Implementation Details

Optimal Transport for Fingering

The OT problem formulation dynamically determines how fingers should move to press the required keys at each time step, significantly reducing the exploration space in RL. This fingering strategy adapts well to different robotic architectures, allowing for the transferability across various robotic hands (Figure 4 and Figure 5).

Figure 4: Comparison of agent-discovered and human-annotated fingering.

Figure 5: Performance across different robotic embodiments.

Reinforcement Learning with RP1M++

The OT-based fingering enables efficient training of RL specialists, each specializing in a single musical piece. The RP1M++ dataset aggregates over 2,000 such specialists, improving both data diversity and state-space coverage essential for training a multi-song agent capable of generalizing to unseen pieces (Figure 6).

Figure 6: RP1M++ provides better coverage leading to superior training results.

Flow Matching Transformer

This transformer model efficiently processes complex music sequences. It utilizes multi-head attention to handle the temporal and spatial dependencies inherent in piano playing, thereby improving the agent's ability to perform diverse tasks with high fidelity (Figure 3).

Performance and Evaluation

The method's scalability is demonstrated by training and evaluating the multi-song agent, OmniPianist, across various datasets. The performance metrics on both in-distribution and out-of-distribution sets underscore the improvements brought by RP1M++, affirming the method's robustness and generalization capacity (Figure 7).

Figure 7: Data scaling laws showing consistent improvements with larger datasets.

Conclusion

The paper presents a scalable framework for dexterous robotic piano playing that bypasses the limitations of traditional human-demonstration-based methods. By integrating optimal transport-based fingering with large-scale RL and the Flow Matching Transformer, OmniPianist demonstrates remarkable proficiency and versatility in performing complex musical tasks. Future work may explore extending this framework to real-world implementations and integrating multimodal sensory inputs to enhance precision and control during performances.