
GPC: Large-Scale Generative Pretraining for Transferable Motor Control

This presentation explores a breakthrough in physics-based character animation where discrete skill quantization and transformer-based modeling enable controllers to learn from hundreds of hours of motion data. The framework achieves remarkable motion fidelity, emergent robustness to physical perturbations, and efficient adaptation to new tasks through parameter-efficient fine-tuning, fundamentally advancing how we build generalizable motor control systems.

Script

Training a physics-based character to move naturally has long meant choosing between two painful options: hand-craft a controller for every single skill, or watch continuous latent representations collapse into repetitive, robotic motions. This paper introduces Generative Pretrained Controllers, a framework that learns from over 600 hours of motion data to produce controllers that are both remarkably faithful to natural movement and genuinely robust to the unexpected.

The authors tackle the core instability problem by replacing continuous latent spaces with Finite Scalar Quantization. Instead of learning an explicit codebook that can suffer from dead codes or mode collapse, FSQ rounds encoder outputs to fixed discrete levels, creating a stable skill space trained end-to-end with reinforcement learning. The result is a 99.98% success rate in reproducing motion clips, from routine walking to explosive acrobatics.
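To make the rounding step concrete, here is a minimal sketch of the forward pass of a finite-scalar-style quantizer. It is an illustration under stated assumptions, not the authors' implementation: the function names, the tanh bounding, the restriction to odd level counts, and the choice of three dimensions with five levels each are all hypothetical. Training details such as how gradients pass through the rounding are not covered.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each latent dimension to one of a fixed set of integer levels.

    Assumption: every entry of `levels` is odd, so the levels for a
    dimension with L values are the integers -(L-1)/2 ... (L-1)/2.
    """
    levels = np.asarray(levels)
    half = (levels - 1) / 2.0
    bounded = np.tanh(z) * half   # squash each dimension into (-half, half)
    return np.round(bounded)      # snap to the nearest integer level

def codes_to_index(codes, levels):
    """Map a vector of per-dimension levels to a single token id (mixed radix)."""
    levels = np.asarray(levels)
    half = (levels - 1) // 2
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return int(np.sum((codes + half) * basis))

# Hypothetical configuration: 3 latent dimensions, 5 levels each.
levels = [5, 5, 5]
z = np.array([0.0, 10.0, -10.0])   # stand-in for an encoder output
codes = fsq_quantize(z, levels)
token = codes_to_index(codes, levels)
codebook_size = int(np.prod(levels))
```

Because the set of levels is fixed ahead of time, there are no learned codebook vectors that could go unused; in this sketch the implicit codebook is simply the product of the per-dimension level counts.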

The generative controller itself is a transformer trained to autoregressively predict grouped skill tokens, much like next-token prediction in language models. At each timestep, the model samples from a distribution over possible skills conditioned on the character's state and recent history. This design unlocks something unexpected: when you sample unconditionally from the controller, it produces a vast repertoire of dynamic behaviors, including jumps, leaps, and rolls, without any task-specific prompting.
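The sampling loop described above can be sketched as follows. This is a simplified illustration, not the paper's controller: `logits_fn` and `step_fn` are hypothetical stand-ins for the transformer and the physics simulation, the sketch draws a single token per timestep rather than a group of tokens, and the vocabulary size and context length are made-up values.

```python
import numpy as np

def sample_token(logits, rng, temperature=1.0):
    """Draw one token id from a softmax over the logits."""
    scaled = logits / temperature
    scaled = scaled - scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()
    return int(rng.choice(len(probs), p=probs))

def rollout(logits_fn, step_fn, state, steps, rng, context_len=8):
    """Autoregressively sample skill tokens, feeding recent history back in."""
    history = []
    for _ in range(steps):
        logits = logits_fn(state, history[-context_len:])
        token = sample_token(logits, rng)
        history.append(token)
        state = step_fn(state, token)       # hypothetical: advance the simulation
    return history

VOCAB = 16  # hypothetical number of skill tokens

def toy_logits_fn(state, recent_tokens):
    # Placeholder for the transformer: mildly favors repeating the last token.
    logits = np.zeros(VOCAB)
    if recent_tokens:
        logits[recent_tokens[-1]] += 1.0
    return logits

def toy_step_fn(state, token):
    # Placeholder for the physics step: here the state is just a step counter.
    return state + 1

rng = np.random.default_rng(0)
tokens = rollout(toy_logits_fn, toy_step_fn, state=0, steps=20, rng=rng)
```

Sampling from the distribution, rather than always taking the most likely token, is what lets repeated rollouts from the same starting state produce different behaviors.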

Perhaps most striking is the emergent robustness the controller exhibits without any explicit training for it. When subjected to strong external pushes, the generative controller maintains balance and produces natural, diverse recovery strategies, while baseline continuous methods collapse into failure. This robustness arises directly from the diversity and structure captured in the discrete latent space, not from hand-engineered recovery heuristics.

Adapting to new tasks requires remarkably few parameters. The authors insert lightweight Conditional Low-Rank Adaptation layers, adding less than 1% to model size, and fine-tune with task-specific reinforcement learning. Compared to continuous variational priors that collapse to near-identical solutions, the adapted generative controller retains behavioral diversity, produces varied trajectories for the same goal, and preserves the natural motion quality learned during pretraining.
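To show why low-rank adapters add so few parameters, here is a minimal sketch of a generic low-rank adaptation layer wrapped around a frozen weight matrix. It is an assumption-laden illustration: the class name, the layer width of 1024, the rank of 4, and the initialization are hypothetical, and the task-conditioning part of the authors' Conditional Low-Rank Adaptation is not described here and so is left out.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA sketch)."""

    def __init__(self, weight, rank, rng, alpha=1.0):
        self.weight = weight                         # frozen, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))             # trainable up-projection, zero at start
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T           # low-rank correction
        return base + self.scale * update

    def added_fraction(self):
        """Adapter parameters relative to the frozen weight's parameters."""
        return (self.A.size + self.B.size) / self.weight.size

rng = np.random.default_rng(0)
d = 1024                                             # hypothetical layer width
frozen = rng.normal(size=(d, d))
layer = LoRALinear(frozen, rank=4, rng=rng)

x = rng.normal(size=(2, d))
unchanged_at_init = np.allclose(layer(x), x @ frozen.T)
fraction = layer.added_fraction()                    # 2 * 4 * 1024 / 1024**2
```

With these made-up dimensions the adapter adds 8,192 parameters to a layer of 1,048,576, which is about 0.78%. Starting the up-projection at zero means the adapted layer initially reproduces the pretrained layer's output, so fine-tuning begins from the pretrained behavior.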

Generative Pretrained Controllers demonstrate that discrete skill tokenization and autoregressive modeling can unlock scalable, high-fidelity motor control that genuinely generalizes. By learning a structured space of reusable skills from massive motion datasets, the framework opens a path toward general-purpose controllers for animation, robotics, and embodied agents. To dive deeper into this work and create your own research videos, visit EmergentMind.com.