Conditional Adversarial Motion Priors (CAMP)
- Conditional Adversarial Motion Priors (CAMP) are learning-based techniques that extend traditional adversarial motion priors by incorporating explicit skill and contextual conditioning.
- By integrating conditional inputs into both the policy and discriminator networks, CAMP promotes physically plausible, diverse, and controllable motion synthesis in robotic and character locomotion.
- CAMP frameworks enable multi-skill learning with smooth transitions, achieving high performance in simulated and real environments through robust reward design and conditioning strategies.
Conditional Adversarial Motion Priors (CAMP) are a class of learning-based motion modeling techniques that extend Adversarial Motion Priors (AMP) with explicit conditioning, enabling robust, diverse, and controllable motion synthesis and policy training for robotic and character locomotion. CAMP frameworks integrate conditional information—such as skill labels, target parameters, or environmental context—into the adversarial imitation learning loop, aligning generated motions with user-specified requirements while ensuring physical plausibility and facilitating generalizable multi-skill control.
1. Foundational Principles and Adversarial Conditioning
CAMP builds upon the canonical AMP framework, which utilizes a generative adversarial setup: a policy or motion generator produces state trajectories, and a discriminator network is trained to distinguish these from real motions, such as those sourced from motion capture or trajectory optimization datasets. In CAMP, both the generator and discriminator receive additional conditional information (denoted here as $c$), which specifies the desired motion style, skill, or task variant (Huang et al., 26 Sep 2025). This transforms the binary discrimination of AMP into a conditional discrimination problem: the discriminator must verify both realism and consistency with the intended skill or context.
Mathematically, the discriminator learns a mapping
$$D(s_t, s_{t+1} \mid c) \rightarrow \mathbb{R},$$
where $(s_t, s_{t+1})$ are state transitions and $c$ is a skill or context latent. The GAN-style loss typically follows a least-squares formulation with a gradient penalty for stability:
$$\mathcal{L}_D = \mathbb{E}_{\mathcal{M}}\big[(D(s_t, s_{t+1} \mid c) - 1)^2\big] + \mathbb{E}_{\pi}\big[(D(s_t, s_{t+1} \mid c) + 1)^2\big] + w_{\mathrm{gp}}\,\mathbb{E}_{\mathcal{M}}\big[\lVert \nabla D(s_t, s_{t+1} \mid c) \rVert^2\big],$$
where $\mathcal{M}$ denotes the reference motion dataset and $\pi$ the policy. The resulting adversarial reward assigned to the generator is a function of the discriminator output, e.g.,
$$r^{\mathrm{style}}_t = \max\big[0,\; 1 - 0.25\,(D(s_t, s_{t+1} \mid c) - 1)^2\big].$$
Such structure ensures that policy training is guided towards both realism and conditional compliance.
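The following PyTorch sketch illustrates this conditional discriminator, its least-squares loss with gradient penalty, and the derived style reward. It is a minimal sketch, not the authors' code; network sizes, the gradient-penalty weight, and the reward-shaping constant are illustrative assumptions.

```python
# Minimal PyTorch sketch of a conditional AMP-style discriminator, its
# least-squares loss with gradient penalty, and the derived style reward.
# Sizes and weights are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalDiscriminator(nn.Module):
    def __init__(self, transition_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, transition: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # transition: concatenated (s_t, s_{t+1}); cond: skill/context vector c
        return self.net(torch.cat([transition, cond], dim=-1))


def discriminator_loss(disc, real_trans, real_cond, fake_trans, fake_cond, w_gp=5.0):
    # Least-squares targets: +1 for reference motions, -1 for policy rollouts.
    real_trans = real_trans.clone().requires_grad_(True)
    d_real = disc(real_trans, real_cond)
    d_fake = disc(fake_trans, fake_cond)
    ls_loss = ((d_real - 1.0) ** 2).mean() + ((d_fake + 1.0) ** 2).mean()
    # Gradient penalty on reference samples stabilizes adversarial training.
    grad = torch.autograd.grad(d_real.sum(), real_trans, create_graph=True)[0]
    gp = (grad.norm(2, dim=-1) ** 2).mean()
    return ls_loss + w_gp * gp


def style_reward(disc, trans, cond):
    # Common AMP-style shaping: r = max(0, 1 - 0.25 * (D - 1)^2), bounded in [0, 1].
    with torch.no_grad():
        d = disc(trans, cond)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)
```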
2. Skill Conditioning and Latent Space Structure
CAMP frameworks employ explicit conditioning vectors that encode desired attributes of motion, enabling the policy to produce not just physically plausible trajectories, but also motions corresponding to different skills or contextual requirements. Typical conditioning mechanisms include one-hot vectors or continuous latent skill embeddings, which are concatenated with state observations and/or provided as input to both the actor and the conditional discriminator (Huang et al., 26 Sep 2025, Wu et al., 2023). In some frameworks, a dedicated skill embedding network is trained to map discrete skill labels or context descriptors into a continuous latent space, which facilitates interpolation and blending between skills.
A key addition is the skill discriminator $D_{\mathrm{skill}}$. This module maps state-transition sequences to a latent skill embedding $\hat{z}$ and compares it to the ground-truth skill representation $z$, typically using cosine similarity:
$$r^{\mathrm{skill}}_t = \frac{\hat{z}^{\top} z}{\lVert \hat{z} \rVert\, \lVert z \rVert}.$$
This tightens the match between the produced motion and the target skill, providing sharper skill-specific learning signals.
The policy network thus learns to interpret and respond to skill-conditioning, resulting in a latent space with well-separated clusters for different skills but also smoothness enabling interpolation. Experimental validation via t-SNE and K-means clustering confirms the formation of distinct, skill-correlated latent regions (Huang et al., 26 Sep 2025).
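A minimal sketch of the skill-embedding network and the cosine-similarity skill reward described above follows; the embedding dimension, layer sizes, and the use of an embedding table for discrete skill labels are assumptions for illustration.

```python
# Sketch of a skill-embedding table and a skill discriminator producing the
# cosine-similarity reward; embedding dimension and layer sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkillEmbedding(nn.Module):
    """Maps a discrete skill label (e.g., trot/pace/bound/pronk) to a unit-norm latent z."""

    def __init__(self, num_skills: int, latent_dim: int = 16):
        super().__init__()
        self.table = nn.Embedding(num_skills, latent_dim)

    def forward(self, skill_id: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.table(skill_id), dim=-1)


class SkillDiscriminator(nn.Module):
    """Predicts a latent skill embedding z_hat from a state-transition segment."""

    def __init__(self, transition_dim: int, latent_dim: int = 16, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, transition: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(transition), dim=-1)


def skill_reward(skill_disc, transition, target_z):
    # Cosine similarity between predicted and commanded skill embeddings.
    with torch.no_grad():
        z_hat = skill_disc(transition)
    return F.cosine_similarity(z_hat, target_z, dim=-1)
```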
3. Reward Design and Optimization
CAMP architectures integrate several reward terms in policy optimization. In addition to style or skill rewards from the conditional discriminator, task and contact rewards are formulated to promote robust performance. A general reward structure is a weighted sum of the form
$$r_t = w_{\mathrm{style}}\, r^{\mathrm{style}}_t + w_{\mathrm{skill}}\, r^{\mathrm{skill}}_t + w_{\mathrm{task}}\, r^{\mathrm{task}}_t + w_{\mathrm{contact}}\, r^{\mathrm{contact}}_t,$$
with the individual terms playing the following roles (a short code sketch of the combination appears at the end of this section):
- Style/Skill Rewards: Enforce realism and skill specificity via adversarial and latent-space objectives.
- Task Rewards: Enforce high-level goals (e.g., traversing to a given target, velocity tracking).
- Contact Rewards: Shape gait-foot contact patterns, penalize instability or slippage (commonly in quadruped or humanoid robots) (Wu et al., 2023).
The presence of gradient penalties in all adversarial modules is necessary to stabilize learning, especially with nontrivial conditioning.
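A minimal sketch of combining the reward terms above into a single per-timestep scalar is given below; the weights are placeholder assumptions and are tuned per robot and task in practice.

```python
# Illustrative weighted combination of the reward terms above; the weights are
# placeholder assumptions, tuned per robot and task in practice.
def total_reward(r_style, r_skill, r_task, r_contact,
                 w_style=1.0, w_skill=0.5, w_task=1.0, w_contact=0.2):
    """r_t = w_style*r_style + w_skill*r_skill + w_task*r_task + w_contact*r_contact."""
    return (w_style * r_style + w_skill * r_skill
            + w_task * r_task + w_contact * r_contact)
```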
4. Multi-Skill Acquisition and Smooth Transitions
A prominent property of CAMP systems is the ability to support multi-skill learning within a single unified policy network. Conditioned by a skill vector, the policy generates stylistically distinct motions corresponding to user-specified commands, such as trot, pace, bound, or pronk in quadruped robots (Huang et al., 26 Sep 2025). Because the same network covers all skill variants, transitions—either abrupt or smoothly interpolated between skills—are inherently learned by the policy, without requiring explicit transition models or mode scheduling (Wu et al., 2023).
Quantitative and qualitative evaluations show that when skill commands are switched online, foot-contact diagrams and joint-angle traces exhibit continuous, natural transitions, with no catastrophic instability observed. This contrasts with hierarchical reinforcement learning (HRL) approaches, where skill boundaries can produce discontinuities unless specifically handled.
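As an illustration of smooth online switching, the snippet below interpolates between two skill latents; the helper name and the ramping strategy are hypothetical, but the idea follows directly from the continuous latent space described above.

```python
# Hypothetical helper for smooth online skill switching: interpolate between
# two unit-norm skill latents and ramp alpha from 0 to 1 over a few control
# steps (e.g., trot -> bound). The function name is illustrative.
import torch
import torch.nn.functional as F


def blend_skills(z_a: torch.Tensor, z_b: torch.Tensor, alpha: float) -> torch.Tensor:
    """Linearly interpolate two skill latents and re-normalize to unit length."""
    z = (1.0 - alpha) * z_a + alpha * z_b
    return F.normalize(z, dim=-1)
```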
5. Implementation and Empirical Performance
Recent implementations of CAMP have demonstrated effective deployment on both simulated and real quadruped platforms such as the Unitree Go2 and Go1 (Huang et al., 26 Sep 2025, Wu et al., 2023). The architecture typically consists of the following components (a minimal actor sketch follows the list):
- Policy Network: Receives state/history features and the skill-conditioning vector.
- Conditional Discriminator: Judges short state-transition segments together with the conditioning vector $c$.
- Skill Discriminator: Maps transitions to skill latent vectors for cosine-reward computation.
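The sketch below shows a minimal conditioned actor corresponding to the list above; observation and hidden dimensions, activation choices, and the Gaussian-mean output head are assumptions for illustration.

```python
# Minimal conditioned-actor sketch: proprioceptive state/history features are
# concatenated with the skill-conditioning vector. Dimensions, activations,
# and the Gaussian-mean output head are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int, cond_dim: int, act_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cond_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),  # mean of a Gaussian action distribution
        )

    def forward(self, obs: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, cond], dim=-1))
```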
Domain randomization—over friction, motor gains, terrain, and morphology parameters—is employed to ensure sim-to-real transferability. Policies demonstrate >91% accuracy in joint position tracking and successful execution of multiple gaits and transitions in both moderate and rough terrain tests (Huang et al., 26 Sep 2025, Wu et al., 2023). Comparisons with methods lacking explicit conditional skill priors show higher robustness, reduced incidence of mode-collapse, and greater behavioral diversity.
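As an example of the kind of domain randomization mentioned above, the configuration below sketches hypothetical randomization ranges; the parameter names and intervals are illustrative assumptions, not values reported in the cited papers.

```python
# Hypothetical domain-randomization ranges of the kind described above; the
# parameter names and intervals are illustrative, not values from the papers.
DOMAIN_RANDOMIZATION = {
    "friction_coefficient": (0.4, 1.25),   # ground contact friction
    "motor_kp_scale":       (0.9, 1.1),    # proportional gain multiplier
    "motor_kd_scale":       (0.9, 1.1),    # derivative gain multiplier
    "added_base_mass_kg":   (-1.0, 2.0),   # payload / morphology variation
    "terrain_roughness_m":  (0.0, 0.05),   # height-field perturbation
}


def sample_randomization(rng):
    """Draw one environment configuration uniformly from each range (rng: numpy Generator)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in DOMAIN_RANDOMIZATION.items()}
```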
6. Theoretical and Practical Impact
By incorporating conditional structure, CAMP resolves two major limitations of traditional adversarial motion imitation: the inability to disambiguate between multiple skills under the same task reward, and the mode collapse problem where only one or a few behaviors dominate. The practical impact is especially notable in settings demanding versatile robotic mobility—such as agile navigation of complex environments and user-driven gait switching. CAMP has also proven effective in generating controller policies that generalize well across physical platforms and environmental variations, owing to robust conditioning and adversarial regularization (Huang et al., 26 Sep 2025, Wu et al., 2023).
7. Extensions and Related Paradigms
CAMP shares motivation with multi-task reinforcement learning and conditional generative modeling in computer vision and robotics, where conditioning enables a model to represent a family of solutions rather than fitting to an averaged trajectory. Related approaches have been used in video-based pose estimation, talking face generation, and motion denoising, though with domain-specific conditional signals (Chen et al., 2023, Shen et al., 13 Feb 2025). The underlying principle—using discriminator-driven, conditional regularization to propagate constraints and structure—remains similar, even as architectures vary across application domains.
A plausible implication is that as motion libraries grow and policy tasks become more complex, CAMP-like conditioning will play an increasing role in unifying skill learning, skill interpolation, and robust real-world deployment in robotics and animation.
Summary Table: Core Elements of CAMP (as reported in Huang et al., 26 Sep 2025 and Wu et al., 2023)

| Component | Mathematical Role | Conditioning Variable(s) |
|---|---|---|
| Policy (Generator) | $\pi(a_t \mid s_t, c)$: maps state and conditioning to actions | Skill vector $c$ |
| Cond. Discriminator | $D(s_t, s_{t+1} \mid c)$: scores realism and skill consistency | Skill latent $c$ |
| Skill Discriminator | Maps $(s_t, s_{t+1})$ to $\hat{z}$ and compares it to the target $z$ | Target skill embedding $z$ |
| Reward | Weighted sum of style, skill (cosine similarity), task, and contact terms | All above |
CAMP enables policies that are both expressive and controllable, integrating multi-skill capabilities and context-aware adaptation within an adversarial imitation and reinforcement learning framework.