Expressive Movement Primitives

Updated 9 July 2025

Expressive movement primitives are elemental motion segments capturing nuances like intention, emotion, and style in human and robotic behaviors.
They are derived using mathematical formulations such as motion flux maximization, dynamic movement primitives, and probabilistic models to segment and parameterize motion.
Hierarchical and compositional frameworks enable their integration into complex behaviors, driving realistic applications in robotics, animation, and interactive systems.

Expressive movement primitives are elemental units of motion used to decompose, analyze, recognize, and generate complex behaviors in humans and robots. Unlike purely functional primitives, expressive movement primitives are designed to convey or capture nuances of intention, emotion, style, or semantics embedded within the motion. They provide a low- or mid-level abstraction between raw sensory data (e.g., trajectories or joint angles) and higher-level behavior, enabling modular, adaptable, and interpretable representations for applications ranging from video analysis to robotics and human–robot interaction.

1. Mathematical Formulation and Segmentation of Expressive Primitives

Movement primitives are formalized as temporally bounded segments of motion that capture characteristic patterns in position, velocity, or higher-order motion features. A key method for discovering expressive primitives, especially in human activities, segments continuous motion by optimizing the "motion flux," which accumulates the magnitude of joint acceleration, projected onto a reference direction, over a group of joints. For a joint group $G$ with velocities $v_j(t)$ and a direction $g$, the motion flux over $[t_1, t_2]$ is:

$\Phi(t_2, t_1) = \sum_{j \in G} \int_{t_1}^{t_2} |\dot{v}_j(t) \cdot g| dt$

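Primitives are extracted by maximizing an energy functional over the segment from $t_0$ to $\rho$. The functional rewards high flux (motion variability within the segment), penalizes nonzero velocity at the endpoints through the $\beta_v$ term (favoring a near-rest start and end), and regularizes segment length through the $\beta_s$ term: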

$P(\rho; t_0) = \Phi(\rho, t_0) - \frac{\beta_v}{2} \sum_{j \in G} [v_j(\rho)^2 + v_j(t_0)^2] + \beta_s \sum_{j \in G} [s_j(\rho) - s_j(t_0)]$

Segment boundaries correspond to local maxima or transition points in this functional, marking expressive units delimited by dynamic significance or "expressivity" in acceleration and velocity (Sanzari et al., 2017).
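The flux and the energy functional above can be approximated on sampled data. The following is a minimal sketch rather than the authors' implementation. It assumes uniformly sampled velocities, a forward-difference acceleration, and that $v_j^2$ denotes the squared velocity norm. The source does not define $s_j$, so the sketch treats it as a cumulative path length per joint. All function names and the toy data are hypothetical.

```python
from typing import Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def motion_flux(vel, g, i0, i1, dt):
    """Riemann-sum approximation of Phi between samples i0 and i1.

    vel[j][t] is the velocity vector of joint j at sample t.
    """
    total = 0.0
    for joint in vel:
        for t in range(i0, i1):
            # Forward-difference estimate of the acceleration at sample t.
            acc = [(b - a) / dt for a, b in zip(joint[t], joint[t + 1])]
            total += abs(dot(acc, g)) * dt
    return total


def segment_score(vel, arc, g, i0, i1, dt, beta_v, beta_s):
    """Discrete version of P(rho; t0) with rho = i1 and t0 = i0.

    arc[j][t] is ASSUMED to be the cumulative path length of joint j.
    """
    flux = motion_flux(vel, g, i0, i1, dt)
    endpoint = sum(dot(j[i1], j[i1]) + dot(j[i0], j[i0]) for j in vel)
    length = sum(s[i1] - s[i0] for s in arc)
    return flux - 0.5 * beta_v * endpoint + beta_s * length


# Toy data: one joint that speeds up along x and comes back to rest.
vel = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
        (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
arc = [[0.0, 0.5, 2.0, 3.5, 4.0]]
g = (1.0, 0.0, 0.0)

# Score every candidate end point for a segment starting at sample 0.
scores = {i1: segment_score(vel, arc, g, 0, i1, 1.0, 1.0, 0.5)
          for i1 in range(1, 5)}
best_end = max(scores, key=scores.get)
```

On this toy signal the highest score is reached at the sample where the joint returns to rest, which is the behavior the endpoint penalty is meant to encourage.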

Primitives from other domains, such as driving, may be segmented by domain-specific signals (e.g., zero-crossings in course deviation) combined with probabilistic inference, so that the segmentation and the primitive definitions refine each other (Wang et al., 2018).

2. Representation, Normalization, and Parameterization

The shape and features of each movement primitive are typically parameterized to ensure expressive capability, robustness, and generalization. Several complementary approaches exist:

Differential geometric features: For human body primitives, each segment can be encoded by local curvature $\kappa(s)$ and torsion $\tau(s)$, capturing the 3D shape up to rigid transformation. Feature vectors derived from decimated trajectories enable clustering and recognition (Sanzari et al., 2017).
Dynamic Movement Primitives (DMPs): DMPs represent movements as solutions to nonlinear dynamical systems. The standard DMP for position $y$ is:

$\tau^2 \ddot{y} = \alpha\left[\beta (g - y) - \tau \dot{y}\right] + f(x)$

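where $g$ is the goal position, $\tau$ is a temporal scaling constant, $\alpha$ and $\beta$ are spring-damper gains, and $f(x)$ is a forcing term over the phase variable $x$ that encodes the nonlinear shape via basis function weights. Here $g$ and $\tau$ are distinct from the flux direction $g$ and the torsion $\tau(s)$ used earlier. DMPs naturally support modulation of duration, amplitude, and movement direction (Wang et al., 2018, Hielscher et al., 9 Apr 2025).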

Probabilistic Movement Primitives (ProMPs): ProMPs represent trajectory distributions as weighted sums over basis functions, treating the weight vector $w$ as a random variable with learned mean and covariance. They encode both the typical motion and its stylistic variability, which is crucial for expressivity in transfer and adaptation (Stark et al., 2019).
Normalization: Expressive primitives must be invariant to anatomical differences or sampling rates. This is achieved by normalizing via scaling factors $k_G = 1/\ell_G$, where $\ell_G$ is a group-specific limb length, making extracted features comparable across different subjects (Sanzari et al., 2017).
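The curvature and torsion encoding and the limb-length normalization from the list above can be combined in a few lines. This is a rough sketch under stated assumptions: trajectories are sampled at a uniform parameter step, derivatives come from central finite differences, and the standard formulas $\kappa = |r' \times r''| / |r'|^3$ and $\tau = (r' \times r'') \cdot r''' / |r' \times r''|^2$ are used. The helper names and the helix test curve are hypothetical and are not taken from the cited work.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(points, limb_length):
    """Scale a trajectory by k_G = 1 / limb_length."""
    k = 1.0 / limb_length
    return [[k * c for c in p] for p in points]


def curvature_torsion(points, h):
    """Curvature and torsion at interior samples of a 3D curve.

    points are sampled at a uniform parameter step h; the two samples
    at each end are skipped because the stencil needs i-2 .. i+2.
    """
    out = []
    for i in range(2, len(points) - 2):
        d1 = [(points[i + 1][k] - points[i - 1][k]) / (2 * h)
              for k in range(3)]
        d2 = [(points[i + 1][k] - 2 * points[i][k] + points[i - 1][k]) / h ** 2
              for k in range(3)]
        d3 = [(points[i + 2][k] - 2 * points[i + 1][k]
               + 2 * points[i - 1][k] - points[i - 2][k]) / (2 * h ** 3)
              for k in range(3)]
        c = cross(d1, d2)
        c2 = dot(c, c)
        speed = math.sqrt(dot(d1, d1))
        kappa = math.sqrt(c2) / speed ** 3
        tau = dot(c, d3) / c2 if c2 > 1e-12 else 0.0
        out.append((kappa, tau))
    return out


def helix(radius, pitch, h, n=101):
    """Hypothetical test curve with known curvature and torsion."""
    return [[radius * math.cos(i * h), radius * math.sin(i * h), pitch * i * h]
            for i in range(n)]


h = 0.01
# Two "subjects" performing the same shape at different body scales.
features_a = curvature_torsion(normalize(helix(2.0, 1.0, h), 2.0), h)
features_b = curvature_torsion(normalize(helix(3.0, 1.5, h), 3.0), h)
```

After normalization both subjects yield nearly the same $(\kappa, \tau)$ pairs, which is the property that lets features from differently sized bodies be clustered together.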

3. Hierarchical and Programmatic Composition

Expressive movement primitives are rarely used in isolation; higher-level behaviors emerge from their hierarchical or programmatic composition:

Nonparametric Bayesian clustering: Segment-wise feature vectors are hierarchically grouped into clusters using Dirichlet Process Mixture Models (DPMs), supporting an unknown number of underlying primitive types and allowing primitives to map to interpretable biomechanical classes (Sanzari et al., 2017).
Motion Programs and DSLs: Human motion can be abstracted as sequences of parametric primitives (e.g., linear, circular, stationary), organized into "programs" using constructs such as loops (for repeated actions) or conditional branches. These neuro-symbolic representations enable the encoding of long-range dependencies, structural semantics, and ease of editing (Kulal et al., 2021).
Mixture-of-Experts Models: For highly expressive tasks such as sign language production, motion is generated as a temporal mixture of learned primitive "experts," with gating networks dynamically blending primitives to generate fluid and expressive outputs (Saunders et al., 2021).
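The mixture-of-experts idea in the last item can be illustrated with a softmax gate that blends the outputs of several primitives at each time step. The sketch below is a toy under strong assumptions: each expert emits a fixed pose vector and the gating logits follow a hand-written schedule, whereas in a learned system both would come from trained networks. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of gating logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend(expert_poses, gate_logits):
    """Convex combination of expert poses using softmax gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_poses[0])
    return [sum(w * pose[d] for w, pose in zip(weights, expert_poses))
            for d in range(dim)]


def generate(expert_poses, logit_schedule):
    """One blended pose per time step of the gating schedule."""
    return [blend(expert_poses, logits) for logits in logit_schedule]


# Two hypothetical primitives, each contributing a 2D pose.
experts = [[1.0, 0.0], [0.0, 1.0]]
# The gate crossfades from the first expert to the second over 5 steps.
schedule = [[4.0 - 2.0 * t, -4.0 + 2.0 * t] for t in range(5)]
frames = generate(experts, schedule)
```

Because the gate weights are non-negative and sum to one, every frame stays inside the convex hull of the expert poses, which gives smooth transitions between primitives rather than hard switches.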

4. Recognition, Modulation, and Adaptation

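Recognition of movement primitives in new data is performed by evaluating the likelihood of the observed feature vectors under each class-specific mixture model, and the resulting probabilistic match is then refined with geometric similarity metrics to maintain geometric consistency. For a class $w$, the likelihood of the feature vectors of an unknown segment is:

$p(\{\mathcal{F}_{un}\}| \Theta_w) = \sum_{j=1}^{\rho} \pi_j \prod_{n=1}^{q} \mathcal{N}(\mathcal{F}_{un} | \mu_{wj}, \Sigma_{wj})$

Here $\Theta_w$ collects the mixture weights $\pi_j$, means $\mu_{wj}$, and covariances $\Sigma_{wj}$ of class $w$, $q$ is the number of feature vectors, and $\rho$ is the number of mixture components (not the segment endpoint $\rho$ of Section 1).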
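The likelihood above is easier to evaluate in the log domain. The following sketch assumes diagonal covariances for brevity (the formula allows full covariance matrices) and picks the class with the highest log-likelihood. It omits the geometric refinement step. The class labels, parameters, and function names are made up for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_likelihood(features, model):
    """log p({F_n} | Theta_w) for a model given as [(pi_j, mean_j, var_j)].

    Each component scores ALL feature vectors jointly (the product over n),
    and the components are combined with a log-sum-exp (the sum over j).
    """
    terms = [math.log(pi) + sum(log_gauss_diag(f, mean, var) for f in features)
             for pi, mean, var in model]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def recognize(features, models):
    """Return the class label whose model gives the highest likelihood."""
    return max(models, key=lambda w: log_likelihood(features, models[w]))


# Hypothetical class models with two-dimensional features.
models = {
    "reach": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [1.0, 1.0])],
    "wave": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
observed = [[4.8, 5.1], [5.2, 4.9]]
label = recognize(observed, models)
```

Working with log-probabilities avoids underflow when the product over many feature vectors becomes very small.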

Modulation, meaning the adjustment of intensity, duration, or style, relies on controlling the parameters of the representation. DMPs, for example, can be modulated in real time by scaling system parameters or refitting the forcing-term weights, which supports anticipation, timing, exaggeration, and other expressive animation principles (Hielscher et al., 9 Apr 2025).
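To make the effect of parameter scaling concrete, the DMP equation from Section 2 can be integrated numerically while varying the time constant and the goal. This is a simplified one-dimensional sketch, not the method of the cited papers. It assumes a first-order canonical system $\tau \dot{x} = -\alpha_x x$, Gaussian basis functions with a common width heuristic, a forcing term scaled by $x (g - y_0)$, and a simple Euler-style integrator. The gains and weights are illustrative values.

```python
import math


def rollout(weights, y0, goal, tau=1.0, dt=0.001,
            alpha=25.0, beta=6.25, alpha_x=4.0):
    """Integrate tau^2 y'' = alpha * (beta * (g - y) - tau * y') + f(x).

    Uses z = tau * y' as an auxiliary state and runs for tau seconds.
    """
    n = len(weights)
    # Basis centers spread along the decaying phase variable (assumption).
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha_x for c in centers]
    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(round(tau / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        shape = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        f = shape * x * (goal - y0)  # forcing vanishes as x decays
        z += dt * (alpha * (beta * (goal - y) - z) + f) / tau
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau
        path.append(y)
    return path


weights = [20.0, -10.0, 5.0, 0.0, 0.0]  # illustrative, not learned
fast = rollout(weights, 0.0, 1.0, tau=1.0)
slow = rollout(weights, 0.0, 1.0, tau=2.0)   # same shape, twice as long
large = rollout(weights, 0.0, 2.0, tau=1.0)  # same shape, twice the amplitude
```

Doubling `tau` stretches the motion in time without changing its shape, and doubling the goal distance scales its amplitude, which is the kind of timing and exaggeration control described above.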

Modern frameworks exploit deep generative models or Bayesian aggregators to handle context conditioning, blending, and adaptation to new via-points, thereby enabling responsive, context-aware expressive motion (e.g., DeepProMPs) (Przystupa et al., 2023).
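Adaptation to a via-point is easiest to see in the classical ProMP setting, where it reduces to Gaussian conditioning of the weight distribution. The sketch below shows that closed-form update for a single scalar observation. It is not the deep or Bayesian-aggregation variant mentioned above, and the basis functions, prior parameters, and names are assumptions chosen for illustration.

```python
import math


def basis(phase, n_basis, width=0.05):
    """Normalized Gaussian basis functions over phase in [0, 1]."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-(phase - c) ** 2 / (2 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def condition(mu, sigma, phase, y_star, obs_var=1e-6):
    """Condition w ~ N(mu, sigma) on observing y_star at the given phase.

    sigma is assumed symmetric; obs_var is the observation noise variance.
    """
    phi = basis(phase, len(mu))
    s_phi = matvec(sigma, phi)                 # Sigma * phi
    denom = obs_var + dot(phi, s_phi)
    gain = [v / denom for v in s_phi]
    resid = y_star - dot(phi, mu)
    new_mu = [m + k * resid for m, k in zip(mu, gain)]
    n = len(mu)
    new_sigma = [[sigma[i][j] - gain[i] * s_phi[j] for j in range(n)]
                 for i in range(n)]
    return new_mu, new_sigma


def mean_at(mu, phase):
    return dot(basis(phase, len(mu)), mu)


def var_at(sigma, phase):
    phi = basis(phase, len(sigma))
    return dot(phi, matvec(sigma, phi))


# Hypothetical prior over five weights, then a via-point at mid-motion.
mu = [0.0, 0.2, 0.5, 0.8, 1.0]
sigma = [[0.04 if i == j else 0.0 for j in range(5)] for i in range(5)]
new_mu, new_sigma = condition(mu, sigma, 0.5, 0.9)
```

After conditioning, the mean trajectory passes close to the via-point and the variance at that phase collapses toward the observation noise, while the rest of the motion keeps its prior variability.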

5. Applications and Practical Implications

Expressive movement primitives underlie diverse real-world applications:

Human motion analysis: Foundational for activity recognition, behavior understanding, surveillance, and abnormality detection; enables automatic discovery of motion dictionaries and labeling of movement types (Sanzari et al., 2017, Kulal et al., 2021).
Robotics: Primitives enable modular skill libraries for human-like motion generation, imitation learning, and robust trajectory adaptation. The ability to compose, modulate, and blend primitives lets robots execute complex tasks with nuance and variability, including social expression (Wang et al., 2018, Li et al., 2022, Hielscher et al., 9 Apr 2025).
Sign language and animation: In fields requiring the conveyance of high-level meaning, expressive primitives allow for more natural, interpretable, and engaging sequence generation and translation between modalities (Saunders et al., 2021).
End-user programming: Puppeteering interfaces map human gestures to robot motions in real time, enabling intuitive creation and layering of expressive primitives for social robots (Wang et al., 2022).

6. Datasets, Evaluation, and Research Impact

Advanced segmentation and recognition frameworks have been used to curate publicly available datasets of expressive human motion primitives, extracted from established MoCap repositories. These datasets provide standardized benchmarks for behavior modeling, motion synthesis, and system comparison (Sanzari et al., 2017).

Quantitative evaluation covers segmentation quality, representation fidelity (e.g., mean squared trajectory error), and recognition accuracy, while user studies (in animation and interaction contexts) assess perceptual metrics such as engagement, naturalness, and expressive clarity (Hu et al., 21 Jan 2025, Hielscher et al., 9 Apr 2025).

Expressive movement primitives have driven progress in modular robotics, activity analysis, and animation, offering pathways for the synthesis and adaptation of nuanced behaviors across machine learning, video understanding, and HRI. Their programmatic, probabilistic, and hierarchical structure remains a focus for advancing generalizable, interpretable, and user-aligned motion systems.