Learning from Dynamics (LfD)

Updated 9 August 2025
  • Learning from Dynamics (LfD) is a framework that extracts control policies from temporal demonstrations using Bayesian methods, clustering techniques, and reinforcement learning for robust policy synthesis.
  • LfD integrates motion primitives, optimal control, and reservoir computing to capture dynamic, stochastic task properties and ensure rapid adaptation to new states.
  • Modern LfD approaches enhance safety and scalability by embedding uncertainty quantification, stability guarantees via Lyapunov functions, and lifelong, federated learning for real-world applications.

Learning from Dynamics (LfD), or Learning from Demonstration in dynamical settings, encompasses a spectrum of methodologies in which a system synthesizes behavioral or control policies directly from demonstrations, with an emphasis on explicitly leveraging temporal, structural, or stochastic properties inherent to the underlying dynamics of the task or the demonstrator. In contrast with the optimality or determinism assumed by standard imitation learning, contemporary LfD frameworks routinely address stochasticity, suboptimality, and the need for robust generalization to unseen states, often under strict stability, uncertainty, or safety constraints.

1. Foundational Bayesian and Clustering Approaches

An early and influential Bayesian formulation seeks to generalize LfD beyond assumptions of optimality or deterministic expert behavior by constructing the full posterior over possible expert local controllers without requiring knowledge of the reward function or explicit action monitoring (Šošić et al., 2016). The expert demonstration is formalized in the Markov Decision Process (MDP) framework:

  • Each state $i$ is governed by a local, potentially stochastic controller, modeled as a categorical distribution over actions with parameter $\theta_i \sim \mathrm{Dir}(\alpha \cdot \mathbf{1}_{|\mathcal{A}|})$.
  • The posterior distribution over local controllers remains Dirichlet after observing state–action counts $\phi_{i,j}$: $\theta_i \mid \{a\} \sim \mathrm{Dir}(\phi_i + \alpha \cdot \mathbf{1}_{|\mathcal{A}|})$.

To avoid overfitting in large or continuous state spaces, the state space is clustered:

  • Each state $i$ is assigned to a cluster $z_i$, sharing its controller parameter $\theta_{z_i}$.
  • Clustering priors incorporate spatial structure (e.g., Potts model), Dirichlet mixtures, or nonparametric Bayesian mechanisms, such as the Chinese Restaurant Process (CRP) or distance-dependent CRP (ddCRP).
  • Nonparametric models allow the number of clusters (“control situations”) to be determined by the data, leading to adaptive, task-appropriate state representations.

Sampling-based inference (e.g., Gibbs sampling) produces calibrated action predictions and enables principled uncertainty quantification.
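
Concretely, once cluster assignments are fixed, the posterior computation reduces to conjugate counting. The following minimal NumPy sketch (an illustration under the stated model, not the authors' implementation) pools state–action counts per cluster, forms the Dirichlet posteriors, and returns posterior-predictive action probabilities plus posterior samples for uncertainty quantification; the cluster assignments `z` are assumed to come from an external step such as a CRP/ddCRP sampler.

```python
import numpy as np

def dirichlet_policy_posterior(demos, z, n_clusters, n_actions, alpha=1.0):
    """Posterior over cluster-wise local controllers from (state, action) pairs.

    demos      : iterable of (state_index, action_index) pairs
    z          : array mapping each state index to its cluster z_i
    n_clusters : number of clusters ("control situations")
    n_actions  : size of the action set |A|
    alpha      : symmetric Dirichlet concentration

    Returns the Dirichlet posterior parameters phi + alpha for every cluster.
    """
    counts = np.zeros((n_clusters, n_actions))
    for s, a in demos:
        counts[z[s], a] += 1.0          # state-action counts phi, pooled per cluster
    return counts + alpha               # Dir(phi_i + alpha * 1_{|A|})

def predictive_action_probs(post):
    """Posterior-predictive (mean) action probabilities for each cluster."""
    return post / post.sum(axis=1, keepdims=True)

# Illustrative example: 4 states grouped into 2 clusters, 3 actions.
rng = np.random.default_rng(0)
z = np.array([0, 0, 1, 1])
demos = [(0, 2), (1, 2), (2, 0), (3, 0), (3, 1)]
post = dirichlet_policy_posterior(demos, z, n_clusters=2, n_actions=3)
print(predictive_action_probs(post))          # calibrated action predictions
print(rng.dirichlet(post[0], size=3))         # posterior samples -> uncertainty
```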

| Aspect | Static Model | Nonparametric Model |
|---|---|---|
| Controller Assignment | Per-state $\theta_i$ | Cluster-wise $\theta_{z_i}$ |
| State Generalization | Limited for new states | Via shared clusters |
| Uncertainty Quantification | From Dirichlet posteriors | Enhanced via nonparametrics |

This approach enables LfD even when demonstrations are suboptimal, noisy, or stochastic, and is particularly suited for system identification, multi-agent modeling, and behavioral monitoring.

2. Skill Generalization, Motion Primitives, and Optimal Control

Contemporary LfD systems increasingly exploit temporal structure, context, and optimal control theory. Dynamic Movement Primitives (DMPs), task-parameterized LfD (TP-LfD), and kernel-based movement primitives underpin scalable and generalizable motor skill reproduction.

  • Frame-Weighted Motion Generation (Sun et al., 2023) encodes skill relevance via reference frame weights parameterized by radial basis functions, allowing trajectory “warping” to generalize to novel object arrangements with as few as two demonstrations. These frame weights are optimized to minimize dynamic time warping (DTW) errors against observed demonstrations.
  • Logic-DMP (Zhang et al., 24 Apr 2024) integrates optimal control—expressed as a linear quadratic tracker with control primitives (LQT-CP)—and task and motion planning (TAMP) to handle long-horizon, dynamic manipulation tasks. The LQT-CP cost function

$$c = (\mu - x)^\top Q (\mu - x) + u^\top R u$$

underpins trajectory adaptation with via-point constraints and closed-loop feedback, ensuring robust adaptation to disturbances and logical re-planning.

  • Auto-LfD (Wu et al., 2023) introduces closed-loop evaluation metrics based on Siamese encoders that operate on trajectory features in a learned latent space, providing objective, shape-preserving generalization scores that guide automatic hyperparameter optimization (e.g., in DMPs or kernel methods).

These methods enable sample-efficient learning, rapid adaptation, and principled feedback-driven tuning in robotic applications.
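
To make the LQT-CP cost above concrete, the sketch below solves a generic finite-horizon linear quadratic tracking problem in batch form. Stacking the linear dynamics $x_{t+1} = A x_t + B u_t$ gives $X = S_x x_0 + S_u U$, so minimizing $\sum_t (\mu_t - x_t)^\top Q (\mu_t - x_t) + u_t^\top R u_t$ over the stacked controls $U$ is a linear least-squares problem. This is a standard textbook formulation, not the Logic-DMP code; the double-integrator dynamics, horizon, and weights in the example are illustrative assumptions.

```python
import numpy as np

def batch_lqt(A, B, x0, mu, Q, R):
    """Batch linear quadratic tracking: min sum_t (mu_t - x_t)' Q (mu_t - x_t) + u_t' R u_t
    subject to x_{t+1} = A x_t + B u_t. Returns the optimal controls and tracked states."""
    T, n = mu.shape
    m = B.shape[1]
    # Stack the dynamics: X = Sx x0 + Su U with X = [x_1; ...; x_T], U = [u_0; ...; u_{T-1}].
    Sx = np.vstack([np.linalg.matrix_power(A, t + 1) for t in range(T)])
    Su = np.zeros((T * n, T * m))
    for t in range(T):
        for k in range(t + 1):
            Su[t*n:(t+1)*n, k*m:(k+1)*m] = np.linalg.matrix_power(A, t - k) @ B
    Qbar = np.kron(np.eye(T), Q)
    Rbar = np.kron(np.eye(T), R)
    resid = mu.reshape(-1) - Sx @ x0                  # tracking error if U = 0
    # Normal equations of the quadratic cost in U.
    U = np.linalg.solve(Su.T @ Qbar @ Su + Rbar, Su.T @ Qbar @ resid)
    X = (Sx @ x0 + Su @ U).reshape(T, n)
    return U.reshape(T, m), X

# Illustrative double-integrator example tracking a step-like reference (a crude via-point).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
mu = np.zeros((50, 2)); mu[25:, 0] = 1.0              # move to position 1 halfway through
Q = np.diag([100.0, 1.0]); R = np.array([[0.01]])
u, x = batch_lqt(A, B, np.zeros(2), mu, Q, R)
```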

3. Learning Robust, Stable, and High-Dimensional Dynamical Systems

Ensuring global stability and scalability is an active area of research in dynamical LfD. Several advances enable learning of stable vector fields directly from demonstration:

  • Stable Neural DS via Lyapunov Functions (Zhang et al., 2023): The learned system $\dot{x} = f(x)$ is stabilized by constructing a Lyapunov candidate $V(x) = \frac{1}{2} g(x)^\top g(x)$ directly via neural networks. The derivative $\dot{V}(x)$ is regulated through a residual structure and projected outputs to guarantee energy decrease, enforcing global convergence even when fitting complex demonstration data. Empirical results show improved accuracy (as measured by SEA and $\mathrm{V}_{\mathrm{rmse}}$) over Gaussian mixture-based methods and classical Lyapunov-controlled approaches.
  • Scalable BMI-Constrained DS Synthesis (Agrawal et al., 5 Jul 2025): The compositional LPV-DS approach decomposes the high-dimensional, nonconvex BMI-constrained optimization into subsystem-level problems. Each subsystem is stabilized via a local Lyapunov function $V_i(x_i)$, and the global Lyapunov function is constructed as $V(x) = \sum_i \mu_i V_i(x_i)$ subject to quadratic compositional constraints. This decomposition drastically increases scalability, enabling stable DS learning in the full 7-DoF joint space of a Franka Emika robot, where direct BMI optimization is otherwise numerically infeasible.

This direction robustly embeds physical or formal stability guarantees into the LfD pipeline, a requirement for real-world robotic deployments and safety-critical systems.
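
One simple way to see how a Lyapunov candidate enforces convergence is to project the nominal learned vector field onto the set where $\dot{V}(x) = \nabla V(x)^\top f(x) \le -\rho V(x)$. The sketch below implements this correction for a generic differentiable $V$; it is a minimal illustration of the principle rather than the residual-network construction or the compositional LPV-DS optimization cited above.

```python
import numpy as np

def stabilized_dynamics(f_nominal, grad_V, V, rho=1.0):
    """Wrap a learned vector field so that V decreases along trajectories.

    If the nominal field already satisfies grad_V(x)' f(x) <= -rho V(x), it is
    returned unchanged; otherwise the minimal correction along -grad_V(x) is added.
    """
    def f_stable(x):
        fx, g = f_nominal(x), grad_V(x)
        violation = g @ fx + rho * V(x)           # amount by which dV/dt exceeds -rho V
        if violation > 0.0 and g @ g > 1e-12:
            fx = fx - (violation / (g @ g)) * g   # project onto dV/dt = -rho V
        return fx
    return f_stable

# Illustrative example: V(x) = 0.5 ||x||^2 and a nominal field that spirals outward.
V = lambda x: 0.5 * float(x @ x)
grad_V = lambda x: x
f_nom = lambda x: np.array([[0.1, -2.0], [2.0, 0.1]]) @ x   # unstable spiral
f = stabilized_dynamics(f_nom, grad_V, V, rho=1.0)

x = np.array([1.0, 0.0])
for _ in range(200):                       # forward-Euler rollout contracts toward the origin
    x = x + 0.01 * f(x)
print(np.linalg.norm(x))                   # norm shrinks relative to the initial state
```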

4. Implicit, Reservoir-Based, and Reinforcement-Modulated Dynamical Architectures

Implicit dynamical system modeling, inspired by reservoir computing and echo state properties, is increasingly relevant for tackling error accumulation and out-of-distribution generalization:

  • Echo State Layer (ESL) (Fagan et al., 27 Sep 2024) incorporates a fixed, randomly-initialized reservoir with echo-state dynamics and a learnable input embedding to endow neural networks with powerful temporal inductive bias and robustness against compounding errors. Empirical studies on the LASA handwriting dataset demonstrate lower Fréchet distances, improved smoothness, and robustness to noise—surpassing classic Echo State Networks and temporal ensembling strategies.
  • Context-modulated Reservoirs with RL (Koulaeizadeh et al., 17 Nov 2024): DARC (Dynamic Adaptive Reservoir Computing) learns a fixed reservoir-based policy from demonstrations, with low-dimensional context inputs encoding task goals. Online reinforcement learning modulates the context based on robot (and/or reservoir) state in order to generate novel, out-of-distribution behaviors (e.g., reaching, obstacle avoidance, path following) without additional reservoir training. Efficiency is achieved by restricting RL optimization to the context dimension. This paradigm decouples motor primitive acquisition from run-time adaptation, expanding the action repertoire without repeated demonstration data collection.

Such frameworks highlight the synergy between dynamical systems theory, temporal representation learning, and control, producing stable motor behaviors that remain robust to drift, perturbation, and distributional shift.
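
The echo-state mechanism common to both approaches fits in a few lines: a fixed random recurrent reservoir supplies rich temporal features, and only a lightweight readout (or, in ESL, an input embedding) is trained. The sketch below is a generic leaky echo state network with a ridge-regression readout, intended as an illustration of the mechanism rather than either paper's architecture; the reservoir size, leak rate, and spectral radius are illustrative choices.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal leaky echo state network: fixed random reservoir, trained linear readout."""

    def __init__(self, n_in, n_res=200, n_out=1, leak=0.3, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale so the reservoir satisfies the echo-state property (spectral radius < 1).
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W, self.leak, self.n_res, self.n_out = W, leak, n_res, n_out
        self.W_out = np.zeros((n_out, n_res))

    def _run(self, U):
        """Collect reservoir states for an input sequence U of shape (T, n_in)."""
        x, states = np.zeros(self.n_res), []
        for u in U:
            pre = np.tanh(self.W_in @ u + self.W @ x)
            x = (1.0 - self.leak) * x + self.leak * pre      # leaky integration
            states.append(x)
        return np.array(states)

    def fit(self, U, Y, ridge=1e-6):
        """Ridge-regression readout mapping reservoir states to targets Y of shape (T, n_out)."""
        X = self._run(U)
        self.W_out = np.linalg.solve(X.T @ X + ridge * np.eye(self.n_res), X.T @ Y).T

    def predict(self, U):
        return self._run(U) @ self.W_out.T

# Illustrative use: imitate a 1-D demonstrated trajectory from a time input.
t = np.linspace(0, 2 * np.pi, 300)[:, None]
demo = np.sin(3 * t) * np.exp(-0.3 * t)
esn = EchoStateNetwork(n_in=1)
esn.fit(t, demo)
rollout = esn.predict(t)
```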

5. Lifelong, Personalized, and Federated Learning Paradigms

Scaling LfD to heterogeneous, sequential, or federated multi-task environments necessitates mechanisms for lifelong adaptation, strategy discovery, and knowledge transfer:

  • Dynamic Multi-Strategy Reward Distillation (DMSRD) (Jayanthi et al., 2022) segments heterogeneous demonstration data into a repository of strategy-specific policy–reward pairs. Upon receiving new demonstrations, DMSRD either explains them via optimized mixtures of existing strategies or trains a new policy and reward. The framework promotes adaptability, sample-efficiency, and scalability, demonstrated by a 77% average improvement in policy returns and a 42% increase in log likelihood on continuous control benchmarks.
  • Lifelong Inverse Reinforcement Learning (Mendez et al., 2022) factorizes each task’s reward $\theta^t$ as $L s^t$, where $L$ is a latent shared basis and $s^t$ is a sparse, task-specific coefficient vector; together, this enables knowledge transfer, reverse transfer (retrofitting older tasks as $L$ evolves), and a reduction in demonstration burden for new skills.
  • Learning from Drift (LfD) in Federated Learning (Kim et al., 2023): Designed for non-IID data, “Learning from Drift” regularizes the client’s model by explicitly estimating and counteracting local–global logit drift during federated optimization, leading to strong performance on generalization, heterogeneity, scalability, and retention metrics.

This line of research ensures that LfD frameworks remain relevant in evolving, distributed, and user-adaptive robotic environments.
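
The factored reward model in the lifelong IRL item can be written out directly: each task's reward weights are a sparse combination of the columns of a shared basis. The sketch below performs a simple alternating fit (soft-thresholded least squares for the task coefficients, then a basis refit) on given per-task reward estimates; it is an illustrative stand-in for the cited lifelong IRL optimization, not the authors' algorithm, and the synthetic data are assumptions.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal step for an L1 penalty (promotes sparse task coefficients)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def factorize_rewards(thetas, k=4, lam=0.1, iters=50, seed=0):
    """Fit theta^t ~= L s^t with a shared basis L and sparse, task-specific s^t.

    thetas : (n_tasks, d) matrix whose rows are per-task reward weight estimates
    k      : number of shared latent components
    """
    rng = np.random.default_rng(seed)
    d = thetas.shape[1]
    L = rng.standard_normal((d, k))
    S = np.zeros((thetas.shape[0], k))
    for _ in range(iters):
        # Sparse coding step: least squares followed by soft-thresholding.
        for t, theta in enumerate(thetas):
            S[t] = soft_threshold(np.linalg.lstsq(L, theta, rcond=None)[0], lam)
        # Basis refit: earlier tasks' reconstructions L @ s^t also improve as L is
        # updated, which is the "reverse transfer" effect described above.
        L = np.linalg.lstsq(S, thetas, rcond=None)[0].T
    return L, S

# Illustrative use with synthetic per-task reward weights.
rng = np.random.default_rng(1)
true_L = rng.standard_normal((10, 3))
true_S = rng.standard_normal((6, 3)) * (rng.random((6, 3)) < 0.5)
thetas = true_S @ true_L.T + 0.01 * rng.standard_normal((6, 10))
L, S = factorize_rewards(thetas, k=3)
print(np.linalg.norm(thetas - S @ L.T))     # reconstruction error of the sparse factorization
```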

6. Optimization, Inverse Control, and Gradient-Free Learning from Demonstration

Recent work addresses inverse optimal control for LfD under practical constraints. ZORMS-LfD (Dry et al., 23 Jul 2025) introduces a zeroth-order random matrix search scheme for learning cost, dynamics, and constraint parameters directly from demonstrations:

  • At each step, a random symmetric matrix $M_k^U$ drawn from the Gaussian Orthogonal Ensemble (GOE) perturbs the parameter $\hat{\theta}_k$, and the “oracle” $\mathcal{O}_\mu(\hat{\theta}_k, M_k^U) = \big[\mathcal{L}(\hat{\theta}_k + \mu M_k^U) - \mathcal{L}(\hat{\theta}_k)\big] \, M_k^U / \mu$ drives parameter updates via projected descent, even when the learning loss is nonsmooth.
  • The method accommodates both continuous and discrete time, costs and constraints that are nonsmooth or nonlinear, and challenging optimal control problems (robot arm, rocket landing, quadrotor), yielding empirical loss comparable to state-of-the-art first-order methods while reducing wall clock time by over 80% in some benchmarks. For constrained scenarios where no first-order algorithm exists, ZORMS-LfD outperforms classic simplex-based methods.

This establishes LfD as viable, even in domains where analytic gradients are unavailable or numerical instabilities thwart first-order optimization.
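
The oracle quoted above takes only a few lines to implement: draw a GOE-distributed symmetric perturbation, difference the loss, and step along the resulting matrix direction. The sketch below is a generic zeroth-order random-matrix-search loop for a symmetric matrix parameter with an optional projection hook, written from the formula above; the nonsmooth test loss, step sizes, and iteration count are illustrative assumptions, and the optimal-control losses of ZORMS-LfD are not reproduced here.

```python
import numpy as np

def sample_goe(n, rng):
    """Symmetric random matrix from the Gaussian Orthogonal Ensemble (up to scaling)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2.0)

def zo_random_matrix_search(loss, theta0, mu=1e-3, step=1e-2, iters=500, project=None, seed=0):
    """Zeroth-order search over a symmetric matrix parameter using GOE perturbations.

    Each iteration forms the two-point oracle
        O = [loss(theta + mu * M) - loss(theta)] * M / mu
    and takes a (projected) descent step; no gradients of `loss` are required.
    """
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(iters):
        M = sample_goe(theta.shape[0], rng)
        oracle = (loss(theta + mu * M) - loss(theta)) / mu * M
        theta = theta - step * oracle
        if project is not None:
            theta = project(theta)       # e.g. clip entries or enforce parameter constraints
    return theta

# Illustrative use: recover a symmetric target matrix under a nonsmooth loss.
rng = np.random.default_rng(1)
target = sample_goe(4, rng)
loss = lambda T: np.abs(T - target).sum()            # nonsmooth; no analytic gradient used
theta_hat = zo_random_matrix_search(loss, np.zeros((4, 4)), iters=2000)
print(loss(theta_hat))                               # should be much smaller than loss(0)
```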

7. Practical Implications and Future Directions

Across these developments, key practical themes emerge:

  • Uncertainty quantification is explicitly supported in Bayesian and nonparametric models, enhancing safety and decision-making under limited or noisy data.
  • Nonparametric clustering and context-aware adaptation improve generalization to unvisited states and novel environment configurations.
  • Stability, whether achieved via analytic Lyapunov functions, compositional DS frameworks, or energy-based neural architectures, is a sine qua non for safety-critical robotic deployments and reliable behavioral execution.
  • Modular, compositional, and federated approaches pave the way for scalable skill acquisition, skill retention over a lifespan, and transfer across task boundaries and user populations.
  • The integration of reinforcement learning as a modulation mechanism in dynamical architectures moves the system beyond rigid replay of demonstrations to true skill synthesis and generalization.

As LfD matures, automated decomposition strategies for compositional DS, integration with generative models for high-dimensional scene understanding, scalable nonparametric inference, and robust evaluation metrics for generalization and safety are anticipated to further elevate the impact of Learning from Dynamics in physical, virtual, and distributed agent settings.


Key References: