Symplectic Inductive Bias for Data-Driven Target Reachability in Hamiltonian Systems

Published 19 Apr 2026 in math.OC, eess.SY, and stat.ML | (2604.17213v1)

Abstract: Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.


Authors (3)

Summary

The paper presents a novel target reachability framework that leverages symplectic geometry to achieve sample-efficient control in Hamiltonian systems.
The paper employs a Nonparametric Chain Policy that stitches together locally validated control snippets from expert demonstrations, guaranteeing energy descent and ergodic convergence.
The paper provides theoretical sample complexity bounds and empirical validations on spring-mass and pendulum systems, demonstrating robust performance with minimal demonstrations.


Introduction

This paper addresses the challenge of deriving data-efficient control strategies for nonlinear systems, specifically Hamiltonian dynamics, by embedding inductive bias derived from the underlying physical principles. Leveraging the symplectic geometry and inherent recurrence on energy level sets characteristic of Hamiltonian systems, the authors present a target reachability framework that composes locally validated trajectory segments from expert demonstrations. The central result is that the sample complexity of this construction is determined not by the full state dimension but by explicit geometric and dynamical properties of the Hamiltonian flow.

Problem Setting and Theoretical Foundations

The control objective is formalized as target reachability: steer a Hamiltonian system from an admissible initial set to a target set. Hamiltonian systems are modeled as:

$\dot{x} = J(x) \nabla H(x) + G(x)u$

where $x \in X$ is the state, the Hamiltonian $H$ denotes the total energy, $J(x)$ is a skew-symmetric matrix representing the symplectic structure, $u$ is the control input, and $G(x)$ is the matrix through which the input enters the dynamics. Key assumptions include differentiability, gradient boundedness, and Lipschitz continuity of the right-hand side, denoted $f(x, u)$.
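As a short worked step (a standard consequence of the model above, assuming only that $H$ is differentiable), differentiating $H$ along a trajectory and using the skew-symmetry of $J(x)$ gives the energy balance:

$\dot{H}(x) = \nabla H(x)^\top \dot{x} = \nabla H(x)^\top J(x) \nabla H(x) + \nabla H(x)^\top G(x) u = \nabla H(x)^\top G(x) u$

The first term vanishes because $v^\top J v = 0$ for every vector $v$ when $J$ is skew-symmetric. With $u = 0$ the energy is therefore constant along trajectories, and the control input is the only way to move the state from one energy level to another.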

The critical insight is that, under zero input, trajectories are confined to invariant energy layers $\Sigma_E$, and within these layers the dynamics are described by ergodic invariant measures. Through ergodic decomposition, the Hamiltonian flow partitions the space into ergodic components, within which typical orbits are dense.
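The confinement to energy layers can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a unit-mass, unit-stiffness spring-mass system with state $x = (q, p)$, a force input that enters the momentum equation, and a generic fourth-order Runge-Kutta integrator. All function names and parameter values are illustrative.

```python
import numpy as np

# Canonical symplectic matrix for one degree of freedom, x = (q, p).
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
# Assumption: the control is a force, so it enters the momentum equation only.
G = np.array([0.0, 1.0])


def hamiltonian(x, m=1.0, k=1.0):
    """Total energy of a spring-mass system (kinetic plus potential)."""
    q, p = x
    return p**2 / (2.0 * m) + k * q**2 / 2.0


def grad_hamiltonian(x, m=1.0, k=1.0):
    """Gradient of H with respect to (q, p)."""
    q, p = x
    return np.array([k * q, p / m])


def vector_field(x, u=0.0):
    """Right-hand side x_dot = J grad H(x) + G u."""
    return J @ grad_hamiltonian(x) + G * u


def rk4_step(x, u, dt):
    """One classical Runge-Kutta step with the input held constant."""
    k1 = vector_field(x, u)
    k2 = vector_field(x + 0.5 * dt * k1, u)
    k3 = vector_field(x + 0.5 * dt * k2, u)
    k4 = vector_field(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(x0, controls, dt=0.01):
    """Roll the system forward, applying one control value per step."""
    xs = [np.asarray(x0, dtype=float)]
    for u in controls:
        xs.append(rk4_step(xs[-1], u, dt))
    return np.array(xs)


# Zero-input rollout: the energy should stay on its initial level set
# up to integration error.
traj = simulate([1.0, 0.0], controls=np.zeros(2000))
energies = np.array([hamiltonian(x) for x in traj])
energy_drift = np.max(np.abs(energies - energies[0]))
```

Here `energy_drift` measures how far the simulated trajectory strays from its initial energy layer. A symplectic integrator would be the more natural choice for long horizons; Runge-Kutta is used only to keep the sketch short.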

Chain Policies and Assignment Set Synthesis

The policy structure central to this framework is the Nonparametric Chain Policy (NCP). It leverages an assignment set (a library of locally validated control snippets, each certified on a ball in state space) and a default control, usually zero input. The policy executes control snippets when within a corresponding ball; outside, zero input is applied, exploiting the recurrence properties of the natural dynamics.
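The lookup logic described above can be sketched as follows. This is an illustrative reading rather than the paper's implementation: the names `Snippet` and `ChainPolicy` are hypothetical, each snippet is assumed to be a non-empty open-loop control sequence that runs to completion once triggered, and overlapping balls are resolved by taking the first match.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Snippet:
    """One entry of the assignment set (hypothetical layout)."""
    center: np.ndarray    # center of the certified ball in state space
    radius: float         # radius of the certified ball
    controls: np.ndarray  # open-loop control sequence certified on that ball


class ChainPolicy:
    """Run a certified snippet when inside its ball, else the default input."""

    def __init__(self, assignment_set, default_control=0.0):
        self.assignment_set = list(assignment_set)
        self.default_control = default_control
        self._active = None  # snippet currently being executed, if any
        self._step = 0       # index of the next control in that snippet

    def act(self, x):
        x = np.asarray(x, dtype=float)
        # Assumption: a triggered snippet is executed to the end, open loop.
        if self._active is not None and self._step < len(self._active.controls):
            u = self._active.controls[self._step]
            self._step += 1
            return float(u)
        self._active = None
        # Otherwise look for a certified ball that contains the current state.
        for snippet in self.assignment_set:
            inside = np.linalg.norm(x - snippet.center) <= snippet.radius
            if inside and len(snippet.controls) > 0:
                self._active, self._step = snippet, 1
                return float(snippet.controls[0])
        # Outside the support: fall back to the default (zero) input and let
        # the natural dynamics carry the state.
        return self.default_control


# Usage: one snippet certified on a ball of radius 0.5 around the origin.
policy = ChainPolicy([Snippet(np.zeros(2), 0.5, np.array([1.0, 2.0]))])
u_outside = policy.act([2.0, 0.0])  # default input, no ball contains the state
u_inside = policy.act([0.1, 0.0])   # first control of the matching snippet
```

A linear scan over the assignment set is enough for the handful of snippets reported in the experiments; a spatial index would only matter for much larger libraries.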

Figure 1: Assignment set construction, illustrating the extraction of state-centered balls with verified trajectory segments from expert demonstrations.

Assignment set synthesis proceeds by extracting these balls and associated controls from demonstrations (see Algorithm 1 in the paper). At each step, the largest segment enabling guaranteed energy descent within a verifiable region is selected, and the trajectory continues from the boundary until the target is reached.

Energy-Based Reachability Analysis

The reachability method is fundamentally energy-based. The energy distance $\Delta H(x)$ to the target set is used to encode progress. Theoretical guarantees are established by showing the following:

Within each certified ball, applying the corresponding control snippet ensures uniform decrease in $\Delta H$.
When the system leaves the support of the assignment set, the zero-input (Hamiltonian) dynamics—due to recurrence—eventually return the state to the support region for almost every initial condition.
Upon entering the target energy band, ergodic coverage guarantees arrival at the target set in finite time for almost every trajectory.

The main theorem formally links reachability to explicit properties: coverage of energy intervals and ergodic components—not full state space—suffices. The support requirement is thus reduced to an effectively lower-dimensional structure aligned with the Hamiltonian's geometry.
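The first two steps of this argument (descent inside the support, recurrence outside it) can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not the paper's experiment: it uses a unit spring-mass system, a hand-placed row of balls along the momentum axis in place of a learned assignment set, and a simple braking input $u = -p$ in place of certified snippets. All constants are chosen for illustration only.

```python
import numpy as np

DT = 0.01
# Hypothetical support: balls centered on the p-axis, covering the orbit
# radii between the target energy level and the initial one.
CENTERS = [np.array([0.0, c]) for c in (0.5, 0.7, 0.9, 1.1, 1.3, 1.5)]
RADIUS = 0.15
H_TARGET = 0.125  # target energy band: H(x) <= H_TARGET


def energy(x):
    """H(q, p) = (q^2 + p^2) / 2 for unit mass and stiffness."""
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def field(x, u):
    """q_dot = p, p_dot = -q + u."""
    q, p = x
    return np.array([p, -q + u])


def rk4_step(x, u, dt=DT):
    """One classical Runge-Kutta step with the input held constant."""
    k1 = field(x, u)
    k2 = field(x + 0.5 * dt * k1, u)
    k3 = field(x + 0.5 * dt * k2, u)
    k4 = field(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def in_support(x):
    """True if the state lies in at least one ball of the support."""
    return any(np.linalg.norm(x - c) <= RADIUS for c in CENTERS)


def run(x0, max_steps=30000):
    """Brake inside the support, coast with zero input outside it."""
    x = np.asarray(x0, dtype=float)
    energies = [energy(x)]
    visits, was_inside = 0, False
    for _ in range(max_steps):
        if energies[-1] <= H_TARGET:
            break  # entered the target energy band
        inside = in_support(x)
        visits += int(inside and not was_inside)  # count entries into the support
        was_inside = inside
        # Inside: u = -p gives H_dot = p * u = -p^2 < 0 (stand-in for a snippet).
        # Outside: zero input, so the orbit recurs and re-enters the support.
        u = -x[1] if inside else 0.0
        x = rk4_step(x, u)
        energies.append(energy(x))
    return np.array(energies), visits


energies, visits = run([1.5, 0.0])
```

In this sketch the energy only drops while the state is inside a ball and stays essentially flat while it coasts, so `energies` forms a staircase and `visits` counts how many returns to the support were needed. The third step of the argument, reaching the target set within the energy band, is trivial here because every orbit of a one-degree-of-freedom oscillator traverses its whole level set.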

Existence and Sample Complexity Guarantees

Theoretical results derive explicit, finite sample complexity bounds based on covering the relevant energy intervals, leveraging properties such as Lipschitz continuity, strong convexity of the Hamiltonian, and ergodicity of energy layers.

Sample Complexity Bound

If $[H_1, H_2]$ is the relevant energy interval, the required number of control snippets is upper bounded by an explicit expression (see the paper for the exact formula) in terms of the Lipschitz constant, the strong convexity constant, and the energy descent rate.

This result shows that data requirements depend on the geometry and recurrence in energy space rather than on the ambient dimension, in contrast to the exponential-in-dimension sample complexity typical of generic nonlinear control.

Numerical Validation

Spring-Mass System

The NCP achieves perfect target reachability with as few as one or two expert trajectories, whereas Behavior Cloning (BC) suffers from poor generalization in low-data regimes. The NCP's average reach time quickly saturates with additional demonstrations, highlighting efficiency in policy construction.

Figure 2: Success rate comparison between Chain Policy and BC on the spring-mass system, underscoring rapid improvement with minimal demonstrations.

Single Pendulum

For the single pendulum, the NCP reaches 100% success with just three demonstrations. In contrast, vanilla BC remains suboptimal even with five demonstrations, emphasizing the advantage of policies that respect the inductive bias of the dynamics.

Figure 3: Success rate for the single pendulum system, showing sample-efficiency of the Chain Policy over imitation learning baselines.

The results support the theoretical findings: a number of demonstrations that scales with the energy geometry, not the state dimension, suffices for reliable global control.

Implications and Future Directions

This framework for leveraging symplectic inductive bias in data-driven control recontextualizes sample complexity in nonlinear control by aligning learning architectures with physical law. Practically, it enables highly sample-efficient synthesis of safe policies with finite data, crucial for robotics and physical systems where data are costly or hazardous to collect. Theoretically, it challenges the prevailing assumption that sample requirements for nonlinear systems must grow exponentially with the state dimension, provided physical invariants are appropriately exploited.

Potential future directions include:

Extending to Hamiltonian systems with multiple ergodic components or partial observability.
Robust policy design in the presence of dissipation or structured modeling uncertainty.
Optimizing demonstration collection to maximize low-dimensional coverage and reduce compounding error in policy execution.

Conclusion

This work establishes that physical structure, rather than state-space smoothness or generic nonlinear parameterization, furnishes the correct inductive bias for efficient data-driven control learning in Hamiltonian systems. Embedding symplectic geometry and recurrence into the policy synthesis pipeline leads to tractable, provably efficient, and robust control designs operable with limited data. The implications extend broadly to sample complexity theory, control synthesis, and the practical deployment of learning-based feedback policies for real-world physical systems.
