
SE(2) Goal-Conditioned Trajectories

Updated 1 February 2026
  • Goal-conditioned SE(2) trajectories are time-parameterized paths in the planar rigid body space that combine translation and rotation to precisely achieve designated poses.
  • They leverage methodologies such as reinforcement learning, hierarchical sub-goal trees, and generative latent models to optimize both position and orientation errors.
  • These techniques are pivotal in robotics and autonomous navigation, supporting enhanced sample efficiency, robust planning, and real-world performance improvements.

A goal-conditioned SE(2) trajectory is a time-parameterized path in the configuration space of planar rigid body transformations (SE(2): translation in ℝ² plus rotation θ), where the trajectory is explicitly specified or optimized with respect to a target end pose or sequence of poses. Such trajectories are foundational in robotics, autonomous navigation, and legged locomotion, forming the basis for both geometric planners and learning-based controllers tasked with efficiently reaching specified positions and orientations in the plane.

1. Mathematical Formulation and Representations

Goal-conditioned SE(2) trajectory generation formalizes the problem as learning or planning a sequence of poses $X = [(x_1, y_1, \theta_1), \ldots, (x_H, y_H, \theta_H)] \in \mathbb{R}^{H \times 3}$ that starts at an initial state $s_0$ and terminates at a designated goal $g = (x_g, y_g, \theta_g) \in \mathrm{SE}(2)$. The problem is typically posed as a goal-conditioned MDP, a supervised trajectory prediction task, or a generative modeling framework.
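
As a minimal sketch of this pose-sequence representation (the helper names and the split into translation/heading error terms are illustrative, not taken from any cited paper):

```python
import numpy as np

def wrap_angle(theta):
    """Wrap an angle (or array of angles) to (-pi, pi]."""
    return np.arctan2(np.sin(theta), np.cos(theta))

def terminal_pose_error(traj, goal):
    """Split the terminal SE(2) error into translation and wrapped heading parts.

    traj: (H, 3) array of (x, y, theta) poses; goal: (3,) target pose.
    """
    pos_err = np.linalg.norm(traj[-1, :2] - goal[:2])
    ang_err = abs(wrap_angle(traj[-1, 2] - goal[2]))
    return pos_err, ang_err
```

The angle wrap matters: a raw difference between headings near ±π would report a near-2π error for two nearly identical orientations.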

A standard goal-conditioned MDP is defined by:

  • State space $\mathcal{S}$: including proprioceptive and global kinematic state variables.
  • Goal space $\mathcal{G} = \mathrm{SE}(2)$: specifying target poses.
  • Action space $\mathcal{A}$: e.g., joint commands or velocity inputs.
  • Transition and reward functions that depend explicitly or implicitly on the goal $g$ (Dugar et al., 16 Aug 2025).
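
A toy goal-conditioned SE(2) MDP with unicycle dynamics makes these ingredients concrete (the class, the dynamics, and the reward weights are hypothetical, not from any cited system):

```python
import numpy as np

class UnicycleGoalEnv:
    """Minimal goal-conditioned SE(2) MDP sketch with unicycle dynamics.

    State: (x, y, theta); action: (v, omega); the reward is conditioned on g.
    """
    def __init__(self, goal, dt=0.1):
        self.goal = np.asarray(goal, dtype=float)
        self.dt = dt
        self.state = np.zeros(3)

    def step(self, action):
        v, omega = action
        x, y, th = self.state
        # Euler-integrated unicycle kinematics.
        self.state = np.array([x + v * np.cos(th) * self.dt,
                               y + v * np.sin(th) * self.dt,
                               th + omega * self.dt])
        # Goal-conditioned reward: negative weighted pose error (weights assumed).
        pos_err = np.linalg.norm(self.state[:2] - self.goal[:2])
        ang_err = abs(np.arctan2(np.sin(self.state[2] - self.goal[2]),
                                 np.cos(self.state[2] - self.goal[2])))
        return self.state, -(pos_err + 0.5 * ang_err)
```

Conditioning enters only through the reward here; richer setups also feed $g$ into the policy's observation.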

Alternative representations include sub-goal trees, where the trajectory is recursively partitioned around midpoints conditioned on start and goal, enabling O(log T) parallel inference (Jurgenson et al., 2019), as well as latent generative models (e.g., diffusion models, VAEs) that sample from $p_\theta(X \mid C)$, with $C$ explicitly encoding the goal and context (Guillen-Perez, 3 Sep 2025).

2. Core Methodologies for Goal-Conditioned SE(2) Trajectories

Several methodology classes for synthesizing SE(2) goal-conditioned trajectories have emerged:

A. Reinforcement Learning with Direct Goal Conditioning

Modern approaches such as GoTo optimize time, energy, and precision for SE(2) reaching by structuring the reward to couple translation and rotation errors. GoTo leverages a constellation-based reward:

  • $N$ landmark points are anchored to the robot's local frame and mapped to the goal pose, and the per-timestep reward is a negative exponential of their mean squared distance after transforming to the intended goal pose.
  • This induces simultaneous reduction of position and heading errors, yielding fluid, integrated motion patterns, as opposed to rigid, marching-style velocity-tracking gaits (Dugar et al., 16 Aug 2025).
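
A hedged sketch of such a constellation-style reward (the landmark set and the weight `w_c` are illustrative; the exact formulation in Dugar et al. may differ):

```python
import numpy as np

def constellation_reward(pose, goal, landmarks, w_c=1.0):
    """Constellation reward sketch: landmarks fixed in the body frame are
    mapped through both the current pose and the goal pose; the reward is a
    negative exponential of their mean squared distance.

    pose, goal: (x, y, theta); landmarks: (N, 2) points in the local frame.
    """
    def to_world(p):
        x, y, th = p
        R = np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])
        return landmarks @ R.T + np.array([x, y])
    d = np.mean(np.sum((to_world(pose) - to_world(goal)) ** 2, axis=1))
    return np.exp(-w_c * d)
```

Because the landmarks sit off the body origin, any heading error moves them even when the positions coincide, which is what couples translation and rotation in a single term.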

B. Sub-Goal Trees and Hierarchical Planning

The sub-goal tree (SGT) framework recursively decomposes trajectory prediction into midpoint-prediction tasks conditioned on the endpoints, enabling parallel sampling and divide-and-conquer style dynamic programming. The key propagation rule is:

$$V_k(s, s') = \min_{s_m} \left[ V_{k-1}(s, s_m) + V_{k-1}(s_m, s') \right]$$

with appropriate SE(2) geodesic midpoints and cost metrics (Jurgenson et al., 2019).
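
The propagation rule can be sketched over a finite candidate-state set as a min-plus matrix recursion (the discretization and cost weights are illustrative, not the paper's implementation):

```python
import numpy as np

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))

def edge_cost(sa, sb, w_t=1.0, w_th=0.5):
    """SE(2) edge cost: weighted translation plus wrapped heading difference."""
    return (w_t * np.linalg.norm(np.asarray(sb[:2]) - np.asarray(sa[:2]))
            + w_th * abs(wrap(sb[2] - sa[2])))

def sgt_value(states, k):
    """Sub-goal-tree value recursion over a finite state set:
    V_k(s, s') = min_m [V_{k-1}(s, m) + V_{k-1}(m, s')], with V_0 = edge cost.
    """
    n = len(states)
    V = np.array([[edge_cost(states[i], states[j]) for j in range(n)]
                  for i in range(n)])
    for _ in range(k):
        # V[i, m, j] = V[i, m] + V[m, j]; minimize over the midpoint m.
        V = np.min(V[:, :, None] + V[None, :, :], axis=1)
    return V
```

Each iteration doubles the effective trajectory depth, which is where the O(log T) inference depth comes from.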

C. Generative Latent Models with Goal Conditioning

Recent goal-conditioned diffusion models, such as Efficient Virtuoso, encode an entire trajectory into a compact latent (via PCA) and apply learned denoising in this space. Goal input is fused via a transformer-based encoder:

  • Endpoint or sparse route goals are embedded and provided as context for the noise-prediction network during denoising.
  • Ablations show that a route (multi-step goal) input substantially outperforms a single endpoint in trajectory fidelity (e.g., minADE=0.2541 m with route vs. 0.4510 m with endpoint on Waymo) (Guillen-Perez, 3 Sep 2025).
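
A minimal PCA encode/decode sketch for trajectory latents (a stand-in for the paper's pipeline; the function names and dimensions are assumptions):

```python
import numpy as np

def fit_pca(trajs, d):
    """Fit a PCA codec for flattened (H, 3) pose trajectories.

    trajs: (M, H, 3) array of pose sequences; d: latent dimension.
    Returns encode/decode closures mapping trajectories <-> d-dim latents.
    """
    X = trajs.reshape(len(trajs), -1)
    mean = X.mean(axis=0)
    # Principal directions = top right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = Vt[:d]
    encode = lambda t: (t.reshape(-1) - mean) @ comps.T
    decode = lambda z: (z @ comps + mean).reshape(trajs.shape[1], 3)
    return encode, decode
```

Denoising then operates on the compact `z` rather than the full $H \times 3$ sequence, which is the efficiency argument behind latent diffusion.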

D. Projective Quasimetric Planning

ProQ introduces learnable, asymmetric latent-space distance functions over SE(2), then performs coverage with sparse latent keypoints (landmarks) that serve as dynamic sub-goals for long-horizon planning. At inference, a graph search over latent landmarks and shortest-hop control generates robust multi-step plans (Kobanda et al., 23 Jun 2025).
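
The landmark-graph planning step can be illustrated with a plain Floyd–Warshall pass over an asymmetric distance matrix (a generic sketch, not ProQ's implementation):

```python
import numpy as np

def floyd_warshall(D):
    """All-pairs shortest paths over a landmark graph.

    D: (n, n) pairwise distances; may be asymmetric, matching a quasimetric
    (D[i, j] != D[j, i] in general). Returns shortest distances and a
    next-hop table for path reconstruction.
    """
    D = D.copy()
    n = len(D)
    nxt = np.tile(np.arange(n), (n, 1))  # nxt[i, j] = first hop from i to j
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i, k] + D[k, j] < D[i, j]:
                    D[i, j] = D[i, k] + D[k, j]
                    nxt[i, j] = nxt[i, k]
    return D, nxt
```

Because the quasimetric is directional, the shortest route from landmark A to B can differ from the route back, which the algorithm handles without modification.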

E. Equivariant Trajectory Networks

SE(2)-equivariant GNNs (PEP) preserve the geometric structure under arbitrary roto-translation, supporting goal-conditioned planning via a route-attraction module that softly guides the trajectory along a high-level path while enforcing equivariance (Hagedorn et al., 2024).

3. Reward Functions, Cost Metrics, and Conditioning Schemes

Reward and cost design in SE(2) goal conditioning directly impacts trajectory naturalness, efficiency, and convergence:

  • Constellation-based reward: Incorporates both translation of the robot's xy-position and rotation (heading), penalizing weighted position and orientation errors in a unified geometric term: $d_\text{con} = \|c - c^*\|^2 + I_c (\theta - \theta^*)^2$ with reward $r_\text{con} = \exp(-w_c d_\text{con})$ (Dugar et al., 16 Aug 2025).
  • Sub-goal interpolation metrics: Employ manifold midpoints computed via SE(2) log/exp maps, or by separately interpolating xy and using shortest-arc averaging for θ. Edge costs typically take the form $c(s_a, s_b) = w_t \|t_b - t_a\|_2 + w_\theta |\mathrm{wrap}(\theta_b - \theta_a)|$ (Jurgenson et al., 2019).
  • Latent geometry and quasimetric: ProQ designs a projective quasimetric $D(z_i, z_j)$ mixing max/mean distances in the learned embedding space, produces a sparse set of latent keypoints, and uses Floyd–Warshall to plan over sub-goal graphs (Kobanda et al., 23 Jun 2025).
  • Transformer goal encoding: Efficient Virtuoso and PEP highlight the benefits of richer goal representations (multi-step vs. endpoint), confirming precise route information is essential for high-fidelity trajectory synthesis (Guillen-Perez, 3 Sep 2025, Hagedorn et al., 2024).
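
The separate-interpolation variant of the midpoint computation can be sketched as follows (an illustrative helper; midpoints via the full SE(2) log/exp maps would differ slightly when translation and rotation are combined):

```python
import numpy as np

def se2_midpoint(sa, sb):
    """Midpoint of two SE(2) poses: linear interpolation of (x, y) plus
    shortest-arc averaging of theta, wrapped to (-pi, pi]."""
    xa, ya, tha = sa
    xb, yb, thb = sb
    # Shortest signed angular difference, then walk half of it from theta_a.
    dth = np.arctan2(np.sin(thb - tha), np.cos(thb - tha))
    thm = np.arctan2(np.sin(tha + 0.5 * dth), np.cos(tha + 0.5 * dth))
    return (0.5 * (xa + xb), 0.5 * (ya + yb), thm)
```

Naively averaging headings near the ±π seam would point the midpoint the wrong way; the shortest-arc form avoids that.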

4. Architectures, Algorithms, and Sample Efficiency

Summary of key architectural design choices and their roles:

| Method | Input Modality | Architecture | Notable Properties |
|---|---|---|---|
| GoTo | qₜ (joint), pᵣ, Δx, Δy, Δθ | 2-layer LSTM, PD setpoints | No kinematic planner, direct RL |
| SGT | (s₀, s_T), interpolated midpoints | MDN, multi-level recursion | O(log T) concurrent inference |
| Virtuoso | (x, y)[, θ], context, multi-step or endpoint goal | MLP denoiser, Transformer encoder | Latent diffusion, 2-stage norm |
| ProQ | (x, y, sin θ, cos θ), learned latent sub-goals | MLP for φ, graph search | Compositional, OOD-constrained |
| PEP | Past (multi-agent), high-level route | EqMotion GNN + route attraction | SE(2)-equivariance, sample efficient |

GoTo’s RL approach (on Agility Digit-V3) achieves superior time, energy, and step metrics compared to marching-style baselines; constellation-based reward ensures fluid turning and trajectory integration (Dugar et al., 16 Aug 2025). SGT’s divide-and-conquer yields exponential speedup in trajectory prediction and competitive error rates (Jurgenson et al., 2019). Efficient Virtuoso’s Transformer-based latent diffusion attains state-of-the-art minADE with explicit goal route conditioning (Guillen-Perez, 3 Sep 2025). ProQ demonstrates robust global planning and value-estimation stability on long-horizon navigation (Kobanda et al., 23 Jun 2025). SE(2)-equivariant GNNs match or exceed dataset SOTA in planning L2 and collision metrics with minimal sample requirements (Hagedorn et al., 2024).

5. Benchmarking, Evaluation Metrics, and Empirical Insights

Goal-conditioned SE(2) trajectory research employs performance metrics tailored to precision, efficiency, and control stability:

  • Final Position and Orientation Error: $\|(x_f, y_f) - (x_g, y_g)\|_2$ and $|\theta_f - \theta_g|$.
  • Time-to-Target: Time until trajectory satisfies tight spatial and rotational accuracy.
  • Footstep Count / Energy per Meter: Key for legged/humanoid systems to ensure natural, efficient locomotion (Dugar et al., 16 Aug 2025).
  • minADE, minFDE, MissRate@2m: Minimum average/final displacement error over sampled trajectories; standard in autonomous driving (Guillen-Perez, 3 Sep 2025).
  • Sample Efficiency: Measured by test set coverage with small training sets (e.g., PEP’s use of only ≈607 nuScenes scenes; maintains equivariance) (Hagedorn et al., 2024).
  • Equivariance Consistency: Output trajectory invariance under input roto-translation, critical for physical plausibility (Hagedorn et al., 2024).
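
The displacement metrics above can be computed in a few lines (a generic sketch; `min_ade_fde` is an assumed helper name):

```python
import numpy as np

def min_ade_fde(samples, gt):
    """Minimum average / final displacement error over K sampled trajectories.

    samples: (K, H, 2) candidate xy trajectories; gt: (H, 2) ground truth.
    Returns (minADE, minFDE): the best mean and best endpoint displacement.
    """
    disp = np.linalg.norm(samples - gt[None], axis=-1)  # (K, H) per-step errors
    return disp.mean(axis=1).min(), disp[:, -1].min()
```

Note that the minimum over samples is taken independently for ADE and FDE, so the two minima may come from different candidate trajectories.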

GoTo uses 0.63× the energy, 0.56× the time, and 0.67× the steps of its baseline on short-range tasks, and succeeds in 90.23% of real-world trials under domain randomization (Dugar et al., 16 Aug 2025). Efficient Virtuoso obtains minADE = 0.2541 m with a sparse route goal vs. 0.4510 m for an endpoint goal, demonstrating the necessity of structurally rich goals for planning (Guillen-Perez, 3 Sep 2025). SGT reports a ∼15× inference speedup over sequential methods in 2D planning (Jurgenson et al., 2019).

6. Key Insights, Extensions, and Open Directions

  • Unified translation-rotation reward (e.g., constellation reward) enables coupled error reduction for seamless turn-and-go motion, critical for humanoids and any agent with body orientation (Dugar et al., 16 Aug 2025).
  • Hierarchical, divide-and-conquer decomposition (SGT, ProQ) unlocks compositionality, low-depth inference, and improved generalization: particularly valuable in long-horizon or multi-modal planning (Jurgenson et al., 2019, Kobanda et al., 23 Jun 2025).
  • Rich, route-based goal conditioning is generally necessary for high-fidelity SE(2) trajectory prediction—even powerful models (diffusion, GNNs) see significant performance drops when reduced to endpoint-only goals (Guillen-Perez, 3 Sep 2025, Hagedorn et al., 2024).
  • SE(2)-equivariance in the planning model architecture leads to output stability, robust generalization, and reduced sample complexity—enabling reliable planning under arbitrary scene frame selection (Hagedorn et al., 2024).
  • Future work directions include scaling these techniques to SE(3) (full 3D pose), integrating manipulation and end-effector objectives, and leveraging auxiliary geometric and social interaction priors in goal-conditioned settings (Dugar et al., 16 Aug 2025, Hagedorn et al., 2024).

7. Notable Applications and Comparative Summary

Goal-conditioned SE(2) trajectories underpin legged and humanoid locomotion, autonomous driving, and long-horizon navigation. Across these settings, direct pose-reaching and latent-geometry-based approaches consistently outperform indirect velocity-tracking and rigid hierarchical planners, especially under complex kinematics, multi-step interactions, or severe sample constraints.

