Energy-Optimal Ocean Trajectory Planning

Updated 30 January 2026

Energy-optimal ocean trajectory planning is a discipline that computes optimal marine paths by integrating vehicle dynamics, environmental forces, and control theory to minimize energy consumption.
Methodologies such as Markov decision processes (MDPs), dynamic programming, and convex optimization balance energy use against mission time, with reported energy savings of roughly 10% to 57% over the baselines each study compares against.
Real-time feedback and adaptive learning algorithms enable vehicles to leverage favorable currents and winds, ensuring robust performance in uncertain, dynamic ocean environments.

Energy-optimal ocean trajectory planning refers to the formulation and computation of vehicle paths that minimize actuation energy consumption while navigating dynamic, uncertain, and often adversarial ocean environments. This discipline underpins long-duration autonomous missions for underwater vehicles (AUVs), surface vessels (USVs), hybrid marine craft, and aerial vehicles in maritime sensing scenarios. Solutions integrate vehicle dynamics, spatiotemporal ocean models, environmental disturbances, operational constraints, and optimal control principles to generate executable real-time feedback plans that exploit ambient flows, minimize energy expenditure, and satisfy mission requirements.

1. Fundamental Modeling of Ocean Vehicle Trajectories

Ocean trajectory planning requires a coupled representation of the vehicle’s kinematics/dynamics and the ambient ocean environment, including external drift from currents, wind, and in some platforms, renewable sources (sail force, harvested energy). For example, the Tethys AUV employs a unicycle-plus-currents model: vehicle state $x = (q,\theta)$ evolves according to

$\begin{aligned} \dot x &= v\cos\theta + \alpha(x,l,t), \\ \dot y &= v\sin\theta + \beta(x,l,t), \\ \dot l &= 0, \qquad \dot\theta = \omega, \end{aligned}$

where $v$ is the commanded thrust, $\omega$ is the turn rate, $(\alpha,\beta)$ is the local horizontal current, and $l$ indexes discrete depth layers (Orioke et al., 2019).
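As a minimal sketch of how the model above can be stepped forward in time, the following Python snippet applies forward-Euler integration to the unicycle-plus-currents equations. The `current` function is a hypothetical stand-in for ocean-model output (a uniform eastward flow that weakens with depth layer), and the step size and inputs are illustrative assumptions rather than Tethys parameters.

```python
import math

def current(x, y, layer, t):
    """Hypothetical current field (alpha, beta) in m/s: uniform eastward flow
    that weakens with depth layer. Stands in for ocean-model output."""
    return 0.3 / (1 + layer), 0.0

def step(state, v, omega, dt, t=0.0):
    """One forward-Euler step of the unicycle-plus-currents model."""
    x, y, layer, theta = state
    alpha, beta = current(x, y, layer, t)
    x_new = x + dt * (v * math.cos(theta) + alpha)
    y_new = y + dt * (v * math.sin(theta) + beta)
    theta_new = theta + dt * omega
    return (x_new, y_new, layer, theta_new)  # layer stays fixed (dl/dt = 0)

def simulate(state, v, omega, dt, n_steps):
    """Integrate the model for n_steps with constant commands."""
    t = 0.0
    for _ in range(n_steps):
        state = step(state, v, omega, dt, t)
        t += dt
    return state

# With zero thrust the vehicle is simply advected by the current.
drifting = simulate((0.0, 0.0, 0, math.pi / 2), v=0.0, omega=0.0, dt=1.0, n_steps=10)
```

With `v = 0` the vehicle moves only because of the current, which is the drift behaviour that energy-aware planners try to exploit.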

Surface vehicles and hybrid platforms require richer models to capture wind forces, wave drag, fuel consumption, battery states, and propulsion hybridization. UAV–maritime systems consider fixed-wing aerodynamics, wind vectors, and communication energy for data collection (Zhang et al., 2021).

The ambient flow is typically supplied by predictive ocean models (e.g., ROMS outputs), ensemble-based reconstructions (DO field decompositions), or real-time sensor data, discretized on spatial grids and sampled at mission-relevant time scales.

2. Optimal Control Formulations and Energy Metrics

Trajectory optimization adopts diverse cost functionals to encode energy consumption. Standard metrics are direct mechanical work, fuel burn, or electrical power integrated over the mission duration. For Tethys, each action in the discrete action set $U$ carries an energy cost, e.g., $c(\text{drift})=0$, $c(\text{glide})=2$, and $c(\text{forward})=4$, with the policy seeking to minimize the cost accumulated over the mission (Orioke et al., 2019).

Multi-objective formulations combine normalized energy (fuel burn) and total mission time in a single weighted objective, with a scalar parameter setting the relative weight of the two terms (Kandel et al., 2020). Sweeping this parameter enables Pareto analysis for mission planners.
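The energy/time trade-off described above can be illustrated with a weighted-sum scalarization. This is a sketch under stated assumptions, not the formulation of Kandel et al.: the function `scalarized_cost`, the normalization by the largest fuel and time among the candidates, and the three candidate routes are all hypothetical.

```python
def scalarized_cost(fuel, time, fuel_ref, time_ref, weight):
    """Weighted sum of normalized fuel and normalized time, weight in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * fuel / fuel_ref + (1.0 - weight) * time / time_ref

# Hypothetical candidate routes: (name, fuel in litres, time in hours).
ROUTES = [
    ("direct", 100.0, 10.0),
    ("current-assisted", 70.0, 12.0),
    ("slow-drift", 55.0, 18.0),
]

def best_route(weight):
    """Return the name of the route minimizing the scalarized cost."""
    fuel_ref = max(r[1] for r in ROUTES)
    time_ref = max(r[2] for r in ROUTES)
    return min(
        ROUTES,
        key=lambda r: scalarized_cost(r[1], r[2], fuel_ref, time_ref, weight),
    )[0]

# Sweeping the weight traces out which route is preferred at each trade-off.
sweep = {w: best_route(w) for w in (0.0, 0.5, 1.0)}
```

A weight of 0 considers only time and a weight of 1 considers only fuel; intermediate weights pick out compromise routes.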

Convexification techniques, as in (Ritari et al., 2023), recast nonlinear dynamics, hydrodynamic, and energy converter constraints into convex forms, making the globally optimal solution tractable and certifiable under mild conditions.

Energy-aware planning also incorporates environmental harvesting (solar, wind, wave), explicit in MDP reward construction for surface vehicles (Chowdhury et al., 2021), enabling the minimization of net (spent minus harvested) energy.

3. Planning Algorithms: MDPs, Dynamic Programming, and Optimization

The planning paradigm depends on vehicle dynamics, environmental uncertainty, and feedback requirements. Markov Decision Process (MDP)-based planning discretizes state-action spaces (position, heading, depth, fuel) and ocean environments to construct transition matrices $P(s' \mid s, a)$ and solve the Bellman optimality equations, which in their standard cost-minimizing form read

$V^*(s) = \min_{a \in U} \Big[ c(a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big],$

with discount factor $\gamma \in (0,1]$, iteratively computing value functions and optimal feedback policies (Orioke et al., 2019, Chowdhury et al., 2021).
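A minimal value-iteration sketch shows how such a policy can be computed. It reuses the drift/glide/forward energy costs quoted earlier, but everything else is an assumption made for illustration: a one-dimensional corridor of six cells, the per-cell `CURRENT` displacements, deterministic transitions (each action has a single successor with probability 1), no discounting, and a small `STEP_PENALTY` time cost that prevents zero-energy loops from appearing free.

```python
# (propulsive displacement in cells, energy cost) per action
ACTIONS = {"drift": (0, 0.0), "glide": (1, 2.0), "forward": (2, 4.0)}
CURRENT = [1, 1, -1, -1, 0, 0]  # hypothetical per-cell drift, cells per step
GOAL = len(CURRENT) - 1
STEP_PENALTY = 0.5              # assumed time cost per step

def transitions(s, action):
    """Return [(probability, next_state)]; deterministic in this toy model."""
    move, _ = ACTIONS[action]
    s_next = min(max(s + CURRENT[s] + move, 0), GOAL)
    return [(1.0, s_next)]

def q_value(s, action, V):
    """Stage cost plus expected cost-to-go of taking `action` in state `s`."""
    expected = sum(p * V[s2] for p, s2 in transitions(s, action))
    return STEP_PENALTY + ACTIONS[action][1] + expected

def value_iteration(tol=1e-9, max_iters=10_000):
    """Iterate the Bellman backup until the value function stops changing."""
    V = [0.0] * (GOAL + 1)
    for _ in range(max_iters):
        V_new = [
            0.0 if s == GOAL else min(q_value(s, a, V) for a in ACTIONS)
            for s in range(GOAL + 1)
        ]
        delta = max(abs(a - b) for a, b in zip(V, V_new))
        V = V_new
        if delta < tol:
            break
    policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, V)) for s in range(GOAL)}
    return V, policy

V, policy = value_iteration()
```

In this toy corridor the resulting policy drifts where the current is favorable and spends energy only to cross the adverse cells; in cell 2 it even drifts backwards to reach a cell from which a single forward action clears the adverse region.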

Dynamic programming approaches handle multi-objective, constrained problems, including flexible refueling (discrete port stops), fuel/time Pareto trade-offs, and refueling waypoint selection. Discretization and piecewise-Euler integration enable tractable DP recurrences over 3D state–velocity–fuel grids (Kandel et al., 2020).
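The idea of a DP recurrence over a joint position and fuel state, with optional port stops, can be sketched as follows. This is a toy illustration and not the model of Kandel et al.: the four-leg route, the two speed settings, the tank size, the port location, and the refueling time are hypothetical, and the sketch minimizes time subject to a fuel constraint rather than computing a full Pareto front.

```python
from functools import lru_cache

LEGS = 4                                   # legs between waypoints 0..4
SPEEDS = {"fast": (1, 3), "slow": (2, 1)}  # (hours, fuel units) per leg
PORTS = {2}                                # waypoints where refueling is possible
TANK = 6                                   # tank capacity in fuel units
REFUEL_HOURS = 1                           # time cost of one port stop

@lru_cache(maxsize=None)
def min_time(waypoint, fuel):
    """Minimum remaining travel time from `waypoint` with `fuel` units on board."""
    if waypoint == LEGS:
        return 0.0
    best = float("inf")
    # Option 1: stop and fill the tank (only useful if it is not already full).
    if waypoint in PORTS and fuel < TANK:
        best = REFUEL_HOURS + min_time(waypoint, TANK)
    # Option 2: sail the next leg at any speed the remaining fuel allows.
    for hours, burn in SPEEDS.values():
        if burn <= fuel:
            best = min(best, hours + min_time(waypoint + 1, fuel - burn))
    return best

fastest = min_time(0, TANK)
```

Here the port stop pays off: sailing fast, refueling once at waypoint 2, and sailing fast again takes 5 hours, whereas without the stop only one leg could be sailed fast.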

Continuous optimization methods, including non-convex nonlinear programs (NLPs), convex QPs/SOCPs (via convexification or LQ-OCP), direct transcription (flatness-based (Lutz et al., 2021), collocation (Martinsen et al., 2020)), and receding-horizon Model Predictive Control (MPC) frameworks, are deployed for high-fidelity models, multi-stage planners, and hybrid vessels.

For energy-optimal 3D AUV path-following under ocean currents, planning is split into setpoint computation (persistent excitation parameters for surge/heave/pitch/yaw), minimizing total propulsion energy, followed by two-stage decoupled MPCs for tracking (Yang et al., 2022).

Learning-based planners, including reinforcement learning (RL) agents trained with SAC or DQN, achieve flow-adaptive planning even with limited observability; the agents learn policies that surf ambient vortical and coherent structures, leading to substantial energy savings (Gadhvi et al., 2025, Hasankhani et al., 2021).

4. Role of Ocean Currents, Wind, and Environmental Structure

Exploiting favorable currents and wind is central to actionable energy savings. Ocean-aware planners evaluate the vector field at each decision node, often choosing routes that are geometrically longer but energetically cheaper by drifting along streamlines, timing actuation to cross barriers (Lagrangian Coherent Structures, FTLE ridges), and avoiding adverse regions.
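The preference for geometrically longer but energetically cheaper routes can be reproduced with a small graph search. The sketch below runs Dijkstra's algorithm on a hypothetical three-node graph. The edge-energy model is an assumption chosen for illustration: the vehicle holds a fixed ground speed, the current is aligned with each edge, and propulsion power scales with the cube of the through-water speed.

```python
import heapq

def edge_energy(length_m, along_current, ground_speed=1.0, k=1.0):
    """Energy to traverse an edge at fixed ground speed, assuming propulsion
    power k * (through-water speed)**3 and a current aligned with the edge."""
    water_speed = max(ground_speed - along_current, 0.0)
    return k * water_speed ** 3 * length_m / ground_speed

# Hypothetical graph: node -> {neighbor: (length in m, along-edge current in m/s)}.
GRAPH = {
    "A": {"B": (1000.0, -0.3), "C": (800.0, 0.4)},
    "C": {"B": (800.0, 0.4)},
}

def cheapest_path(graph, start, goal):
    """Dijkstra search using edge energy as the (non-negative) edge weight."""
    frontier = [(0.0, start, [start])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, (length_m, along_current) in graph.get(node, {}).items():
            if nxt not in settled:
                step = edge_energy(length_m, along_current)
                heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
    return float("inf"), []

energy, path = cheapest_path(GRAPH, "A", "B")
```

The detour through C is 1600 m instead of 1000 m, yet under this energy model it costs far less because both legs ride a favorable current while the direct leg fights an adverse one.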

For UAVs in wind, cyclical trajectory schemes (multiplexing communication and propulsion energy) split the path into multiple laps, shape the orbits (circular, figure-eight), and search for the orientations and patterns that harness wind for minimal energy (Zhang et al., 2020, Zhang et al., 2021). Adaptive slot-to-slot online refinement ensures feasibility and robustness against stochastic wind disturbances.

Pareto analysis reveals rapid fuel savings for minor concessions in trip time, guiding operational trade-offs and port density decisions for USVs (Kandel et al., 2020). Hybrid sailboats adjust headings just outside the no-go wind zone, leveraging sail force maps and timed engine assists to achieve up to 23% energy reduction (Zhang et al., 2018).

Stochastic and partially observable flows are represented with reduced-order field decompositions and Bayesian GP inference, which support robust feedback planning and lend themselves to fast computation on modern GPUs (Chowdhury et al., 2021, Akbari et al., 2023).

5. Feedback Policy Representation and Real-Time Execution

Most feedback plans are compiled into lookup tables (discrete policies) or parametric policy networks for online control. For the Tethys AUV, the optimal policy $\pi^*$ is stored for all grid-discretized states; at runtime, the vehicle queries its sensed location, retrieves the prescribed action, and reindexes as necessary in the event of off-grid drift (Orioke et al., 2019).
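The lookup-and-reindex step can be sketched as a dictionary query keyed by the nearest grid cell. The grid spacing, grid shape, and policy contents below are hypothetical, and snapping to the nearest cell with clamping at the grid boundary is one plausible reading of reindexing after off-grid drift, not necessarily the scheme used on the Tethys vehicle.

```python
SPACING_M = 500.0    # assumed grid spacing in metres
GRID_SHAPE = (3, 2)  # assumed number of cells along x and y

# Hypothetical precomputed policy: (i, j, depth_layer) -> action name.
POLICY = {(i, j, l): "drift" for i in range(3) for j in range(2) for l in range(2)}
POLICY[(0, 0, 0)] = "forward"
POLICY[(1, 0, 0)] = "glide"

def nearest_index(coord_m, n_cells):
    """Snap a coordinate to the nearest grid index, clamped to the grid."""
    return min(max(round(coord_m / SPACING_M), 0), n_cells - 1)

def query_policy(x_m, y_m, layer):
    """Look up the stored action for the grid cell nearest the sensed position."""
    i = nearest_index(x_m, GRID_SHAPE[0])
    j = nearest_index(y_m, GRID_SHAPE[1])
    return POLICY[(i, j, layer)]

# A vehicle that has drifted slightly off-grid still resolves to a valid cell.
action = query_policy(620.0, -40.0, 0)
```

Because the table is computed offline, the online cost is a single dictionary lookup per sensing cycle.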

Economic MPC (EMPC) and standard MPC frameworks re-solve small- to medium-size optimization problems on each sensing cycle (0.2–2 s), incorporating the latest obstacle and disturbance measurements while maintaining closed-loop stability, energy efficiency, and safety margins (Liang et al., 2021, Yang et al., 2022).

Reinforcement-learning agents sample local velocity histories, reconstruct partial flow fields (via GPR and CNNs), and act based solely on local information. Empirical studies indicate 30–50% energy savings over graph-planned or naïve baselines with success rates exceeding 90% in dynamically rich scenarios (Gadhvi et al., 2025).

Online inexact gradient-descent methods, such as IGD (Nutalapati et al., 2020), adapt trajectory iterates using noisy, time-varying gradient feedback, achieving near-optimal energy cost with sublinear regret in the presence of unknown or adversarial currents.
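The flavour of online gradient descent with inexact feedback can be conveyed by a scalar toy problem. This is not the IGD algorithm of Nutalapati et al.: the quadratic cost, the slowly drifting optimum, the bounded deterministic gradient error, and the step size are all assumptions chosen so that the tracking behaviour is easy to check.

```python
import math

def inexact_gradient(x, target, k):
    """Gradient of 0.5 * (x - target)**2 plus a bounded, deterministic error
    standing in for noisy flow measurements (|error| <= 0.05)."""
    return (x - target) + 0.05 * math.sin(k)

def track(x0=5.0, step=0.5, n_steps=200, drift_per_step=0.01):
    """Follow a slowly moving optimum using one inexact gradient step per round."""
    x, errors = x0, []
    for k in range(n_steps):
        target = drift_per_step * k  # optimum shifts as the flow changes
        x -= step * inexact_gradient(x, target, k)
        errors.append(abs(x - target))
    return x, errors

x_final, errors = track()
```

After the initial transient decays, the iterate stays within a small neighbourhood of the moving optimum, with the size of that neighbourhood set by the gradient error and the drift rate.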

6. Quantitative Outcomes and Comparative Assessments

Simulation and experimental validations show that energy-optimal ocean trajectory planning can cut actuation or fuel usage by roughly 10% to 57% relative to naïve open-loop, shortest-path, or time-optimal schemes. Specific results include:

Scenario	Planner	Path/Time	Energy Used	Savings vs. Baseline
Tethys (SCB, AUV)	Feedback MDP (Orioke et al., 2019)	48 steps	108 units	57% reduction
USV & Refueling	DP Pareto (Kandel et al., 2020)	~150 km	20–50% less fuel	vs. direct path
Sailboat (pool)	NLP/Graph (Zhang et al., 2018)	5 loops	234 J (ψ=40°)	23.4% vs. worst
Hybrid Ship	Convex OCP (Ritari et al., 2023)	1 hr	10–20% less fuel	Battery-only legs
RL flow-aware (ASV)	SAC (Gadhvi et al., 2025)	Multi-env	35–51% less energy	vs. A*/naïve
3D AUV (LOS+MPC)	Two-stage (Yang et al., 2022)	Lawnmower	19% reduction	vs. traditional LOS
Surface Vessel (Polygon)	Hybrid OCP (Martinsen et al., 2020)	1.4 km	269 kJ	54% over min-time

These results indicate the centrality of ocean-current and wind exploitation, hybrid optimal feedback construction, and real-time adaptive planning.

7. Research Directions and Open Problems

Recent work points to the following research trajectories:

Data-driven planning under partial or noisy flow information (ensemble ocean models, real-time sensor fusion, GPR, Bayesian inference, tube-MPC for robustness) (Chowdhury et al., 2021, Akbari et al., 2023).
Multi-agent and swarm-level planning, exploiting coherent flow structures and distributed information-gain/cost trade-offs (Krishna et al., 2021, Gadhvi et al., 2025).
Integration of energy harvesting, fuel cell/battery hybridization, and emissions constraints into trajectory–energy planning (Ritari et al., 2023).
Theoretical advances in global optimality certification for non-convex domains via convexification and hybrid discrete-continuous graph architectures (Martinsen et al., 2020).
Cyclical, patterned, and online-adaptive algorithms for marathon-scale maritime data collection under stochastic wind and current regimes (Zhang et al., 2021).
Benchmarking and real-world deployment in time-varying, cluttered, and uncertain ocean settings: operational validation against conventional straight-line, time-optimal, and fixed-waypoint protocols (Kandel et al., 2020, Yang et al., 2022).

Energy-optimal ocean trajectory planning stands as a mature, yet rapidly evolving field unifying optimal control, data-driven modeling, computational optimization, and adaptive planning for marine autonomy.