Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models

Published 19 Apr 2026 in eess.SY, cs.AI, and math.OC | (2604.17176v1)

Abstract: Future spacecraft operations require autonomy that can interpret high-level mission intent while preserving safety. However, existing trajectory optimization still relies heavily on expert-crafted formulations and does not support intent-conditioned decision-making. This paper proposes an intent-aligned spacecraft guidance framework that links high-level reasoning and safe trajectory optimization through explicit intermediate abstractions, based on behavior sequences and waypoint constraints. A foundation model first predicts an intent-aligned behavior plan, a waypoint generation model then converts it into waypoint constraints, and the safe trajectory is computed via optimization. This decomposition enables scalable supervision without sacrificing safety. Numerical experiments in close-proximity operation scenarios demonstrate that the proposed pipeline achieves over 90\% SCP convergence and yields a $1.5\times$ higher rate of generating trajectories that satisfy the top intent-prioritized performance criteria than heuristic decision-making. These results support the use of intermediate behavior abstraction as a practical interface between foundation-model reasoning and safety-critical onboard spacecraft autonomy.

Abstract PDF Upgrade to Chat

Authors (2)

Summary

The paper presents a hierarchical framework that maps high-level intent to safe trajectories using LLM-based reasoning and SCP.
It combines behavior abstractions, waypoint generation, and nonconvex trajectory optimization to ensure operational safety and efficiency.
It demonstrates superior SCP convergence rates and semantic alignment compared to heuristic approaches across 50,000 test scenarios.

Intent-Aligned Autonomous Spacecraft Guidance via Reasoning Models

Introduction and Context

Autonomous spacecraft operations in proximity regimes necessitate robust approaches to mapping high-level intent to safe, dynamically feasible trajectories. Existing optimal control frameworks, specifically Sequential Convex Programming (SCP), offer convergence guarantees and efficiency but remain reliant on expert-coded abstractions that lack semantic interpretability. Moreover, direct mappings from high-level objectives to full-state trajectories circumvent operator oversight and do not seamlessly support intent-conditioned, interpretable planning. Recent advances in foundation models (FMs), notably LLMs, have demonstrated general reasoning and policy generation capabilities, motivating their application as intermediaries for high-level planning in robotics and space systems. This paper introduces a hierarchical framework that leverages FMs for reasoning over behavior abstractions, structured waypoint generation, and nonconvex trajectory optimization, facilitating scalable, safe, and interpretable intent-driven guidance for spacecraft in close-proximity operations (2604.17176).

Figure 1: Overview of the training and deployment pipeline, integrating reasoning via FMs, waypoint constraint generation, and SCP-based safe trajectory optimization.

Hierarchical Intent-to-Trajectory Architecture

The framework decomposes the guidance pipeline into three sequential modules:

High-level Reasoning Model: Maps mission context and prioritized semantic intent to an explicit behavior sequence and transfer time. This is operationalized as tokenized outputs from a pretrained and LoRA-adapted LLM (Qwen2.5-7B-Instruct), trained with optimization-backpropagated supervision to maximize alignment of generated behaviors with intent, leveraging concise reasoning traces.
Waypoint Generation Model: A conditional Gaussian MLP model, predicting waypoint placement policies from the behavior sequence and scenario context, trained via reward-weighted imitation learning using SCP outcome metrics.
Trajectory Optimization (SCP): Given generated waypoints, solves a fuel-optimal, chance-constrained, nonconvex trajectory optimization problem, enforcing safety requirements and dynamic feasibility, including robust separation from the keep-out zone (KOZ) under actuation and navigation uncertainties.

This decomposition—explicitly separating behavior, waypoint, and trajectory layers—enables both modular training and operator intervention points, while maintaining end-to-end consistency from human intent to executed trajectories.

Data Generation and Learning Process

The supervised training pipeline iteratively bootstraps the waypoint generator and reasoning model. Large-scale scenario sampling populates a dataset by solving for optimal trajectories under diverse initializations and mission types (circumnavigation, flyby, ducking). Performance rewards are defined as a weighted aggregate of fuel, temporal, observation, and safety metrics, with neural waypoint models trained via reward-weighted log-likelihood maximization.

For the reasoning model, candidate mid-level plans are synthesized, and an LLM automates the annotation of selected plans and rationales by lexicographically following performance priorities. The reasoning model is then fine-tuned to directly infer behavior sequences and concise rationales from context and intent, explicitly ignoring downstream reward or optimization metrics during deployment.

Numerical Results and Analysis

Trajectory Generation and Quality

The approach demonstrates tractable, interpretable inference of multi-phase behavior plans, with downstream pipeline steps reliably translating high-level intent to safe, dynamically feasible RTN-frame trajectories.

Figure 2: Representative input, intermediate behavior-plan inference, and the resulting optimized trajectory demonstrating intent alignment and robust safety.

Waypoint Generation Effectiveness

On a set of 50,000 scenarios, the neural waypoint generator achieves SCP convergence in 90.6–90.8% of cases, compared to 72.2% for geometric heuristics, with reward-weighted policies yielding higher mean—lower variance—trajectory rewards. Importantly, the majority of waypoint placements (98.8%) fall within a 10 m tolerance of geometrically admissible domains, validating generalization and structural fidelity.

Figure 3: Histogram of reward distribution $R(y; X)$ over test scenarios comparing weighted, unweighted neural, and heuristic waypoint generators.

End-to-End Pipeline Performance

The full intent-to-trajectory pipeline is validated across four architectural variants (combinations of neural and heuristic modules). The reasoning LLM is feasible in 100% of cases; SCP success is mostly modulated by the waypoint generator, again favoring the neural model.

Critically, in configurations with both trained modules, the pipeline exhibits the highest dual-metric intent/trajectory match rates (18.0% for intent-trajectory, 20.4% for reasoning-trajectory), a relative improvement of approximately $1.5\times$ over heuristic reasoning at fixed waypoint generation. Behavior-sequence feasibility and semantic trace alignment are robust: 52.4% of traces match both top-priority intent metrics, 47.6% at least one.

Theoretical and Practical Implications

The results indicate that the explicit use of intermediate behavior and waypoint abstractions allows for scalable supervision, interpretable operator interventions, and effective propagation of high-level intent to safe, performant closed-loop policies. The LLM-based reasoning layer, decoupled from full state/trajectory metrics at inference, demonstrates that it is possible to distill robust planning heuristics from structured, optimization-backed data while preserving semantic flexibility and operator oversight.

Practically, these attributes facilitate progressive autonomy, critical for mission phases requiring human-in-the-loop supervision as well as full spacecraft self-management. The interface design admits future extensions to collaborative and multi-agent guidance scenarios, as well as integration with visual-semantic or goal-oriented natural language operator interfaces. The approach also highlights the benefit of modular system architectures where interpretable checkpoints propagate uncertainty and enable runtime intervention [art_ieeeaero24, jain2026autonomous].

Future Directions

Several directions are opened by this work: integration of uncertainty-aware neural warm-starts or learned SCP initialization [banerjee2025deep, yuan2024filtering], more expressive sequence models for behavior abstraction [celestini2025generalizable], compositional or multimodal trajectory synthesis, and application to general out-of-distribution context or multi-agent space logistics. Generalization of reasoning strategies and their formal verification in new mission domains remain open research areas.

Conclusion

This paper presents a hierarchical, intent-aligned spacecraft autonomy architecture, establishing that explicit decomposition via behavior and waypoint abstractions allows FMs to serve as high-level reasoning layers, harmonizing semantic mission objectives with safety-critical trajectory optimization. The proposed framework achieves superior SCP convergence rates and higher semantic alignment to operator intent than heuristic baselines, validating the approach in large-scale statistical studies. This method points to scalable, interpretable, and safe autonomy for next-generation space logistics and multi-agent orbital operations.

Markdown Report Issue