Prompt2Auto: Geometry-Invariant Robotic Automation
- Prompt2Auto is a one-shot learning framework for robotic automation that uses Gaussian Processes in a geometry-invariant feature space.
- It employs coordinate transformation and sliding-window feature augmentation to generalize across translation, rotation, and scale.
- The system uses recursive prediction and skill classification to extrapolate complex trajectories from a single human motion prompt.
Prompt2Auto is a geometry-invariant one-shot learning framework for human-guided robotic automation, enabling a robot to perform complex trajectories from a single user demonstration—termed a “motion prompt”—with high generalization across translation, rotation, and scale. Through the integration of coordinate transformation, Gaussian process regression, and skill classification, Prompt2Auto addresses the shortcomings of conventional learning-from-demonstration methods, which typically require large datasets and are sensitive to coordinate frame changes. The Prompt2Auto framework demonstrates its effectiveness through both rigorous simulation and real-world robotic experiments, facilitating robust automated control with minimal demonstration burden (Yang et al., 17 Sep 2025).
1. Geometry-Invariant One-Shot Gaussian Process Learning
Prompt2Auto’s central innovation is a one-shot adaptive learning mechanism based on Gaussian Processes (GPs) operating in a geometry-invariant feature space. Instead of requiring many demonstrations to learn a manipulator skill, Prompt2Auto captures the relative geometry of the user’s motion prompt and generalizes the demonstrated behavior to new task instances.
The core feature-engineering step is a coordinate transformation that converts Cartesian positions $p_t = (x_t, y_t)$ into normalized polar coordinates relative to the initial state $p_0$:

$$r_t = \lVert p_t - p_0 \rVert, \qquad \theta_t = \operatorname{atan2}(y_t - y_0,\; x_t - x_0).$$

The angular component is embedded with cosine and sine before normalization, allowing the GP to model periodicity and to achieve invariance to translation and rotation. The radial component is normalized by the maximum observed radius, $\bar{r}_t = r_t / \max_\tau r_\tau$, providing scale invariance. This yields a normalized polar feature vector

$$\phi_t = \big[\,\bar{r}_t,\; \cos\theta_t,\; \sin\theta_t\,\big],$$

where $\bar{r}_t$, $\cos\theta_t$, and $\sin\theta_t$ are all rescaled into $[0, 1]$.
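A minimal sketch of this polar feature transform, assuming 2-D trajectories and a simple affine rescaling of the cosine/sine channels into $[0,1]$ (the paper's exact rescaling is not specified here):

```python
import numpy as np

def polar_features(traj):
    """Map a Cartesian trajectory of shape (T, 2) to normalized polar
    features relative to its initial point. The [0, 1] rescaling of the
    angle embedding is an illustrative assumption."""
    rel = traj - traj[0]                       # translate: start at origin
    r = np.linalg.norm(rel, axis=1)            # radial distance from start
    theta = np.arctan2(rel[:, 1], rel[:, 0])   # angle w.r.t. initial point
    r_bar = r / max(r.max(), 1e-9)             # normalize by max radius
    # Embed the angle with cos/sin, then rescale each channel into [0, 1].
    return np.stack([r_bar,
                     (np.cos(theta) + 1.0) / 2.0,
                     (np.sin(theta) + 1.0) / 2.0], axis=1)
```

Because the features depend only on displacements from the starting pose and a normalized radius, translating, rotating, or uniformly scaling the input trajectory leaves the radial channel unchanged up to the shared normalization.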
2. Invariant Dataset Construction and Augmentation
The construction of invariant datasets is critical for enabling the model to generalize from a single demonstration. Once all poses are transformed to the normalized polar embedding, Prompt2Auto further augments the dataset by incorporating incremental motion (velocity) information, $\Delta\phi_t = \phi_t - \phi_{t-1}$. The dataset is organized as tuples of feature windows using a sliding window of width $w$: for each time $t$, the feature vector is the concatenation of features from $t - w + 1$ to $t$.
This sliding window encoding provides temporal context, which is essential for learning the dynamic structure of multi-step trajectories and enables recursive multi-step prediction based on a single observed motion prompt.
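The windowed dataset construction above can be sketched as follows; the window width, the zero-padded first velocity, and the next-step target layout are illustrative assumptions:

```python
import numpy as np

def build_windows(feats, w=3):
    """Augment per-step features with incremental motion (first
    differences), then stack sliding windows of width w. Returns
    (X, Y) pairs for next-step prediction."""
    vel = np.diff(feats, axis=0, prepend=feats[:1])  # Δfeatures; zero at t=0
    aug = np.hstack([feats, vel])                    # position + velocity
    X, Y = [], []
    for t in range(w - 1, len(aug) - 1):
        X.append(aug[t - w + 1:t + 1].ravel())       # window (t-w+1 .. t)
        Y.append(aug[t + 1])                         # target: step t+1
    return np.array(X), np.array(Y)
```

Each input row thus carries $w$ consecutive augmented feature vectors, giving the regressor the short-horizon temporal context it needs to disambiguate the next step.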
3. Recursive Multi-Step Prediction
The GP is trained to perform next-step prediction in the normalized (geometry-invariant) feature space. Given a partial demonstration, i.e., a motion prompt, the model performs recursive prediction as

$$\hat{\phi}_{t+1} = \mu\big(\phi_{t-w+1}, \ldots, \phi_t\big),$$

where $\mu(\cdot)$ denotes the mean prediction of the GP, recursively shifting the sliding window to incorporate each newly predicted step. These predictions are subsequently converted back to Cartesian coordinates using the inverse of the initial polar transformation, combined with scaling and rotation determined by the anchor point mechanism (see below).
This recursive process enables long-horizon motion extrapolation from short prompts and allows rapid execution of multi-step tasks without human intervention.
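The recursive rollout can be sketched with a minimal RBF-kernel GP standing in for the paper's model; the `TinyGP` class, its hyperparameters, and the flattened-window interface are assumptions for illustration:

```python
import numpy as np

class TinyGP:
    """Minimal RBF-kernel GP regressor (mean prediction only); a sketch,
    not the paper's actual GP implementation."""
    def __init__(self, length=1.0, noise=1e-4):
        self.length, self.noise = length, noise

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.length ** 2)

    def fit(self, X, Y):
        K = self._kernel(X, X) + self.noise * np.eye(len(X))
        self.X_train, self.alpha = X, np.linalg.solve(K, Y)
        return self

    def predict_mean(self, Xs):
        return self._kernel(Xs, self.X_train) @ self.alpha

def recursive_rollout(gp, window, steps):
    """Predict the next feature vector, slide it into the window, repeat.
    `window` has shape (w, d); the GP maps flattened windows to the
    next step (an assumed interface)."""
    win, preds = window.copy(), []
    for _ in range(steps):
        nxt = gp.predict_mean(win.ravel()[None, :])[0]  # GP mean prediction
        preds.append(nxt)
        win = np.vstack([win[1:], nxt])                 # shift window forward
    return np.array(preds)
```

Feeding each prediction back into the window is what turns a one-step regressor into a long-horizon trajectory generator; prediction error compounds, which is one motivation for the anchor-point correction described next.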
4. Robustness to Prompt Variations and Scaling
Prompt2Auto incorporates anchor-point scaling to maintain robustness in the presence of varied user prompting. The system automatically determines a scaling factor $s$, computed at anchor checkpoints, that best aligns the predicted radial progression with what the user's prompt would suggest:

$$s = \arg\min_{s} \sum_{k} \big( r_k^{\text{obs}} - s\, r_k^{\text{ref}} \big)^2 = \frac{\sum_k r_k^{\text{obs}}\, r_k^{\text{ref}}}{\sum_k \big(r_k^{\text{ref}}\big)^2},$$

where $r_k^{\text{obs}}$ are observed radii and $r_k^{\text{ref}}$ are reference radii at anchor points.
This scaling ensures that prompts with differences in length, orientation, or size—as might occur due to user imprecision—are properly mapped to the desired trajectory without loss of accuracy or stability, further enforcing geometry invariance.
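A least-squares reading of the anchor alignment can be sketched in a few lines; the closed form below is an assumed reconstruction of the scaling step, not the paper's verbatim formula:

```python
import numpy as np

def anchor_scale(r_obs, r_ref):
    """Least-squares scale factor s minimizing sum (r_obs - s * r_ref)^2
    over the anchor points; an assumed reconstruction for illustration."""
    r_obs = np.asarray(r_obs, dtype=float)
    r_ref = np.asarray(r_ref, dtype=float)
    return float(r_obs @ r_ref / max(r_ref @ r_ref, 1e-12))
```

A prompt drawn at twice the reference size, for instance, yields $s \approx 2$, so the extrapolated trajectory is enlarged accordingly before execution.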
5. Multi-Skill Autonomy Through Skill Classification
Prompt2Auto supports multi-skill autonomous control by maintaining a library of GP models, each corresponding to a distinct skill demonstrated through separate reference trajectories. During execution, when a new prompt is received, the framework computes a similarity score between the prompt and stored models:

$$e_i = \sum_{t} \big\lVert \phi_t^{\text{prompt}} - \hat{\phi}_t^{(i)} \big\rVert^2,$$

where $\hat{\phi}^{(i)}$ is the predicted trajectory from the $i$-th GP skill model, and the skill with the lowest error, $i^\ast = \arg\min_i e_i$, is selected for automated task continuation.
This architecture enables the system to recognize and extrapolate different robotic skills from short prompts, supporting both rapid skill switching and multi-task autonomy.
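The selection rule can be sketched as a summed-squared-error comparison over the prompt's length; the metric and interface are assumptions consistent with the description above:

```python
import numpy as np

def classify_skill(prompt_feats, skill_predictions):
    """Score each skill model's predicted trajectory against the prompt
    (summed squared error over the prompt's duration, an assumed
    similarity metric) and return the best-matching skill index."""
    errs = [float(np.sum((prompt_feats - pred[:len(prompt_feats)]) ** 2))
            for pred in skill_predictions]
    return int(np.argmin(errs)), errs
```

The selected model's GP then continues the trajectory from the prompt via the recursive prediction mechanism, so skill recognition and extrapolation share one pipeline.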
6. Experimental Validation and Applications
Prompt2Auto was validated in both numerical simulations and real-world robotic tasks:
- Simulation: In a GUI environment, users drew a reference trajectory (e.g., a closed curve), then provided partial prompts subject to translation, rotation, or scaling. GeoGP (geometry-invariant GP) predictions consistently maintained low errors, outperforming baseline “Exact GP” (Cartesian frame) models, especially under geometric transformations.
- Physical Robots: Two scenarios were demonstrated:
- Passive Takeover (Teleoperation): Upon simulated operator disconnection, the system’s predictive model autonomously completed a trajectory (e.g., a “Treble Clef”) from a partial human prompt.
- Active Skill Classification: The robot classified a short movement as belonging to a particular skill in a library and executed the full learned motion.
These results establish that Prompt2Auto provides robust, low-burden, geometry-invariant robotic automation from single demonstrations, with generalization across task types and conditions.
7. Technical Implications and Future Directions
Prompt2Auto’s combination of coordinate normalization, incremental temporal feature augmentation, anchor-based scaling, and recursive GP prediction yields a framework that supports geometry-invariant learning from demonstration in minimally supervised settings. The skill-classification mechanism further enhances multi-task flexibility.
Potential extensions include:
- Applying the framework to additional robotic modalities and higher-dimensional manipulators.
- Incorporating richer contextual environmental or sensory inputs into the feature representation.
- Extending anchor-based scaling and skill similarity scoring to nonlinear dynamical systems or variable environmental parameters.
- Investigating integration with end-to-end reinforcement or imitation learning for further generalization capabilities.
Details, demonstration code, and documentation are available at the project website: https://prompt2auto.github.io
Prompt2Auto’s geometry-invariant one-shot GP learning establishes a new paradigm for efficient, transferable, and robust human-guided automation in robotic systems, substantially reducing demonstration requirements while ensuring robust performance across coordinate transformations and skill types (Yang et al., 17 Sep 2025).