
Prompt2Auto: Geometry-Invariant Robotic Automation

Updated 24 September 2025
  • Prompt2Auto is a one-shot learning framework for robotic automation that uses Gaussian Processes in a geometry-invariant feature space.
  • It employs coordinate transformation and sliding-window feature augmentation to generalize across translation, rotation, and scale.
  • The system uses recursive prediction and skill classification to extrapolate complex trajectories from a single human motion prompt.

Prompt2Auto is a geometry-invariant one-shot learning framework for human-guided robotic automation, enabling a robot to perform complex trajectories from a single user demonstration—termed a “motion prompt”—with high generalization across translation, rotation, and scale. Through the integration of coordinate transformation, Gaussian process regression, and skill classification, Prompt2Auto addresses the shortcomings of conventional learning-from-demonstration methods, which typically require large datasets and are sensitive to coordinate frame changes. The Prompt2Auto framework demonstrates its effectiveness through both rigorous simulation and real-world robotic experiments, facilitating robust automated control with minimal demonstration burden (Yang et al., 17 Sep 2025).

1. Geometry-Invariant One-Shot Gaussian Process Learning

Prompt2Auto’s central innovation is a one-shot adaptive learning mechanism based on Gaussian Processes (GPs) operating in a geometry-invariant feature space. Instead of requiring many demonstrations to learn a manipulator skill, Prompt2Auto captures the relative geometry of the user’s motion prompt and generalizes the demonstrated behavior to new task instances.

The core feature engineering step is a coordinate transformation that converts Cartesian positions $\mathbf{p}(t_k)$ into normalized polar coordinates relative to the initial state $\mathbf{p}(t_0)$:

$$r(t_k) = \sqrt{\left[p_1(t_k)-p_1(t_0)\right]^2 + \left[p_2(t_k)-p_2(t_0)\right]^2}$$

$$\theta(t_k) = \operatorname{atan2}\left[p_2(t_k)-p_2(t_0),\; p_1(t_k)-p_1(t_0)\right]$$

The angular components are embedded with cosine and sine before normalization, allowing the GP to model periodicity and to achieve invariance to translation and rotation. The radial component $r(t_k)$ is normalized by the maximum observed radius, providing scale invariance. This yields a normalized polar feature vector:

$$\tilde{\xi}(t_k) = \left[\dot{r}(t_k),\; \dot{\sin}\theta(t_k),\; \dot{\cos}\theta(t_k)\right]$$

where $\dot{r}$, $\dot{\sin}\theta$, and $\dot{\cos}\theta$ are all rescaled into $[0, 1]$.
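The transformation above can be sketched in a few lines of numpy. This is an illustrative reconstruction from the formulas, not the authors' implementation; in particular, the exact rescaling of the sine/cosine channels into $[0,1]$ is an assumption.

```python
import numpy as np

def polar_features(points):
    """Map a 2-D trajectory (N, 2) of Cartesian positions p(t_k) to the
    normalized polar features, relative to the initial state p(t_0).
    Illustrative sketch only."""
    p0 = points[0]
    d = points - p0                                  # translate to p(t_0)
    r = np.hypot(d[:, 0], d[:, 1])                   # radial distance r(t_k)
    theta = np.arctan2(d[:, 1], d[:, 0])             # angle via atan2
    r_max = r.max() if r.max() > 0 else 1.0
    r_norm = r / r_max                               # scale invariance
    # Embed the angle with sine/cosine, rescaled from [-1, 1] into [0, 1]
    # (assumed rescaling convention).
    sin_n = (np.sin(theta) + 1.0) / 2.0
    cos_n = (np.cos(theta) + 1.0) / 2.0
    return np.stack([r_norm, sin_n, cos_n], axis=1)
```

By construction, translating or uniformly scaling the input trajectory leaves these features unchanged, which is the invariance the GP exploits.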

2. Invariant Dataset Construction and Augmentation

The construction of invariant datasets is critical for enabling the model to generalize from a single demonstration. Once all poses are transformed to the normalized polar embedding, Prompt2Auto further augments the dataset by incorporating incremental motion (velocity) information:

$$\zeta(t_k) = \tilde{\xi}(t_k) - \tilde{\xi}(t_{k-1})$$

The dataset is organized as tuples of feature windows using a sliding window of width $w$: for each time $t_k$, the feature vector $\zeta(t_k, w)$ is the concatenation of features from $t_{k-w+1}$ to $t_k$.

This sliding window encoding provides temporal context, which is essential for learning the dynamic structure of multi-step trajectories and enables recursive multi-step prediction based on a single observed motion prompt.
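The windowed dataset construction can be sketched as follows. The pairing of each window with the next increment as a regression target is an assumption about how the GP's next-step training data is laid out.

```python
import numpy as np

def build_windows(xi, w):
    """Build sliding-window training pairs from normalized polar
    features xi (an (N, D) array). X[k] concatenates the increments
    zeta(t) = xi(t) - xi(t-1) over a window of width w; Y[k] is the
    next increment the GP learns to predict. Illustrative sketch."""
    zeta = np.diff(xi, axis=0)                     # incremental features
    X, Y = [], []
    for k in range(w - 1, len(zeta) - 1):
        X.append(zeta[k - w + 1 : k + 1].ravel())  # window of width w
        Y.append(zeta[k + 1])                      # next-step target
    return np.array(X), np.array(Y)
```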

3. Recursive Multi-Step Prediction

The GP is trained to perform next-step prediction on the normalized (geometry-invariant) feature space. Given a partial demonstration—i.e., a motion prompt—the model performs recursive prediction as follows:

$$\hat{\xi}(t_{k+1}) = \tilde{\xi}(t_k) + \mu(\zeta(t_k, w))$$

where $\mu(\cdot)$ denotes the mean prediction of the GP, recursively shifting the sliding window for each new predicted step. These predictions are subsequently converted back to Cartesian coordinates using the inverse of the initial polar transformation, combined with scaling and rotation determined by the anchor point mechanism (see below).

This recursive process enables long-horizon motion extrapolation from short prompts and allows rapid execution of multi-step tasks without human intervention.
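The recursive rollout can be sketched generically. Here `gp_mean` stands in for the trained GP's mean prediction $\mu(\cdot)$; any callable mapping a flattened window of increments to the next increment will do, so the interface is an assumption rather than the paper's actual API.

```python
import numpy as np

def rollout(gp_mean, xi_prompt, w, n_steps):
    """Recursively extrapolate features from a prompt using the update
    xi_hat(t_{k+1}) = xi_tilde(t_k) + mu(zeta(t_k, w)).
    Sketch under assumed interfaces, not the paper's code."""
    xi = [np.asarray(p, dtype=float) for p in xi_prompt]
    zeta = list(np.diff(np.asarray(xi_prompt, dtype=float), axis=0))
    for _ in range(n_steps):
        window = np.concatenate(zeta[-w:])   # last w increments
        step = gp_mean(window)               # mu(zeta(t_k, w))
        xi.append(xi[-1] + step)             # next feature estimate
        zeta.append(step)                    # shift the sliding window
    return np.array(xi)
```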

4. Robustness to Prompt Variations and Scaling

Prompt2Auto incorporates anchor point scaling to maintain robustness in the presence of varied user prompting. The system automatically determines a scaling factor $\lambda(t_k)$, computed at anchor checkpoints, that best aligns the predicted radial progression with the user's prompt:

$$\lambda(t_k) = \frac{1}{C(t_k)} \sum_{i=1}^{C(t_k)} \frac{r(t_i)}{\hat{r}_i}$$

where $r(t_i)$ are observed radii and $\hat{r}_i$ are reference radii at anchor points.

This scaling ensures that prompts with differences in length, orientation, or size—as might occur due to user imprecision—are properly mapped to the desired trajectory without loss of accuracy or stability, further enforcing geometry invariance.
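The scaling factor is a simple running average of radius ratios at the anchor checkpoints; how the checkpoints themselves are chosen is not detailed here, so that is left outside this sketch.

```python
import numpy as np

def anchor_scale(r_obs, r_ref):
    """Average ratio of observed radii r(t_i) to reference radii
    r_hat_i at the anchor checkpoints seen so far:
    lambda(t_k) = (1/C) * sum_i r(t_i) / r_hat_i.
    Anchor selection is assumed given; illustrative sketch only."""
    r_obs = np.asarray(r_obs, dtype=float)
    r_ref = np.asarray(r_ref, dtype=float)
    return float(np.mean(r_obs / r_ref))
```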

5. Multi-Skill Autonomy Through Skill Classification

Prompt2Auto supports multi-skill autonomous control by maintaining a library of GP models, each corresponding to a distinct skill demonstrated through separate reference trajectories. During execution, when a new prompt is received, the framework computes a similarity score between the prompt and stored models:

$$e_n = \sum_j \left\| \mathbf{p}(t_j) - \mathbf{p}^n(t_j) \right\|$$

where $\mathbf{p}^n(t_j)$ is the predicted trajectory from the $n$th GP skill model, and the skill with the lowest error is selected for automated task continuation.

This architecture enables the system to recognize and extrapolate different robotic skills from short prompts, supporting both rapid skill switching and multi-task autonomy.
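The selection rule can be sketched as a minimum-error lookup. The dictionary interface and the assumption that each skill's predicted trajectory is already time-aligned with the prompt are illustrative choices, not details from the paper.

```python
import numpy as np

def select_skill(prompt, skill_predictions):
    """Pick the skill whose predicted trajectory best matches the
    prompt via the summed error e_n = sum_j ||p(t_j) - p^n(t_j)||.
    `skill_predictions` maps a skill name to an (N, 2) trajectory
    aligned with the prompt (assumed). Illustrative sketch."""
    prompt = np.asarray(prompt, dtype=float)
    errors = {
        name: float(np.linalg.norm(prompt - np.asarray(pred, dtype=float),
                                   axis=1).sum())
        for name, pred in skill_predictions.items()
    }
    best = min(errors, key=errors.get)  # lowest summed error wins
    return best, errors
```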

6. Experimental Validation and Applications

Prompt2Auto was validated in both numerical simulations and real-world robotic tasks:

  • Simulation: In a GUI environment, users drew a reference trajectory (e.g., a closed curve), then provided partial prompts subject to translation, rotation, or scaling. GeoGP (geometry-invariant GP) predictions consistently maintained low errors, outperforming baseline “Exact GP” (Cartesian frame) models, especially under geometric transformations.
  • Physical Robots: Two scenarios were demonstrated:
    • Passive Takeover (Teleoperation): Upon simulated operator disconnection, the system’s predictive model autonomously completed a trajectory (e.g., a “Treble Clef”) from a partial human prompt.
    • Active Skill Classification: The robot classified a short movement as belonging to a particular skill in a library and executed the full learned motion.

These results establish that Prompt2Auto provides robust, low-burden, geometry-invariant robotic automation from single demonstrations, with generalization across task types and conditions.

7. Technical Implications and Future Directions

Prompt2Auto’s approach via coordinate normalization, incremental temporal feature augmentation, anchor-based scaling, and recursive GP prediction yields a framework that supports geometry-invariant learning from demonstration in minimally supervised settings. The skill-classification mechanism further enhances multi-task flexibility.

Potential extensions include:

  • Applying the framework to additional robotic modalities and higher-dimensional manipulators.
  • Incorporating richer contextual environmental or sensory inputs into the feature representation.
  • Extending anchor-based scaling and skill similarity scoring to nonlinear dynamical systems or variable environmental parameters.
  • Investigating integration with end-to-end reinforcement or imitation learning for further generalization capabilities.

Details, demonstration code, and documentation are available at the project website: https://prompt2auto.github.io


Prompt2Auto’s geometry-invariant one-shot GP learning establishes a new paradigm for efficient, transferable, and robust human-guided automation in robotic systems, substantially reducing demonstration requirements while ensuring robust performance across coordinate transformations and skill types (Yang et al., 17 Sep 2025).
