STITCH: Sliding-memory Trajectory Inference

Updated 12 May 2026

The paper introduces STITCH, a trajectory-centric method that decomposes long-horizon tasks into overlapping, contextually coherent segments.
It leverages a sliding-memory mechanism and task chunking heuristic to maintain semantic and dynamical consistency across segments, enhancing sample efficiency and task resolution.
Empirical results show significant gains in robotic planning and agentic LLM benchmarks, achieving higher success rates with reduced demonstration data.

STITCH (Sliding-memory Trajectory Inference and Task Chunking Heuristic) is a trajectory-centric algorithmic methodology designed for both long-horizon robotic planning and high-efficiency agentic LLM training. It systematically decomposes lengthy decision-making sequences into overlapping, contextually coherent segments, enabling effective learning and inference even when only limited or fragmented demonstration data is available. The core innovations in STITCH are a sliding-memory mechanism for information propagation and a task chunking heuristic that enforces semantic and dynamical consistency across segments. This approach has demonstrated significant empirical gains in both robotics planning and software engineering agent benchmarks, outperforming a range of prior art in sample efficiency and task resolution metrics (Luo et al., 7 Mar 2025, Team et al., 1 Apr 2026).

1. Problem Formulation and Motivations

The trajectory stitching challenge arises in domains where the available data comprises only short subtask trajectories while the generative goal is to synthesize feasible long-horizon plans. In formal terms, given a dynamical system with state space $S\subset\mathbb{R}^n$ , action space $A\subset\mathbb{R}^m$ , and transition dynamics $s_{t+1}\sim P(\cdot|s_t, a_t)$ , STITCH enables sampling from the conditional path distribution $p(x_{1:T}|q_s, q_g)$ , where $x_{1:T}$ denotes the trajectory from an initial state $q_s$ to a terminal state $q_g$ . In language modeling and agentic LLMs, the analogous problem is to extract effective, decision-critical subsequences from long, noisy agent trajectories composed of interleaved thoughts, actions, tool calls, and observations (Luo et al., 7 Mar 2025, Team et al., 1 Apr 2026).

The central insight motivating STITCH is that the relevant regularities and dependencies in long-horizon tasks can be learned by leveraging overlapping segments of short trajectories, so long as appropriate cross-segment information flow and alignment are maintained—either through bidirectionally conditioned diffusion models in robotic control, or via memory-aware windowed pruning in LLM fine-tuning.

2. Sliding-memory Trajectory Inference

STITCH introduces a sliding-memory mechanism to propagate contextual information across chunked segments of a trajectory. In the case of diffusion planning, the full trajectory $x_{1:T}$ is partitioned into $K$ overlapping chunks $C_i$ of length $L$ with overlap $O$ (stride $L - O$). The joint trajectory likelihood is modeled as a product of local conditionals: $p(x_{1:T}\mid q_s, q_g) \propto \prod_{i=1}^{K} p(C_i \mid C_{i-1}, C_{i+1})$, where the boundary terms take $C_0 = q_s$ and $C_{K+1} = q_g$. Each chunk $C_i$ is then denoised via a diffusion model that conditions on its immediate neighbors' noisy states, enabling bidirectional communication and alignment at each denoising time step. This “sliding-memory” architecture enables STITCH to generate globally coherent trajectories from local chunk-level distributions, mitigating discontinuities at segment boundaries (Luo et al., 7 Mar 2025).
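The partitioning step can be illustrated with a minimal sketch. The function name `chunk_bounds`, the argument names `T`, `L`, and `O` (horizon, chunk length, overlap), and the choice to right-align the final chunk are assumptions made for illustration; the source does not specify how a horizon that is not an exact multiple of the stride is handled.

```python
def chunk_bounds(T: int, L: int, O: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) index pairs for overlapping chunks.

    T: trajectory length, L: chunk length, O: overlap between neighbors.
    Consecutive chunks are offset by the stride L - O.
    """
    if not (0 <= O < L <= T):
        raise ValueError("need 0 <= O < L <= T")
    stride = L - O
    bounds = []
    start = 0
    while start + L < T:
        bounds.append((start, start + L))
        start += stride
    # Assumption: right-align the last chunk so the final state is always
    # covered; its overlap with the previous chunk may then exceed O.
    bounds.append((T - L, T))
    return bounds


print(chunk_bounds(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

Each pair of neighboring chunks shares at least `O` indices, which is the region where the neighbor-conditioned denoiser is expected to enforce agreement.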

In LLM-based agent training, sliding-memory inference selects a subset of "decision-critical" tokens via a selection function $\mathcal{F}(\tau) \subseteq \tau$ applied to the full agent trajectory $\tau$, with segments defined by window size $W$, overlap $O$, and safe-split predicates $b(\cdot)$ indicating task-meaningful boundaries (Team et al., 1 Apr 2026).

3. Task Chunking Heuristic

STITCH's task chunking heuristic ensures that trajectory segments carry both sufficient context for reliable inference and sufficient overlap for enforcing consistency. For diffusion-based planning, chunk length $L$ must be significantly greater than overlap $O$ ($L \gg O$) to ensure informative local context, while $O$ is selected (empirically, 10–20% of $L$) to robustly align overlapping transitions. Training explicitly corrupts both the target chunk and its neighbors to the same diffusion noise level, and alignment is learned in overlapping regions as the model predicts the noise for each chunk (Luo et al., 7 Mar 2025).
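As a worked illustration of these sizing constraints (the numbers are hypothetical, not taken from the paper), consider covering a horizon of $T$ steps with chunks of length $L$ and overlap $O$. The first chunk covers $L$ steps and each further chunk adds $L - O$ new steps, so $K$ chunks cover $L + (K-1)(L-O)$ steps and the smallest sufficient $K$ is

$$K = \left\lceil \frac{T - O}{L - O} \right\rceil, \qquad \text{e.g. } T = 1000,\ L = 100,\ O = 15:\quad K = \left\lceil \frac{985}{85} \right\rceil = 12.$$

Here $O/L = 15\%$ falls inside the 10–20% range quoted above. Shrinking $O$ lowers $K$ but leaves fewer shared states for aligning neighboring chunks.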

In LLM domains, segmentation is guided by semantic cues, with “safe-split” predicates identifying natural task boundaries (such as post-observation or post-instruction). Segments are locally scored (score $r_i$ for segment $i$) by an LLM-as-judge, and only chunks exceeding a quality threshold $\gamma$ are retained for supervised fine-tuning (Team et al., 1 Apr 2026).
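The windowed segmentation with safe-split boundaries might look roughly like the sketch below. Everything here is an assumption for illustration: the function name `segment`, the representation of a trajectory as a list of event labels, the rule of backing off to the latest safe split inside the window, and the fallback to a hard cut when no safe split exists.

```python
from typing import Callable, Sequence


def segment(
    events: Sequence[str],
    W: int,
    O: int,
    safe_split: Callable[[str], bool],
) -> list[tuple[int, int]]:
    """Split `events` into overlapping [start, end) windows of at most W items.

    A window is shortened so that it ends right after the latest event for
    which `safe_split` is true; consecutive windows share O items.
    """
    if not (0 <= O < W):
        raise ValueError("need 0 <= O < W")
    n = len(events)
    segments = []
    start = 0
    while start < n:
        end = min(start + W, n)
        if end < n:
            cut = end
            # Back off toward the latest safe boundary, but keep the window
            # longer than the overlap so the next start always advances.
            while cut > start + O + 1 and not safe_split(events[cut - 1]):
                cut -= 1
            if safe_split(events[cut - 1]):
                end = cut
            # Otherwise fall back to a hard cut at start + W.
        segments.append((start, end))
        if end == n:
            break
        start = end - O
    return segments


steps = ["think", "act", "obs"] * 3 + ["think"]
print(segment(steps, W=5, O=1, safe_split=lambda e: e == "obs"))
# [(0, 3), (2, 6), (5, 10)]
```

In this toy run every window except the last ends on an observation, which mirrors the "post-observation" boundary mentioned above; real trajectories would be tokenized and use much larger windows.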

4. Training, Inference, and Implementation

Diffusion-based STITCH (Robotics/Planning)

Conditional Model: $\epsilon_\theta(C_i^{\sigma}, \sigma \mid C_{i-1}^{\sigma}, C_{i+1}^{\sigma})$, where $C_i^{\sigma}$ denotes chunk $i$ corrupted to noise level $\sigma$, parameterized as a U-Net (for low-dimensional tasks) or a DiT transformer (for high-dimensional states).
Training Loop:
- Sample a demonstration trajectory $x_{1:T}$ of sufficient length.
- Select a chunk index $i$ and noise level $\sigma$.
- Corrupt the target chunk and both neighbors with matched Gaussian noise.
- Minimize the mean squared error between the true and predicted noise:
$\mathcal{L}(\theta) = \mathbb{E}_{i,\sigma,\epsilon}\left[\lVert \epsilon - \epsilon_\theta(C_i^{\sigma}, \sigma \mid C_{i-1}^{\sigma}, C_{i+1}^{\sigma}) \rVert_2^2\right]$
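The training loop above can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the corruption uses a DDPM-style mixing rule with a signal fraction `alpha_bar` (the source says only "matched Gaussian noise"), chunks are flat lists of floats, `predict_noise` is a hypothetical stand-in for the U-Net or DiT denoiser, and a missing neighbor at either end of the trajectory is passed as `None`.

```python
import math
import random


def corrupt(chunk, alpha_bar, rng):
    """Mix a clean chunk with Gaussian noise; return (noisy chunk, noise)."""
    eps = [rng.gauss(0.0, 1.0) for _ in chunk]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = [a * x + b * e for x, e in zip(chunk, eps)]
    return noisy, eps


def training_loss(chunks, i, alpha_bar, predict_noise, rng):
    """MSE between the true and predicted noise for target chunk i.

    The target and both neighbors are corrupted to the same noise level.
    """
    noisy_target, eps = corrupt(chunks[i], alpha_bar, rng)
    left = corrupt(chunks[i - 1], alpha_bar, rng)[0] if i > 0 else None
    right = (
        corrupt(chunks[i + 1], alpha_bar, rng)[0]
        if i + 1 < len(chunks)
        else None
    )
    pred = predict_noise(noisy_target, left, right, alpha_bar)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


# Usage with a placeholder model that always predicts zero noise.
rng = random.Random(0)
demo_chunks = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
zero_model = lambda noisy, left, right, alpha_bar: [0.0] * len(noisy)
loss = training_loss(demo_chunks, 1, 0.5, zero_model, rng)
```

In a real setup the loss would be averaged over random chunk indices and noise levels and minimized by gradient descent on the denoiser's parameters.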
Inference: Maintain $K$ chunks initialized with random noise. Reverse diffusion proceeds autoregressively, with each chunk conditioned on the denoised output of its neighbors. The final trajectory is recovered by exponentially blending overlapping regions (Luo et al., 7 Mar 2025).
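The final merging step can be pictured with the sketch below, which cross-fades each overlap using an exponentially decaying weight on the earlier chunk. This is one possible reading of "exponential blending", not the paper's confirmed formula: the weight schedule, the decay rate `beta`, and the function names `blend_overlap` and `stitch` are all assumptions.

```python
import math


def blend_overlap(left_tail, right_head, beta=3.0):
    """Cross-fade two equally long overlap regions.

    The weight on the left chunk starts at 1 and decays as exp(-beta * u),
    where u in [0, 1) is the relative position inside the overlap.
    """
    if len(left_tail) != len(right_head):
        raise ValueError("overlap regions must have equal length")
    n = len(left_tail)
    out = []
    for j, (a, b) in enumerate(zip(left_tail, right_head)):
        w = math.exp(-beta * j / n)
        out.append(w * a + (1.0 - w) * b)
    return out


def stitch(chunks, O, beta=3.0):
    """Merge denoised chunks that overlap by O steps into one trajectory."""
    if O < 0 or any(len(c) <= O for c in chunks):
        raise ValueError("need 0 <= O < len(chunk) for every chunk")
    traj = list(chunks[0])
    for nxt in chunks[1:]:
        if O > 0:
            traj[-O:] = blend_overlap(traj[-O:], nxt[:O], beta)
        traj.extend(nxt[O:])
    return traj


merged = stitch([[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]], O=2)
```

When neighboring chunks already agree on the overlap, as in this usage example, blending leaves the shared states unchanged up to floating-point rounding; when they disagree, the output moves smoothly from the earlier chunk's values to the later chunk's.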

Agentic/Coding LLMs (Token Filtering)

Macro-level pre-screening: Logistic regression on feature vectors, with a global threshold $\gamma_{\text{macro}}$, pre-filters low-quality trajectories.
Micro-level filtering: STITCH sliding-memory segmentation using window size $W$ (e.g., 8,192 tokens), overlap $O$ (e.g., 512 tokens), and “safe-split” transition logic.
Memory summary: A compressed state (∼512 tokens) is carried across segments.
Chunk selection: Chunks above threshold $\gamma$ (e.g., 7–8 on a 10-point scale) are retained, yielding a much smaller, higher-signal fine-tuning corpus (Team et al., 1 Apr 2026).
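The two filtering levels and the carried memory summary can be sketched as follows. The names `macro_pass` and `micro_filter`, the feature weights, and the `judge` and `summarize` callables are hypothetical placeholders; in practice the judge would be an LLM call and the summary a compressed text state of roughly 512 tokens. Treating a score equal to the threshold as passing is also an assumption.

```python
import math
from typing import Callable, Sequence


def macro_pass(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float,
) -> bool:
    """Trajectory-level pre-screen: logistic regression score vs. threshold."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z)) >= threshold


def micro_filter(
    segments: Sequence[str],
    judge: Callable[[str, str], float],
    summarize: Callable[[str, str], str],
    threshold: float,
) -> list[str]:
    """Keep segments whose judge score reaches the threshold.

    A memory summary is updated after every segment, kept or not, so that
    later segments are scored with context from earlier ones.
    """
    memory = ""
    kept = []
    for seg in segments:
        if judge(seg, memory) >= threshold:
            kept.append(seg)
        memory = summarize(memory, seg)
    return kept


# Toy stand-ins for the LLM judge and the summarizer.
toy_judge = lambda seg, memory: 9.0 if "fix" in seg else 3.0
toy_summarize = lambda memory, seg: (memory + " " + seg)[-20:]
kept = micro_filter(["read file", "fix bug", "chat"], toy_judge, toy_summarize, 7.0)
print(kept)  # ['fix bug']
```

Running the cheap macro-level check first means the more expensive judge is only invoked on trajectories that already look promising.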

Domain	Chunk Model	Segment Overlap	Quality Gate
Robotics/Planning	Diffusion model	10–20% of chunk length $L$	MSE on denoised chunk
LLM/Agentic	Windowed filtering	Overlap $O$ (e.g., 512 tokens)	LLM-judge score threshold

5. Empirical Results and Comparative Analysis

STITCH has been evaluated across domains with the following notable outcomes:

Robotic Planning: On benchmarks such as PointMaze-Giant, AntMaze-Giant, and HumanoidMaze-Giant, STITCH achieved success rates substantially higher than monolithic diffusion, Goal-conditioned Behavioral Cloning, decision transformers with goal relabeling, and leading offline RL algorithms. For example, on PointMaze-Giant, STITCH achieved 68% success (versus GSC 29%, DD 0%) (Luo et al., 7 Mar 2025).
Agentic and Coding LLMs: On SWE-bench Verified, Qwen3-30B augmented with STITCH filtered trajectories achieved a 43.4% resolve rate (+63.2% relative to RFT). On Java Multi-SWE-bench, MiniMax-M2.5-STITCH scored 43.75% (+16.67% vs. base). On HarmonyOS requirements (ArkTS), GLM-4.7-STITCH achieved a 61.31% compilation pass rate (+43.34%)—all with training data reduced to less than 1,000 trajectories in some cases (Team et al., 1 Apr 2026).

These results indicate that STITCH improves outcome metrics while using notably less demonstration or fine-tuning data, which is consistent with the "Less-Is-More" hypothesis in the domains evaluated.

6. Interpretations, Advantages, and Failure Modes

STITCH's effectiveness is attributed to a confluence of factors:

Preservation of cross-chunk dependencies: By enforcing overlapping alignment and context-sharing, STITCH avoids the hallucinations and inconsistencies that arise from naïve segmentation.
Quality-centric filtration: The chunking heuristic combined with explicit scoring focuses learning on semantically and functionally coherent chunks, boosting the utility of each supervised token.
Single-model generalization: In planning, STITCH efficiently generalizes from short task segments to arbitrarily long horizons using a single diffusion model (horizon $L$, the chunk length), obviating the need for training on full long sequences.

Limitations include error accumulation across extended chunk chains and the need for explicit tuning of chunk size and overlap parameters. Infeasible transitions may arise, potentially mitigated through rejection sampling or MCMC over candidate chunk configurations. Dynamically adapting the number or length of chunks based on goal confidence is a proposed extension. For real-time control, STITCH could be integrated with receding-horizon strategies to improve robustness to disturbances (Luo et al., 7 Mar 2025, Team et al., 1 Apr 2026).

7. Extensions and Broader Relevance

STITCH’s bidirectionally conditioned, memory-aware chunking principles are broadly applicable beyond the specific domains evaluated. Further integration with model-based planning for high-dimensional robotics, adaptive chunking schedules, and pipeline automation for agentic data curation are plausible future directions. The approach has demonstrated its applicability in settings ranging from robotic navigation to agentic software engineering, supporting its generality and extensibility across complex sequential decision-making tasks (Luo et al., 7 Mar 2025, Team et al., 1 Apr 2026).


References (2)

Generative Trajectory Stitching through Diffusion Composition (2025)

Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs (2026)
