Atomic Action Slicing Framework
- The paper introduces a framework that segments continuous VLA demonstrations into short, typed atomic action segments aligned with planner outputs.
- It employs a three-stage process—segmentation, validation, and policy fine-tuning—to ensure robust integration between symbolic planning and imitation learning.
- Empirical results on the GATE-VLAP dataset show higher downstream success rates after fine-tuning on atomic segments (LIBERO-Goal 94.2% to 95.3%, LIBERO-Long 83.8% to 88.8%) and more reliable segmentation with a stronger VLM.
Atomic Action Slicing (AAS) is a planner-aligned framework that decomposes long-horizon demonstrations in vision-language-action (VLA) domains into short, typed atomic action segments. These atomic options align precisely with planner-defined plans, facilitating both effective symbolic planning and improved policy learning. The method produces validated datasets of atomic action segments, each labeled by action type, temporal span, and confidence score, establishing a bridge between high-level planners and low-level control policies in generalist VLA agents (Tabakov et al., 12 Dec 2025).
1. Formalism and Atomic Decomposition
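At the core of Atomic Action Slicing is the decomposition of a demonstration episode. An episode consists of image observations, ground-truth states, low-level motor commands, the language instruction, and a symbolic scene graph. The goal is to segment the episode into contiguous atomic segments.
Each segment is a typed atomic action from a fixed schema, with preconditions, effects, temporal bounds (a start and an end frame), and a scalar confidence. Segmentation constraints ensure full coverage and contiguity:
- Coverage: the first segment starts at the first frame of the episode, and the last segment ends at the final frame.
- Contiguity: each segment starts immediately after the previous one ends, leaving no gaps or overlaps.
- Planner alignment: the sequence of segment labels matches the planner’s output.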
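The symbols of the original formalism did not survive in this text. The block below is one possible way to write the constraints, with all notation chosen here for illustration rather than taken from the paper: frames are indexed $1,\dots,T$, segment $k$ has label $a_k$, inclusive span $[t_k^{s}, t_k^{e}]$ and confidence $c_k$, and $P = (p_1,\dots,p_K)$ denotes the planner's output.

```latex
\begin{align}
\tau \;&\longrightarrow\; \bigl\{(a_k,\; t_k^{s},\; t_k^{e},\; c_k)\bigr\}_{k=1}^{K}, \\
t_1^{s} &= 1, \qquad t_K^{e} = T
  && \text{(coverage)} \\
t_{k+1}^{s} &= t_k^{e} + 1, \qquad k = 1,\dots,K-1
  && \text{(contiguity)} \\
a_k &= p_k, \qquad k = 1,\dots,K
  && \text{(planner alignment)}
\end{align}
```

The inclusive-endpoint convention is an assumption; with half-open spans the contiguity condition would read $t_{k+1}^{s} = t_k^{e}$ instead.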
Atomic segments, therefore, are symbolic “options” that are suitable for both planning algorithms and direct imitation learning at the policy level.
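As a rough illustration of what such a segment record and its constraints might look like in code, the sketch below defines a hypothetical `AtomicSegment` type and two checks. The field names, the 0-indexed inclusive frame convention, and the helper names are assumptions made for this example, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AtomicSegment:
    """Hypothetical record for one atomic segment (field names are assumed)."""
    label: str         # typed action, e.g. "grasp(bowl)"
    start: int         # first frame, inclusive, 0-indexed
    end: int           # last frame, inclusive
    confidence: float  # scalar confidence score


def covers_contiguously(segments: List[AtomicSegment], num_frames: int) -> bool:
    """True if the segments tile frames 0..num_frames-1 with no gaps or overlaps."""
    if not segments:
        return False
    if segments[0].start != 0 or segments[-1].end != num_frames - 1:
        return False  # coverage violated
    if any(seg.start > seg.end for seg in segments):
        return False  # empty or inverted span
    # Contiguity: each segment starts right after the previous one ends.
    return all(nxt.start == cur.end + 1 for cur, nxt in zip(segments, segments[1:]))


def matches_plan(segments: List[AtomicSegment], plan: Sequence[str]) -> bool:
    """True if the segment labels equal the planner's action sequence."""
    return [seg.label for seg in segments] == list(plan)


demo = [
    AtomicSegment("open_drawer(drawer)", 0, 39, 0.9),
    AtomicSegment("grasp(bowl)", 40, 69, 0.8),
]
print(covers_contiguously(demo, 70), matches_plan(demo, ["open_drawer(drawer)", "grasp(bowl)"]))
```

Keeping the checks separate from the record makes it easy to report which constraint a proposed segmentation violates.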
2. Taxonomy and Properties of Atomic Actions
The schema includes a concise set of typed atomic actions such as:
- open_drawer(drawer)
- close_drawer(drawer)
- grasp(object)
- release(object)
- place(object, receptacle)
- lift(object)
- lower(object)
- push(object)
- pull(object)
Each action is parameterized, e.g., place(“bowl”, “drawer”), and is formally linked to planning primitives through:
- Preconditions that must hold before the action starts
- Effects that hold once the action completes
- A terminal condition (e.g., “end-effector exits drawer mouth”)
- A typical duration range
These fields map directly onto STRIPS or HTN planner operators, providing symbolic compatibility.
A fixed action schema enables seamless integration with planners and helps constrain representation complexity for generalization.
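To make the link to STRIPS-style operators concrete, here is a minimal sketch of how one schema entry could be encoded. The predicate names (`closed(drawer)`, `hand_empty`, `open(drawer)`) and the duration range are hypothetical placeholders, since the source does not list the actual preconditions and effects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Operator:
    """STRIPS-style operator; every concrete value below is illustrative."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]
    duration_range: Tuple[int, int]  # (min, max) frames, assumed unit

    def applicable(self, state: FrozenSet[str]) -> bool:
        # An operator applies when all its preconditions hold in the state.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable")
        # Standard STRIPS update: remove delete effects, then add add effects.
        return (state - self.del_effects) | self.add_effects


# Hypothetical encoding of open_drawer(drawer).
OPEN_DRAWER = Operator(
    name="open_drawer(drawer)",
    preconditions=frozenset({"closed(drawer)", "hand_empty"}),
    add_effects=frozenset({"open(drawer)"}),
    del_effects=frozenset({"closed(drawer)"}),
    duration_range=(20, 80),
)

state = frozenset({"closed(drawer)", "hand_empty"})
next_state = OPEN_DRAWER.apply(state)
print(sorted(next_state))
```

Because the schema is fixed, a table of such operators can double as the planner's domain and as the label vocabulary for segmentation.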
3. Segmentation Pipeline and Policy Learning
Atomic Action Slicing employs a three-stage process. The core segmentation (Stage II) uses a schema-constrained large vision-language model (VLM), Gemini 2.5 Flash or Pro, prompted with the language instruction, the scene description, the action schema, planner anchors, and few-shot exemplars. The VLM outputs action boundary proposals while enforcing coverage and contiguity. The VLM is not fine-tuned; the method relies on few-shot prompting of the off-the-shelf model.
Segment validation (Stage III) requires the following conditions:
- Correct segment count (matching the number of planner steps)
- Planner label order and timing monotonicity
- Segment durations within prescribed bounds
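The three checks above could be implemented along the following lines. This is a sketch under stated assumptions: segments are `(label, start, end)` tuples with inclusive frame indices, duration bounds are keyed by action type, and the function name `validate_segments` is made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start_frame, end_frame), inclusive


def validate_segments(
    segments: Sequence[Segment],
    plan: Sequence[str],
    duration_bounds: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    errors: List[str] = []

    # 1. Segment count must equal the number of planner steps.
    if len(segments) != len(plan):
        errors.append(f"segment count {len(segments)} != plan length {len(plan)}")

    # 2. Labels must follow the planner's order, and timing must be monotone.
    if [label for label, _, _ in segments] != list(plan):
        errors.append("segment labels do not follow the planner order")
    prev_end = -1
    for label, start, end in segments:
        if end < start or start <= prev_end:
            errors.append(f"{label}: non-monotonic timing [{start}, {end}]")
        prev_end = max(prev_end, end)

    # 3. Durations must fall inside the bounds for the action type.
    for label, start, end in segments:
        action_type = label.split("(")[0]
        low, high = duration_bounds.get(action_type, (1, float("inf")))
        duration = end - start + 1
        if not low <= duration <= high:
            errors.append(f"{label}: duration {duration} outside [{low}, {high}]")
    return errors


plan = ["open_drawer(drawer)", "grasp(bowl)"]
bounds = {"open_drawer": (10, 100), "grasp": (5, 50)}  # illustrative values
good = [("open_drawer(drawer)", 0, 39), ("grasp(bowl)", 40, 69)]
print(validate_segments(good, plan, bounds))
```

Returning all violations rather than failing on the first one makes it easier to decide whether a rejected segmentation is worth re-prompting.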
The resulting atomic-labeled dataset is used to fine-tune the CLIP-RT+ policy via imitation learning, with each atomic segment supplying short, dense sequences for policy training.
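The training objective itself is missing from this text. Purely as a point of reference, a generic behavior-cloning loss over atomic segments could be written as below. The symbols $\mathcal{D}_{\mathrm{AA}}$ (the atomic-segment dataset), $\pi_\theta$ (the policy), $x_t$ (observation), $u_t$ (motor command), and $\ell_k$ (the language label of segment $k$) are assumptions introduced here, and CLIP-RT+ may optimize a different objective, for instance a contrastive one.

```latex
\mathcal{L}_{\mathrm{BC}}(\theta)
  \;=\; -\,\mathbb{E}_{(a_k,\, t_k^{s},\, t_k^{e}) \sim \mathcal{D}_{\mathrm{AA}}}
  \left[\; \sum_{t = t_k^{s}}^{t_k^{e}} \log \pi_\theta\!\left(u_t \mid x_t,\, \ell_k\right) \right]
```

Under this reading, each atomic segment contributes one short sum over its own frames, which is what makes the supervision dense relative to a single long-horizon episode.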
4. GATE-VLAP Dataset
Applying AAS to 825 LIBERO demonstrations yields the GATE-VLAP dataset, comprising 2,124 atomic segments (758 LIBERO-Goal, 1,366 LIBERO-Long). Each segment is annotated with its label, start and end frame, and confidence score. Compared to the original demonstrations, this yields approximately 2.6 times more “training instances.” The dataset’s validation protocol ensures high segment quality, calibrating confidence scores by aggregating VLM internal signals, segment duration slack, and agreement under keyframe jitter.
| Subset | Number of segments |
|---|---|
| LIBERO-Goal | 758 |
| LIBERO-Long | 1,366 |
| Total | 2,124 |
This resource is publicly released to promote reproducibility and further research.
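The reported counts can be cross-checked with a few lines of arithmetic. The snippet below only uses the numbers stated above; the variable names are arbitrary.

```python
# Segment counts per subset, as reported for GATE-VLAP.
segments_per_subset = {"LIBERO-Goal": 758, "LIBERO-Long": 1366}
num_demonstrations = 825

total_segments = sum(segments_per_subset.values())
expansion = total_segments / num_demonstrations  # segments per demonstration

print(total_segments)       # 2124
print(round(expansion, 2))  # 2.57, i.e. roughly 2.6 segments per demonstration
```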
5. Algorithmic Workflow and Planner Integration
AAS is integrated with symbolic planners through a closed-loop system:
- Symbolic state tracking: Predicates extracted via RGB-D and object tracking provide the current symbolic state.
- Planning step: The PDDL/HTN planner selects the next atomic action.
- Execution: The policy (CLIP-RT+AA) is run until the action’s terminal condition is met.
- Transition and verification: Symbolic state is updated post-execution.
- Repeat until task completion.
A simplified online control pseudocode:
```
for t = 1 ... T_total:
    if current option o_k is complete:
        observe symbolic state s_sym
        o_{k+1} ← Planner.solve(s_sym, goal)   # next atomic option
        k ← k + 1                              # switch to the new option
    a_t ← π_{o_k}(x_t, ℓ)                      # x_t: current observation, ℓ: instruction
    execute a_t, observe x_{t+1}
```
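The loop above can be exercised end to end with a toy stand-in for the robot and planner. Everything in the sketch below is hypothetical: `ToyWorld`, `plan_next`, the fact names, and the rule that every option finishes after three control steps exist only to make the control flow runnable.

```python
from typing import Callable, List, Optional, Set


class ToyWorld:
    """Stand-in for robot plus perception; each option ends after 3 control steps."""
    EFFECTS = {
        "open_drawer(drawer)": "open(drawer)",
        "place(bowl, drawer)": "in(bowl, drawer)",
    }

    def __init__(self) -> None:
        self.facts: Set[str] = set()
        self.ticks = 0

    def read_symbolic_state(self) -> Set[str]:
        return set(self.facts)

    def policy_step(self, option: str) -> None:
        # Placeholder for one low-level action of the option-conditioned policy.
        self.ticks += 1
        if self.ticks == 3:
            self.facts.add(self.EFFECTS[option])
            self.ticks = 0

    def is_done(self, option: str) -> bool:
        # Terminal condition: the option's effect holds in the symbolic state.
        return self.EFFECTS[option] in self.facts


def plan_next(state: Set[str]) -> Optional[str]:
    """Toy planner: return the first option whose effect is not yet achieved."""
    for option, effect in ToyWorld.EFFECTS.items():
        if effect not in state:
            return option
    return None  # goal reached


def run_closed_loop(world: ToyWorld,
                    planner: Callable[[Set[str]], Optional[str]],
                    max_steps: int) -> List[str]:
    """Alternate planning and option execution until the planner has nothing left."""
    executed: List[str] = []
    option: Optional[str] = None
    for _ in range(max_steps):
        if option is None or world.is_done(option):
            option = planner(world.read_symbolic_state())
            if option is None:
                break
            executed.append(option)
        world.policy_step(option)
    return executed


world = ToyWorld()
executed = run_closed_loop(world, plan_next, max_steps=20)
print(executed)
```

Replanning only at option boundaries keeps planner calls rare, while the symbolic state check after each option gives the loop a chance to recover if an action did not have its intended effect.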
6. Empirical Evaluation and Quantitative Results
AAS segmentation is benchmarked using 100 demonstration episodes:
| Metric | Flash (Gemini 2.5) | Pro (Gemini 2.5) | Δ (Pro–Flash) |
|---|---|---|---|
| Success Rate | 74.0% | 93.0% | +19 pp |
| Avg Segments | 3.41 | 3.46 | +0.05 |
| Mean Kendall’s W | 0.9105 | 0.9136 | +0.0031 |
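For readers unfamiliar with Kendall's W (the coefficient of concordance), the sketch below computes it for the tie-free case using the standard formula W = 12·S / (m²·(n³ − n)), where m raters each rank n items and S is the sum of squared deviations of the per-item rank totals from their mean. How raters and items are defined in the paper's evaluation is not stated in this text, so the example data are made up.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m raters ranking n items with ranks 1..n and no ties."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 1 or n < 2:
        raise ValueError("need at least one rater and two items")
    # Total rank received by each item across raters.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters in full agreement, then two raters in full disagreement.
print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 1.0
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))             # 0.0
```

Values near 0.91, as in the table, therefore indicate strong but not perfect agreement in ordering.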
Downstream task success increases after policy fine-tuning with atomic segments:
| Task Suite | Baseline CLIP-RT+ | Fine-tuned CLIP-RT+AA | Δ |
|---|---|---|---|
| LIBERO-Goal | 94.2% | 95.3% | +1.1pp |
| LIBERO-Long | 83.8% | 88.8% | +5.0pp |
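Segmenters with stronger language-vision alignment (Gemini 2.5 Pro) yield substantially better segmentation (93.0% vs. 74.0% success rate), and fine-tuning on atomic segments improves policy performance, most visibly on LIBERO-Long (+5.0 pp).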
7. Robustness, Limitations, and Future Directions
AAS is stable under ±2-frame keyframe jitter (IoU_idx ≳ 0.9). Segmentation success drops considerably with the smaller VLM (−19 pp for Gemini 2.5 Flash relative to Pro) or without planner anchoring or a strong schema (degradations of 10–15 pp in SeqAcc and EditSim). This indicates that planner guidance and segment validation are critical for robustness.
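One plausible reading of the jitter metric is an intersection-over-union on frame indices between a segment and its counterpart after the boundaries shift. The exact definition of IoU_idx is not given in this text, so the function below is an assumed formulation using inclusive `(start, end)` frame intervals.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_frame, end_frame), inclusive


def frame_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two inclusive frame intervals."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - intersection
    return intersection / union


# A 40-frame segment whose boundaries both shift by 2 frames.
original = (0, 39)
jittered = (2, 41)
print(frame_iou(original, jittered))  # 38 / 42, about 0.905
```

For a fixed jitter the IoU grows with segment length, so a threshold near 0.9 under ±2 frames is easiest to meet for segments of a few dozen frames or more.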
Key limitations include dependence on fully specified BDDL scenes, sensitivity to temporal misalignment, and evaluation restricted to simulator environments. Potential extensions:
- Automatic scene description extraction (e.g., via SLAM and perception)
- Self-supervised boundary refinement
- Joint training of segmenter and policy
- Real-robot experiments and generalization to diverse domains
AAS thus defines a reproducible, planner-compatible pathway for extracting and using atomic actions in VLA agents, aimed at improving compositionality, robustness, and generalization in long-horizon manipulation tasks; the reported gains so far are in simulation (Tabakov et al., 12 Dec 2025).