Spatially Extended Q-update (SEQ)

Updated 27 January 2026

Spatially Extended Q-update (SEQ) is a reinforcement learning method that diffuses the Bellman target across spatial and angular dimensions to overcome sample inefficiency in robotic pushes.
It uses an anisotropic Gaussian kernel to propagate updates, leveraging spatial locality and directional redundancy to smooth Q-value maps and improve generalization.
Empirical results on the ClutteredRavens benchmark show an 8% increase in success rate and reduced episode lengths, demonstrating enhanced performance in dense manipulation tasks.

A Spatially Extended Q-update (SEQ) is a reinforcement learning procedure designed to address the challenge of sample inefficiency and poor local generalization in pixel-wise Q-learning of non-prehensile manipulation actions—specifically, robotic pushes—in densely cluttered scenes. SEQ transforms each individual push transition into a dense set of “soft” supervision signals by spatially and angularly propagating Bellman targets, thereby enabling more effective learning from limited environment interactions. SEQ was introduced in the context of the Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes (HCLM) framework (Wang et al., 2023).

1. Motivation and Problem Formulation

Standard deep Q-learning approaches for learned pushing behaviors operate in action spaces parameterized by discrete pixels $(x,y)$ on a heightmap and a finite set of push angles $\theta$ . In conventional practice, each push transition updates the Q-network at only a single $(x,y,\theta)$ triple. However, two forms of redundancy are inherent to cluttered manipulation:

Spatial locality: The effect of a push at $(x,y)$ often extends to nearby pixels; the environmental response is not strictly localized.
Directional redundancy: Successive discrete angles often produce nearly overlapping environmental shifts due to the smoothness of physical dynamics.

Restricting Q-value updates to isolated $(x,y,\theta)$ bins squanders valuable supervisory information and impedes generalizability. SEQ addresses this by spatially diffusing the Bellman target through an anisotropic Gaussian kernel over local pixel neighborhoods and by blending across adjacent angular bins, substantially amplifying the utility of individual samples.

2. Mathematical Structure

SEQ generalizes the Bellman update for a single push transition $(o_t, a_t, r_t, o_{t'})$ as follows:

Spatial Kernel Propagation: Let $k_x$ and $k_y$ denote the extent of the update region in $x$ (push direction) and $y$ (orthogonal), and $\sigma_x$ , $\sigma_y$ the corresponding Gaussian widths. For $x \in [-k_x,0]$ and $|y| \le k_y$ , the spatial filter is

$\mathtt{Filter}(x,y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left(-\frac{x^2}{2\sigma_x^2}-\frac{y^2}{2\sigma_y^2}\right).$

Bellman Target Construction:
- The greedy next action $a^*_t = \arg\max_{x',y',\theta'} Q_u(o_{t'},x',y',\theta')$ yields the future Q-value.
- The update gate $\eta = \mathbf{1}\{r_t>0\}$ blocks negative-progression propagation.
- The base target:
$Y_t = r_t \mathbf{1}_{\rm region} + \gamma \eta Q_u(o_{t'}, a^*_t)$

The spatially diffused target:

$Y_t^{\rm spatial}(x,y) = Y_t \cdot \mathtt{Filter}(x-x_t, y-y_t)$

Angular Diffusion:
- The target is further extended across three adjacent angles $(\theta_{i-1},\theta_i,\theta_{i+1})$ , decayed by factor $\kappa\in(0,1)$ for off-center angles:
$\mathbf{Y}_t^u = \left[\kappa Y_t^{\rm spatial},\; Y_t^{\rm spatial},\; \kappa Y_t^{\rm spatial}\right]$
Temporal-Difference Loss:
- With $[\mathbf{Q}^{\rm prev},\mathbf{Q},\mathbf{Q}^{\rm next}]$ extracting network Q-values at the localized region for these three angles, the TD-error tensor is
$\boldsymbol{\delta}_t^{u} = [\mathbf{Q}^{\rm prev},\mathbf{Q},\mathbf{Q}^{\rm next}] - [\kappa Y_t^{\rm spatial}, Y_t^{\rm spatial}, \kappa Y_t^{\rm spatial}]$

The Huber loss is masked to push-only actions and averaged or summed over the affected region.

3. Integration in Hierarchical Policy Learning

Within the HCLM framework, SEQ interacts with the dual-branch Dual-Level Action Network (DLAN) as follows:

Behavioral Cloning Phase: The pick and place options are trained via behavioral cloning and frozen.
Hierarchical RL Phase:
- The high-level Q-head (over {push, pick+place}) is trained via DQN with a Two-Stage Update Scheme (TSUS) to mitigate non-stationarity.
- The push-option Q-map is trained with SEQ, amplifying each push transition into a small spatial-angular tensor of targets.
Experience Handling: Transitions are stored in a prioritized experience replay buffer (PER), with mini-batches powering both Q-head and push-head updates. Only transitions with $a_t^H=\text{push}$ receive SEQ updates.

This integration promotes policy smoothness, enables efficient credit assignment for pushes, and disambiguates similar actions in cluttered arrangements.

4. Algorithmic Workflow

The SEQ update for a minibatch proceeds as follows:

Sample a minibatch from PER.
Filter for transitions with $a_t^H$ = push.
For each transition:
- Extract push parameters $(x_t, y_t, \theta_i)$ and compute $\eta$ .
- Compute the scalar Bellman target and construct the spatial filter.
- Populate the three-angle target tensor, applying $\kappa$ for adjacent angles.
- Gather current Q-network values for all local region-angle indices.
- Compute the TD-error tensor and Huber loss over all spatial-angular entries.
Aggregate losses over the batch and perform a gradient step.

This routine is embedded in a standard $\varepsilon$ -greedy exploration schedule, with push and high-level options selected by their respective policies.

5. Empirical Evaluation and Results

Empirical assessment on the ClutteredRavens benchmark, comprising six long-horizon manipulation tasks, demonstrates the quantitative impact of SEQ. In the hardest “cluttered-stack-block-pyramid” task:

Method/Settings	Success Rate (SR)	Average Episode Length
Full HCLM (with SEQ & TSUS)	87%	10.95
HCLM w/o SEQ (other factors fixed)	79%	11.98

These results indicate that SEQ yields an 8% absolute gain in success and approximately one step reduction in episode length, reflecting enhanced reliability and goal-directedness. Qualitatively, policies without SEQ tend to “miss” by a pixel or select inferior angles, necessitating repeated (often ineffective) pushes. SEQ smooths Q-maps, ensuring robust action selection near pile edges and in ambiguous cases.

6. Implementation and Hyperparameters

The critical hyperparameters adopted in (Wang et al., 2023) include:

Number of discrete push angles: $k_{\rm push} = 12$
Spatial region: $k_x=4$ , $k_y=3$ pixels (≈2 cm spatial radius)
Gaussian widths: $\sigma_x=2.0$ , $\sigma_y=1.5$ pixels
Angular decay: $\kappa=0.7$
Discount factor: $\gamma=0.9$
Replay buffer size: $50\,000$ ; PER with $\alpha=0.6$ , $\beta$ annealed $0.4\to1.0$
$\varepsilon_H$ (high-level policy): $1.0 \to 0.1$ over $50$ epochs
$\varepsilon_U$ (push policy): $0.5 \to 0.1$ over $100$ epochs
TSUS threshold: $\tau = 30$ epochs
Optimizer: Adam, learning rate $1\times10^{-4}$ , batch size $16$
DLAN: frozen CLIP ResNet-50 on RGB, four convolutional layers on depth, two-stream U-Net style decoder, late RGB/depth fusion

These choices ensure each expensive robot trial spreads its credit efficiently and improves training stability.

7. Broader Implications and Extensibility

SEQ introduces a lightweight, general-purpose mechanism for exploiting local spatial and angular redundancy in robot manipulation tasks. It can be deployed in any pixel-wise Q-learning affordance model for non-prehensile “push”-like actions. Its primary effect is to distribute the credit of high-cost physical transitions over a broader action region, yielding smoother value estimates and more data-efficient policy formation in cluttered, high-contact settings (Wang et al., 2023).

Markdown Report Issue Upgrade to Chat

References (1)

Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes (2023)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Spatially Extended Q-update (SEQ).

Spatially Extended Q-update (SEQ)

1. Motivation and Problem Formulation

2. Mathematical Structure

3. Integration in Hierarchical Policy Learning

4. Algorithmic Workflow

5. Empirical Evaluation and Results

6. Implementation and Hyperparameters

7. Broader Implications and Extensibility

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

Spatially Extended Q-update (SEQ)

1. Motivation and Problem Formulation

2. Mathematical Structure

3. Integration in Hierarchical Policy Learning

4. Algorithmic Workflow

5. Empirical Evaluation and Results

6. Implementation and Hyperparameters

7. Broader Implications and Extensibility

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research