Factored Subgoal Diffusion in Planning
- Factored Subgoal Diffusion is a method that uses conditional diffusion models and factorization to generate structured intermediate subgoals for long-horizon reasoning.
- It decomposes the subgoal space by entities, variables, or planning steps, improving sample efficiency and mitigating combinatorial explosion.
- The approach applies to both continuous-control reinforcement learning and discrete planning tasks, showing significant improvements in success rates and subgoal consistency.
Factored subgoal diffusion refers to a class of methods that employ (often conditional) diffusion models to generate structured, intermediate subgoals for long-horizon reasoning, planning, or control. The "factored" aspect denotes decomposition of the subgoal space—across entities, variables, or planning steps—enabling improved sample efficiency and modular generalization when tasks are combinatorial or involve multiple independent objects. This approach has been operationalized both in continuous control domains for hierarchical reinforcement learning (RL) and in discrete structured prediction for reasoning and planning problems, exploiting entity or subgoal-token factorization to mitigate subgoal imbalance and combinatorial explosion.
1. Mathematical Formulation of Factored Subgoal Diffusion
Factored subgoal diffusion models deploy a forward noising process followed by a reverse, learned denoising chain, parameterized to condition on the initial state $s$ and the global goal $g$.
Continuous setting (RL, multi-entity control)
Given a set of $N$ entities, state $s = (s^1, \dots, s^N)$, and goal $g = (g^1, \dots, g^N)$, subgoals are represented as $z = (z^1, \dots, z^N)$, each $z^i \in \mathbb{R}^d$. Factored diffusion assumes the joint subgoal distribution approximately factorizes:

$$p(z \mid s, g) \approx \prod_{i=1}^{N} p(z^i \mid s, g).$$
The forward process is

$$q(z_t^i \mid z_{t-1}^i) = \mathcal{N}\!\left(z_t^i;\ \sqrt{1-\beta_t}\, z_{t-1}^i,\ \beta_t I\right),$$

with $t = 1, \dots, T$ and noise schedule $\{\beta_t\}_{t=1}^{T}$.
The reverse (denoising) process is modeled as

$$p_\theta(z_{t-1}^i \mid z_t, s, g) = \mathcal{N}\!\left(z_{t-1}^i;\ \mu_\theta^i(z_t, t, s, g),\ \Sigma_t\right),$$

where $\epsilon_\theta$ is a noise prediction network (typically a set-based Transformer) parameterizing $\mu_\theta$, applied independently for each entity channel (Haramati et al., 2 Feb 2026).
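As a concrete illustration, below is a minimal sketch of the per-entity forward noising and the standard $\epsilon$-prediction objective. It assumes a generic `denoiser` that maps noisy entity vectors plus conditioning to per-entity noise estimates; all names and shapes are illustrative, not taken from the cited work.

```python
import torch

def forward_noise(z0, t, alpha_bar):
    """Corrupt clean entity subgoals z0 of shape (batch, n_entities, d)
    at integer timesteps t, using the cumulative schedule alpha_bar (T,)."""
    eps = torch.randn_like(z0)                      # independent noise per entity
    a = alpha_bar[t].view(-1, 1, 1)                 # broadcast over entities/dims
    zt = a.sqrt() * z0 + (1.0 - a).sqrt() * eps     # sample from q(z_t | z_0)
    return zt, eps

def diffusion_loss(denoiser, z0, s, g, alpha_bar):
    """MSE between true and predicted noise, averaged over entity channels."""
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (z0.shape[0],), device=z0.device)
    zt, eps = forward_noise(z0, t, alpha_bar)
    eps_hat = denoiser(zt, t, s, g)                 # set-based denoiser, one token per entity
    return ((eps_hat - eps) ** 2).mean()
```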
Discrete setting (reasoning, planning)
Let $x = (x^1, \dots, x^L)$ be a structured output with discrete entries. The forward chain iteratively corrupts $x_0 = x$ to $x_t$ using absorbing-mask or categorical noise. The diffusion objective can be factorized by grouping tokens according to "subgoal" types or difficulty:

$$\mathcal{L} = \mathbb{E}_{t,\, x_0,\, x_t}\!\left[\sum_{g} w_g(t) \sum_{i \in \mathcal{I}_g} -\log p_\theta\!\left(x_0^i \mid x_t\right)\right].$$

Here $w_g(t)$ is a time- and group-specific schedule, and $\mathcal{I}_g$ the token indices for group $g$ (Ye et al., 2024).
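For concreteness, here is a minimal sketch of absorbing-mask corruption under a linear schedule; `MASK_ID` and the schedule are assumptions for illustration, not details of the cited method.

```python
import torch

MASK_ID = 0  # hypothetical absorbing-state token id

def mask_corrupt(x0, t, T):
    """Absorbing-mask forward process: each token of x0 (batch, L) is
    independently replaced by MASK_ID with probability t / T."""
    p_mask = (t.float() / T).view(-1, 1)                        # (batch, 1)
    masked = torch.rand(x0.shape, device=x0.device) < p_mask    # (batch, L)
    return torch.where(masked, torch.full_like(x0, MASK_ID), x0), masked
```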
2. Subgoal Representation, Factorization, and Network Design
In factored subgoal diffusion, subgoals are explicitly decomposed according to natural factors in the environment or problem space.
- Entity-centric factorization: Each entity (e.g., object in a scene) is represented by its own feature vector. Subgoal diffusion networks are designed to output denoised subgoal vectors for each entity independently, while leveraging inter-entity relations via set-based architectures such as Transformers. Each output token corresponds to an entity, and the set-based denoiser allows for dependency modeling while retaining per-entity factorization (Haramati et al., 2 Feb 2026); a denoiser sketch follows this list. Crucially, denoisers copy from clean state/goal tokens, facilitating "sparse edits": only those entities that require movement are altered.
- Subgoal group factorization: In discrete sequence diffusion, tokens can be grouped by planning difficulty or role (e.g., groups of variables in SAT or grid locations in Sudoku). Training objectives are extended to maintain per-group weighting schedules or dedicated decoder heads. This creates an explicit factorization which attacks subgoal imbalance and enables subgoal-specific denoising (Ye et al., 2024).
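The entity-centric design can be made concrete as a small set-based denoiser: a minimal sketch, assuming one token per entity built from its noisy subgoal plus clean state and goal features; the class and parameter names are illustrative rather than the published architecture.

```python
import torch
import torch.nn as nn

class EntityDenoiser(nn.Module):
    """Set-based denoiser: one token per entity, combining the entity's noisy
    subgoal z_t^i with its clean state s^i and goal g^i. Self-attention models
    inter-entity relations; the output head stays per-entity, preserving the
    factorization."""

    def __init__(self, d_entity, d_model=128, n_layers=4, n_heads=4, T=1000):
        super().__init__()
        self.embed = nn.Linear(3 * d_entity, d_model)    # embeds [z_t^i ; s^i ; g^i]
        self.t_embed = nn.Embedding(T, d_model)          # timestep embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, d_entity)         # per-entity noise estimate

    def forward(self, zt, t, s, g):
        # zt, s, g: (batch, n_entities, d_entity); t: (batch,) integer timesteps
        tokens = self.embed(torch.cat([zt, s, g], dim=-1))
        tokens = tokens + self.t_embed(t).unsqueeze(1)   # broadcast over entities
        return self.head(self.encoder(tokens))           # (batch, n_entities, d_entity)
```

Because each output token sees the clean $s^i$ and $g^i$ of its own entity, learning an identity ("copy") map for entities that need no change is easy, which is one way the sparse-edit behavior can arise.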
3. Training Pipelines and Diffusion Losses
Factored subgoal diffusers are trained to minimize a noise-prediction or variational bound loss:
- Continuous (RL entity factorization):
- Train a value-based goal-conditioned RL agent on offline data.
- Collect (state, goal, subgoal) triples by sliding a window over trajectories; for a given $(s, g)$ pair, the subgoal $z$ is sampled as an intermediate future state.
- For each triple, apply a noise-corruption process and train a set-based denoiser to predict the added noise for each entity vector, using MSE (Haramati et al., 2 Feb 2026).
- No back-propagation from RL into the diffusion model; the two are fully decoupled.
- Discrete (token-level group factorization):
- For each sequence, at each training step corrupt the sequence $x_0$ to $x_t$ at a sampled time $t$.
- Compute the tokenwise cross-entropy loss $-\log p_\theta(x_0^i \mid x_t)$, apply groupwise and timestep-specific weights $w_g(t)$, and sum over (group, token, time); a loss sketch follows this list.
- This directly prioritizes hard groups (e.g., those corresponding to difficult planning subgoals), facilitating sampling from the correct combinatorial modes at inference (Ye et al., 2024).
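A minimal sketch of this group-weighted objective, assuming a precomputed weight table `w_table[g, t]` standing in for $w_g(t)$; the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def grouped_diffusion_loss(logits, x0, group_ids, t, w_table):
    """Tokenwise cross-entropy with group- and timestep-specific weights.
    logits: (batch, L, vocab) denoiser outputs; x0: (batch, L) clean tokens;
    group_ids: (L,) subgoal-group index per token position;
    w_table: (n_groups, T) with w_table[g, t] = w_g(t)."""
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (batch, L)
    w = w_table[group_ids.unsqueeze(0), t.unsqueeze(1)]                 # (batch, L)
    return (w * ce).mean()
```

Raising the weights of hard groups concentrates gradient signal on the difficult subgoal tokens without changing the sampler itself.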
4. Test-Time Subgoal Selection and RL Integration
Inference interleaves subgoal proposal, filtering, and low-level rollout:
- Sample multiple candidate subgoals via reverse diffusion for each entity (continuous case) or for each group/token (discrete case).
- Filter subgoals based on a reachability criterion, typically a thresholded value function $V(s, z) \geq \tau$: this ensures low-level policies are not asked to achieve unreachable subgoals.
- Select the subgoal maximizing $V(z, g)$, i.e., the candidate closest in value-space to the final goal (Haramati et al., 2 Feb 2026); a selection sketch follows this list.
- If no candidate passes the filter, fall back to the original goal.
- Execute the low-level goal-conditioned policy to reach the intermediate subgoal, then repeat.
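Putting the selection step together, a minimal sketch assuming a reverse-diffusion sampler `sample_fn` and a goal-conditioned value function `value_fn`; both names, the candidate count, and the threshold are hypothetical.

```python
import torch

@torch.no_grad()
def select_subgoal(sample_fn, value_fn, s, g, n_candidates=16, tau=0.5):
    """Sample candidate subgoals via reverse diffusion, keep those judged
    reachable (V(s, z) >= tau), and return the one with the highest V(z, g).
    Falls back to the original goal if no candidate passes the filter."""
    cands = sample_fn(s, g, n_candidates)                  # (n, ...) candidates
    reachable = value_fn(s.expand_as(cands), cands) >= tau
    if not reachable.any():
        return g                                           # fallback: original goal
    cands = cands[reachable]
    scores = value_fn(cands, g.expand_as(cands))           # value toward final goal
    return cands[scores.argmax()]
```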
In RL settings, this mechanism ensures modularity: the subgoal diffusion model and the RL agent are independently trained and connected only at inference. Any value-based goal-conditioned RL algorithm may be used without retraining the diffusion module.
5. Empirical Performance and Ablations
Factored subgoal diffusion demonstrates notable empirical advantages in combinatorial, multi-entity, or long-horizon domains:
- In goal-conditioned RL with image observations and up to 3 objects, factored subgoal diffusion models (EC-SGIQL) more than double the success rate on the hardest PPP-Cube task compared to an IQL baseline without subgoals (64.3% vs. 25.0%) (Haramati et al., 2 Feb 2026).
- Subgoal-sparsity analysis reveals that entity-centric diffusion models change only the necessary entities (1.36 of 3, on average), while conventional regressors move nearly all of them, leading to unnatural or hard-to-reach waypoints.
- In reasoning/planning tasks, such as Countdown and Sudoku, multi-granularity and group-difficulty reweighted discrete diffusion achieves 91.5%–100% accuracy, surpassing autoregressive baselines by large margins (Ye et al., 2024). Ablations reveal direct contributions from groupwise weighting and factorization to final test accuracy.
The table below summarizes reported performances from representative domains:
| Domain | Method | Success/Accuracy (%) |
|---|---|---|
| PPP-Cube (3-entity image) | EC-IQL (no subgoals) | 25.0 |
| PPP-Cube (3-entity image) | EC-SGIQL (factored diffusion) | 64.3 |
| Sudoku (9×9, language) | Autoregressive (LLaMA) | 32.9 |
| Sudoku (9×9, language) | Diffusion (MDM, 6M params) | 100.0 |
| Countdown (CD4) | AR (85M GPT-2) | 45.8 |
| Countdown (CD4) | Diffusion (MDM) | 91.5 |
6. Significance, Advantages, and Open Directions
Factored subgoal diffusion is effective for several documented reasons:
- Multi-modality and combinatorial diversity: Unlike deterministic regressors, diffusion can sample from multiple disjoint high-probability subgoal configurations arising in multi-entity or combinatorial problems.
- Entity and group factorization: Imposes a bias that only the necessary components of state/goal/subgoal are altered at a given step, promoting sparse and plausible edits, and mitigating destructive entanglement.
- Subgoal imbalance mitigation: In discrete planning, groupwise or token-level reweighting ensures scarce or hard-to-learn subgoals (e.g., subgoals requiring high planning distance) are robustly optimized, preventing collapse on complex problem portions.
- Modularity for integration: The subgoal generation process is fully decoupled from low-level policies or planners, which enhances compatibility with existing control or RL architectures.
A plausible implication is that as domain complexity and entity count scale, factored subgoal diffusion may be essential to achieve tractable generalization and efficient long-horizon planning without resorting to explicit curriculum or reward shaping.
Open directions include dynamic or learned factorization of subgoal groups, adaptive diffusion schedules per subgoal group, and architectures that combine both entity factorization and cross-group relational reasoning. Early experiments suggest that explicit factorized heads and group-specific schedules further sharpen subgoal disentanglement in highly combinatorial settings, especially in symbolic reasoning tasks (Ye et al., 2024).
7. Related Methodologies and Distinctions
Factored subgoal diffusion differs from non-factored, monolithic subgoal diffusion and autoregressive inference in several salient aspects:
- Non-factored (joint) diffusion models all subgoal dimensions or tokens jointly, potentially resulting in entangled outputs where all entities move or all tokens change, even if only a subset are necessary for progress.
- Deterministic or regression-based subgoal generators collapse over modes, yielding highly averaged, hard-to-achieve intermediate targets—contrasted with the multi-modal sampling of diffusion.
- Autoregressive models sequentially predict subgoal tokens/steps; in subgoal-imbalanced settings, their errors compound sharply at hard-to-predict positions, while diffusion-based approaches handle these more evenly through groupwise weighting and simultaneous denoising (Ye et al., 2024).
- Hierarchical RL without diffusion typically requires careful selection of subgoal granularity, manual segmentation, and often fails under severe combinatorial or sparse-reward regimes.
Factored subgoal diffusion integrates naturally with goal-conditioned RL, planning, and complex reasoning architectures, offering a scalable approach to combinatorial subgoal generation across tasks spanning from embodied control to structured decision problems.