Routing Replay in Neural Architectures
- Routing replay is a mechanism integrating explicit path selection with replay buffers to anchor learning dynamics and mitigate catastrophic forgetting in neural networks.
- In continual learning, sparse routing via mixture-of-experts networks partitions tasks to reduce interference and preserve previously learned representations.
- In robotic planning, replay-guided diffusion leverages expert state buffers to confine trajectories within feasible regions and enhance planning success.
Routing replay refers to the integration of explicit path selection (routing) with replay mechanisms in neural and agent-based learning architectures. Two conceptually distinct implementations are prominent in recent literature: (1) sparse routing in mixture-of-experts (MoE) networks for continual learning, enhanced with episodic replay buffers to mitigate catastrophic forgetting (Collier et al., 2020); (2) trajectory planning in imitation learning where the diffusion process is explicitly routed through a buffer of feasible, previously observed states to ensure task feasibility and mimic expert demonstrations (Wang et al., 2023). Both paradigms utilize routing as a means to control the flow of information, and replay to anchor learning dynamics, but differ fundamentally in architecture, application domain, and theoretical guarantees.
1. Sparse Routing Replay in Continual Learning
In continual learning, the major challenge is catastrophic forgetting, where training a neural network on sequentially presented tasks results in destructive interference and rapid loss of previously acquired capabilities. Routing replay addresses this by employing a sparsely-gated MoE network with an episodic replay buffer (Collier et al., 2020).
The architecture consists of $L$ layers, each with $n$ independent experts. A layer-specific learnable routing matrix $R \in \mathbb{R}^{T \times n}$ (where $T$ is the number of tasks observed so far) assigns routing scores. For each input $x$ from task $t$, the router selects the top-$k$ experts with the largest scores $R_{t,i}$. The softmax-normalized routing weights are given by:

$$
g_i(t) = \frac{\exp(R_{t,i})}{\sum_{j \in \mathcal{T}_k(t)} \exp(R_{t,j})}, \qquad i \in \mathcal{T}_k(t),
$$

where $\mathcal{T}_k(t)$ is the set of top-$k$ experts selected for task $t$.
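A minimal sketch of this task-conditioned top-$k$ routing, assuming a PyTorch-style layer with a routing matrix indexed by task id (class, parameter names, and expert shapes are illustrative, not the authors' reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskRoutedMoELayer(nn.Module):
    """One MoE layer with a learnable per-task routing matrix R of shape (T, n)."""
    def __init__(self, n_experts: int, max_tasks: int, dim: int, k: int = 2):
        super().__init__()
        self.k = k
        self.R = nn.Parameter(torch.zeros(max_tasks, n_experts))  # routing scores R_{t,i}
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        top_vals, top_idx = torch.topk(self.R[task_id], self.k)  # top-k experts for task t
        gates = F.softmax(top_vals, dim=0)                       # softmax over the top-k only
        out = torch.zeros_like(x)
        for g, i in zip(gates, top_idx):                         # weighted sum of selected experts
            out = out + g * self.experts[i](x)
        return out
```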
Sparsity (with $k \ll n$) ensures that unrelated tasks are routed through disjoint subsets of weights, reducing gradient interference and thereby minimizing forgetting. Episodic replay is implemented via a small buffer $\mathcal{B}$ of stored examples from earlier tasks, from which mini-batches are interleaved with current-task data during training. This structurally anchors the model’s weights, complementing the protective effect of sparse routing.
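The replay side can be as simple as a small per-task store whose samples are mixed into each mini-batch; the sketch below assumes a fixed per-task capacity and uniform sampling (both illustrative choices, not specifics from the paper):

```python
import random

class EpisodicBuffer:
    """Tiny episodic memory: keeps up to `capacity` examples per observed task."""
    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self.store = {}  # task_id -> list of (x, y) examples

    def add(self, task_id, example):
        bucket = self.store.setdefault(task_id, [])
        if len(bucket) < self.capacity:
            bucket.append(example)

    def sample(self, n: int):
        past = [ex for bucket in self.store.values() for ex in bucket]
        return random.sample(past, min(n, len(past)))

def interleave(current_batch, buffer, replay_fraction=0.25):
    # Mix replayed examples from earlier tasks into the current task's mini-batch.
    return list(current_batch) + buffer.sample(int(len(current_batch) * replay_fraction))
```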
2. Co-training Procedures and Unused-Expert Utilization
In naïve training regimes, routers rapidly lock in to a limited set of experts, starving the remainder of learning opportunities. To counter this, a co-training mechanism is introduced (Collier et al., 2020). For each mini-batch from the current task, two gradient steps are executed:
- Standard routed update: SGD is applied through the experts currently routed for the active task.
- Co-training step: All experts not yet activated by any previous task are temporarily routed with uniform weights, and an additional SGD step (without updating the router itself) is performed.
The procedure tracks first activations with a per-expert Boolean flag. The co-training learning rate is typically scaled down by a constant factor relative to the main learning rate to prevent destabilization. The co-training phase prevents the collapse of routing capacity and ensures that all experts remain viable candidates as new tasks arrive.
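A hedged sketch of this two-step update, reusing the TaskRoutedMoELayer from the earlier sketch; the activation-tracking set and the co-training scale factor are illustrative assumptions:

```python
import torch

def co_train_step(layer, batch, task_id, opt, loss_fn, activated, co_scale=0.1):
    """Standard routed SGD step, then a smaller uniform step through experts
    that no task has activated yet (the router itself is left untouched)."""
    x, y = batch

    # 1) Standard routed update through the experts selected for the active task.
    opt.zero_grad()
    loss_fn(layer(x, task_id), y).backward()
    opt.step()
    # Mark the experts the router has now used at least once.
    activated.update(torch.topk(layer.R[task_id], layer.k).indices.tolist())

    # 2) Co-training step: route uniformly through never-activated experts.
    #    The router R is not part of this forward pass, so it receives no gradient;
    #    scaling the loss by co_scale emulates the reduced co-training learning rate.
    unused = [i for i in range(len(layer.experts)) if i not in activated]
    if unused:
        opt.zero_grad()
        out = sum(layer.experts[i](x) for i in unused) / len(unused)
        (co_scale * loss_fn(out, y)).backward()
        opt.step()
```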
3. Routing Replay for Feasible Planning via Diffusion
A distinct implementation of routing replay arises in the context of robotic planning and imitation learning. The "Cold Diffusion on the Replay Buffer" (CDRB) approach constrains the diffusion process such that all intermediate states are drawn from a replay buffer comprising only feasible, previously observed (expert or agent-generated) states (Wang et al., 2023).
At diffusion step $k$, the forward process replaces each intermediate state $s_i$ in a trajectory with a state $\tilde{s}_i$ sampled uniformly from the set:

$$
\{\, s \in \mathcal{B} \;:\; d(s, s_i) \le \epsilon_k \,\},
$$

where $\mathcal{B}$ is the replay buffer, $d(\cdot,\cdot)$ is a distance on the state space, and $\epsilon_k$ is a step-wise increasing distance schedule. Start and goal states are pinned throughout. The reverse restoration network is trained to revert noisy trajectories back to clean, expert-like ones.
Because all intermediate states in the diffusion process are routed via the contents of $\mathcal{B}$, the entire generation process is constrained to known-good, feasible regions. No explicit feasibility projection is required.
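An illustrative sketch of the buffer-routed forward "noising" step, following the formula above; the Euclidean distance, the fallback to the nearest buffer state, and the NumPy representation are assumptions made for the example:

```python
import numpy as np

def forward_diffuse(traj, buffer, eps_k, rng=None):
    """Replace each intermediate state with a buffer state sampled uniformly
    within distance eps_k; start and goal states stay pinned."""
    rng = rng or np.random.default_rng()
    noisy = [traj[0]]                                  # pin the start state
    for s in traj[1:-1]:
        dists = np.linalg.norm(buffer - s, axis=1)     # distance to every buffer state
        candidates = np.flatnonzero(dists <= eps_k)
        if candidates.size == 0:                       # fallback: nearest buffer state
            candidates = np.array([np.argmin(dists)])
        noisy.append(buffer[rng.choice(candidates)])
    noisy.append(traj[-1])                             # pin the goal state
    return np.stack(noisy)
```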
4. Theoretical Properties and Guarantees
Continual Learning
Sparse routing combined with replay anchors expert parameters, strictly localizing destructive gradients. This structural protection enables both positive transfer (for tasks sharing experts) and minimal backward transfer (for disjoint tasks) (Collier et al., 2020). No additional regularizers are required—the combination of sparsity and replay suffices.
Feasibility-Preserving Planning
CDRB’s routing property inspires strong informal guarantees:
- Feasibility Preservation: If $\mathcal{B} \subseteq \mathcal{F}$ (the feasible set of states), all forward-diffused and reverse-sampled trajectories remain feasible.
- Convergence: As the number of diffusion steps grows and the restoration network approaches a perfect denoiser, the method reconstructs the expert distribution exactly, conditioned on start and goal states (Wang et al., 2023).
5. Empirical Validation
Continual Learning Benchmarks
On the MNIST-Permutations and MNIST-Rotations continual learning benchmarks, sparse MoE with replay and co-training achieves the highest average test accuracy (ACC) and the least negative backward transfer (BWT), outperforming dense baselines. Representative results (mean over 15 runs):
| Method | Perm BWT | Perm ACC | Rot BWT | Rot ACC |
|---|---|---|---|---|
| Shared-bottom + replay | –0.057 | 0.912 | –0.057 | 0.920 |
| MoE + replay | –0.040 | 0.918 | –0.038 | 0.923 |
| MoE + replay + co-training | –0.038 | 0.920 | –0.034 | 0.929 |
MoE-based approaches exhibit flatter accuracy curves across tasks and lower per-task drop, confirming reduced catastrophic forgetting (Collier et al., 2020).
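ACC and BWT here are assumed to follow the standard continual-learning definitions computed from the matrix of per-task test accuracies; a minimal sketch under that assumption:

```python
import numpy as np

def acc_and_bwt(acc: np.ndarray):
    """Assumed convention: acc[i, j] is test accuracy on task j after training on
    tasks 0..i. ACC averages the final row; BWT averages the change on earlier
    tasks relative to just after each was learned (negative values = forgetting)."""
    T = acc.shape[0]
    final = acc[T - 1]
    ACC = final.mean()
    BWT = np.mean([final[j] - acc[j, j] for j in range(T - 1)])
    return ACC, BWT
```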
Robotic Planning with Replay-Routed Diffusion
CDRB demonstrates significantly increased planning success rates in environments requiring valid navigation around obstacles. On all "Obstacle" benchmarks, CDRB attains 90–95% success compared to 60–75% for other diffusion-based planners, and 50% for MLP-imitators. In environments without obstacles, CDRB and non-replay diffusers perform equally well (~98% success). Path length metrics further indicate that CDRB produces on average smoother, 5–10% shorter trajectories (Wang et al., 2023).
6. Interpretability, Strengths, and Limitations
Interpretability
In continual learning settings, routing matrices and their induced task-similarity heatmaps reveal band-diagonal structure, reflecting intuitive relationships between tasks (e.g., similar rotation angles sharing experts) (Collier et al., 2020). In planning, trajectories generated by CDRB hug replay-buffered paths and avoid infeasible regions, aligning with expert demonstrations.
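One plausible way to derive such a task-similarity heatmap is the overlap of top-$k$ expert sets between tasks; this is an illustrative measure, not necessarily the one used in the paper:

```python
import torch

def task_similarity(R: torch.Tensor, k: int) -> torch.Tensor:
    """Pairwise similarity between tasks as the fraction of shared top-k experts,
    computed from a routing matrix R of shape (T, n)."""
    T = R.shape[0]
    top = [set(torch.topk(R[t], k).indices.tolist()) for t in range(T)]
    sim = torch.zeros(T, T)
    for a in range(T):
        for b in range(T):
            sim[a, b] = len(top[a] & top[b]) / k
    return sim
```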
Strengths
- Continual Learning: Structural separation of expert parameters and episodic replay co-protects old knowledge and enables rapid adaptation to related tasks.
- Planning: Replay buffer routing ensures feasibility by construction, removes need for feasibility projections or constraints, and improves demonstration fidelity.
Limitations
Routing networks in continual learning can exhibit slower adaptation at the onset of new tasks, likely due to stochastic router behavior and delayed expert utilization. Potential remedies include dynamic adjustment of $k$, task-adaptive routers, or hybridization with explicit weight regularizers. In CDRB, the feasibility guarantee hinges on the replay buffer's coverage; insufficiently rich buffers may limit the diversity of solutions (Collier et al., 2020; Wang et al., 2023).
7. Relationship to Broader Research Directions
Routing replay unifies structural (routing) and memory-based (replay) mechanisms, addressing interference, capacity allocation, and feasibility in both continual learning and planning tasks. Its principles are broadly applicable in any architecture where constraint satisfaction and catastrophic forgetting co-occur. A plausible implication is the potential extension to more general domains, combining buffer-guided routing with adaptive, learned constraints and hybrid expert models.
References:
- "Routing Networks with Co-training for Continual Learning" (Collier et al., 2020)
- "Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States" (Wang et al., 2023)