AltNet: Embedding Alternatives & RL

Updated 7 December 2025
  • AltNet is a framework that introduces per-request topology alternatives to network embedding, enabling lower rejection rates and cost efficiency.
  • It employs the GREEDY and TANTO heuristics, built on a MILP formulation and its LP relaxation, to select low-cost embeddings with scalability and near-optimal performance.
  • In reinforcement learning, AltNet uses a twin actor–critic strategy in which two networks alternate between acting and offline training across periodic resets, balancing plasticity and stability without performance collapse.

AltNet refers to the incorporation of alternatives into the core of network embedding and learning, manifesting in two prominent, yet fundamentally distinct, lines of research: (1) the use of alternatives in virtual network embedding (formalized as the Virtual Network Embedding with Alternatives Problem, VNEAP), and (2) the strategic use of alternating neural architectures to address plasticity-stability trade-offs in reinforcement learning. In both contexts, AltNet denotes a departure from rigid, single-topology, or single-network paradigms—the former by allowing per-request topology alternatives to be exploited for embedding efficiency, and the latter by alternating full networks to circumvent plasticity loss, all while ensuring operational stability.

1. Theoretical Foundations of AltNet in Network Embedding

The VNEAP generalizes the classical Virtual Network Embedding Problem (VNEP) by granting, for each network request $a$, a set of topology alternatives

$$T(a) = \left\{ G^a_t = (V^a_t, E^a_t) \mid t = 1, \dots, T_a \right\},$$

where each alternative topology may admit different compute and bandwidth requirements, exposing new degrees of freedom for resource allocation. The formal optimization is encoded as a Mixed-Integer Linear Program (MILP), wherein binary variables $x^{r,t}_{i,v}$ and $x^{r,t}_{ij,vw}$ encode placement and routing for each request $r$ and alternative $t$, subject to substrate capacity constraints. The system minimizes total embedding cost (compute, bandwidth, and rejections via penalty $\psi$):

$$\min_{x} \Bigg( \sum_{v \in V^S} \mathrm{cost}_v \, \mathrm{load}_v + \sum_{(v,w)\in E^S} \mathrm{cost}_{vw} \, \mathrm{load}_{vw} + \sum_{r\in R} \Big(1 - \sum_{t\in T(a(r))} x^{r,t}_{\theta,v(r)}\Big)\,\psi\, d_r \Bigg),$$

subject to selection of at most one alternative per request and root placement anchoring.
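A minimal sketch of the alternative-selection core of such a MILP in Python with PuLP is shown below; it models only the selection variables, the at-most-one-alternative constraint, and the rejection-penalty term, while node/link mapping and capacity constraints are elided. All data structures and names here are hypothetical placeholders rather than the authors' implementation.

```python
# Sketch: alternative-selection core of a VNEAP-style MILP (PuLP).
# Hypothetical inputs: request IDs, alternative sets T[r], per-alternative
# embedding cost est_cost[r][t], demand d[r], and rejection penalty psi.
import pulp

def build_vneap_selection_milp(requests, T, est_cost, d, psi):
    prob = pulp.LpProblem("VNEAP_selection", pulp.LpMinimize)

    # y[r][t] = 1 if alternative t is selected for request r
    # (stands in for the anchored placement variable x^{r,t}_{theta, v(r)}).
    y = {r: {t: pulp.LpVariable(f"y_{r}_{t}", cat="Binary") for t in T[r]}
         for r in requests}

    # Objective: embedding cost of chosen alternatives + penalty for rejections.
    prob += pulp.lpSum(est_cost[r][t] * y[r][t] for r in requests for t in T[r]) \
          + pulp.lpSum((1 - pulp.lpSum(y[r][t] for t in T[r])) * psi * d[r]
                       for r in requests)

    # At most one alternative per request.
    for r in requests:
        prob += pulp.lpSum(y[r][t] for t in T[r]) <= 1

    # A full model would add node/link mapping variables plus capacity and
    # flow-conservation constraints over the substrate.
    return prob, y
```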

This alternative-based formulation directly addresses heterogeneity in substrate resources and stochastic load: by permitting applications with several functionally equivalent topologies (e.g., different VNF chains), the embedding layer can select according to current network bottlenecks (compute vs. bandwidth), yielding lower rejection rates and operational cost at high utilization (Kolosov et al., 14 May 2025).

2. AltNet Algorithms and Heuristics for Embedding with Alternatives

The introduction of alternatives in VNEAP motivated two distinct solution heuristics: GREEDY and TANTO.

  • GREEDY: For each incoming request, iterate over all alternatives, try embedding each using a one-alternative heuristic (MINV), and select the lowest-cost feasible embedding. Complexity is $O(|R|\cdot|T(a)|\cdot m \log n)$ for $|R|$ requests and $|T(a)|$ alternatives per request.
  • TANTO (Tree of Alternative Network Topologies): Aggregates requests sharing $(v(r), a(r))$ into a single meta-request and solves an LP relaxation whose variables correspond to the fractional use of substrate resources across all alternatives. Each request is then probabilistically mapped via randomized rounding, ensuring optimality-gap bounds and near-LP rejection rates. The LP phase is polynomial in substrate and alternative-set size; rounding is linear in $|R|$.

Scalability is a distinguishing feature: TANTO supports batched embedding on the order of $10^6$ requests in under a minute, while GREEDY's complexity grows with the product $|R|\cdot|T(a)|$. The typical empirical pattern at high load is that TANTO achieves substantially lower rejection rates and cost than GREEDY (Kolosov et al., 14 May 2025).
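A minimal Python sketch of the GREEDY loop, assuming a hypothetical `minv_embed` routine that attempts a single-alternative (MINV-style) embedding and returns its cost and mapping, or `None` if infeasible:

```python
# Sketch of GREEDY: per request, try every alternative with a one-alternative
# heuristic and keep the cheapest feasible embedding.
# `minv_embed`, `commit`, and the data structures are hypothetical placeholders.
def greedy_embed(requests, alternatives, substrate, minv_embed, commit):
    rejected = []
    for r in requests:
        best = None  # (cost, embedding, alternative)
        for t in alternatives[r]:
            result = minv_embed(substrate, r, t)  # -> (cost, embedding) or None
            if result is not None:
                cost, emb = result
                if best is None or cost < best[0]:
                    best = (cost, emb, t)
        if best is None:
            rejected.append(r)          # no feasible alternative: reject
        else:
            commit(substrate, best[1])  # reserve substrate resources
    return rejected
```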

Heuristic | Complexity | Empirical Performance
GREEDY | $O(|R|\cdot|T(a)|)$ | Up to $2\times$ higher rejection/cost than TANTO
TANTO | $O(|V^S|+|E^S|)\cdot\text{alt. size}$ (LP); $O(|R|)$ (rounding) | Near-optimal relative to LP; handles $>10^6$ requests
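A minimal sketch of TANTO's randomized-rounding phase, assuming fractional LP values `frac[r][t]` have already been computed (the meta-request aggregation and the LP itself are omitted; all names are hypothetical):

```python
# Sketch of TANTO's rounding phase: each request picks one alternative (or is
# rejected) with probabilities given by the fractional LP solution.
# `frac[r][t]` is the LP value for request r and alternative t; the values
# for a given r sum to at most 1.
import random

def randomized_round(requests, alternatives, frac, rng=random):
    assignment = {}
    for r in requests:
        u = rng.random()
        cumulative = 0.0
        chosen = None
        for t in alternatives[r]:
            cumulative += frac[r][t]
            if u < cumulative:
                chosen = t
                break
        assignment[r] = chosen  # None means the request is rejected
    return assignment
```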

3. AltNet in Reinforcement Learning: The Twin Network Alternation Strategy

AltNet, in the reinforcement learning context, was developed to address the plasticity–stability dilemma: single neural policies lose plasticity (i.e., representation and learning capacity) over time due to neural weight norm blow-up, neuron dormancy, and collapse in representational rank. Prior “standard resets”—periodically reinitializing network parameters—improve learning capacity but induce catastrophic performance collapse when a naïvely reinitialized agent is exposed to the environment.

AltNet achieves plasticity restoration without transient return degradation by maintaining two actor–critic networks. At each swap interval, only one (A_active) interacts with the environment while the other (A_passive) is offline, training purely from shared experience. When A_active is reset, it becomes passive and A_passive—now well-trained—takes over as the online actor. The schedule ensures that only a high-performing model is ever deployed, while periodic resets prevent either model from suffering accumulated plasticity loss (Maheshwari et al., 30 Nov 2025).
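A minimal Python sketch of this alternation schedule, assuming agent objects with `act`, `update`, and `reset` methods and a shared replay buffer (interfaces are hypothetical, not the reference implementation):

```python
# Sketch of AltNet's twin alternation: both agents train from the shared buffer,
# only the active one acts, and resets always hit the agent going passive.
def altnet_loop(env, agents, buffer, total_updates, reset_freq):
    active, passive = 0, 1           # indices into `agents`
    obs = env.reset()
    for step in range(total_updates):
        # Only the active agent interacts with the environment.
        action = agents[active].act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        # Both agents update in parallel from the shared replay buffer.
        batch = buffer.sample()
        agents[active].update(batch)
        agents[passive].update(batch)

        # At each swap point, reset the active agent and hand control to the
        # already-trained passive one, so a fresh network never acts online.
        if (step + 1) % reset_freq == 0:
            agents[active].reset()
            active, passive = passive, active
```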

4. Formalism and Mathematical Specification

In VNEAP, detailed MILP constraints guarantee:

  • At most one alternative per request: $\sum_{t\in T(a(r))} x^{r,t}_{\theta,v(r)} \le 1$
  • Root anchoring: $x^{r,t}_{\theta,u}=0$ for $u \neq v(r)$
  • Flow conservation and capacity constraints for node/link mappings
  • Objective: minimize total embedding cost and rejection penalties
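For orientation, a typical VNE-style flow-conservation constraint for a virtual edge $(i,j)$ of request $r$ and alternative $t$ takes the following standard form (an assumed illustration; the paper's exact indexing may differ):

$$\sum_{(v,w)\in E^S} x^{r,t}_{ij,vw} \;-\; \sum_{(w,v)\in E^S} x^{r,t}_{ij,wv} \;=\; x^{r,t}_{i,v} - x^{r,t}_{j,v} \qquad \forall\, v \in V^S$$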

In RL, the twin network update schedule is as follows (for SAC):

  1. Alternating twins $A_1$, $A_2$: each has its own actor and critic; both update parameters $(\theta_i, \phi_i)$ via off-policy mini-batch gradients from the shared replay buffer $B$:

$$\theta_i \leftarrow \theta_i - \eta_\theta \nabla_{\theta_i} L_C^i(\theta_i), \qquad \phi_i \leftarrow \phi_i - \eta_\phi \nabla_{\phi_i} L_A^i(\phi_i)$$

  2. Reset period (in gradient updates): $\text{ResetFreq}_{\text{grad}} = U/2$ for $U$ total updates.

Critic and actor losses are, for each twin $i$:

$$L_C^i(\theta_i) = \mathbb{E}_{(s,a,r,s')\sim B}\Big[\big(Q_{\theta_i}(s,a) - (r + \gamma\, \mathbb{E}_{a'\sim\pi_{\phi_i}}[Q_{\bar\theta_i}(s',a') - \alpha \log \pi_{\phi_i}(a'\mid s')])\big)^2\Big]$$

$$L_A^i(\phi_i) = \mathbb{E}_{s\sim B,\, a\sim\pi_{\phi_i}}\big[\alpha \log \pi_{\phi_i}(a\mid s) - Q_{\theta_i}(s,a)\big]$$
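A compact PyTorch-style sketch of these per-twin losses, assuming a policy module exposing a `sample` method that returns reparameterized actions and log-probabilities; this illustrates the standard SAC objectives above and is not the authors' code:

```python
# Sketch: critic and actor losses for one twin i of an AltNet-style SAC agent.
# `q`, `q_targ`, `policy` are torch.nn.Modules; `batch` holds tensors sampled
# from the shared replay buffer B.
import torch
import torch.nn.functional as F

def sac_losses(q, q_targ, policy, batch, gamma=0.99, alpha=0.2):
    s, a, r, s_next, done = batch

    # Critic target: r + gamma * E_{a'~pi}[Q_targ(s', a') - alpha * log pi(a'|s')]
    with torch.no_grad():
        a_next, logp_next = policy.sample(s_next)
        target_q = q_targ(s_next, a_next) - alpha * logp_next
        y = r + gamma * (1.0 - done) * target_q

    critic_loss = F.mse_loss(q(s, a), y)               # L_C^i(theta_i)

    # Actor loss: E_{s~B, a~pi}[alpha * log pi(a|s) - Q(s, a)]
    a_new, logp = policy.sample(s)                     # reparameterized sample
    actor_loss = (alpha * logp - q(s, a_new)).mean()   # L_A^i(phi_i)

    return critic_loss, actor_loss
```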

5. Empirical Outcomes and Comparative Analysis

In network embedding, the addition of alternatives (VNEAP) consistently lowers rejection rates and total embedding costs compared to classical VNEP. At $100\%$ substrate load, single-alternative LP baselines reject $9$–$22\%$ of demand, while VNEAP-LP achieves $3$–$5\%$ rejection. TANTO closely matches LP, while GREEDY can double rejection and cost under bottlenecked conditions. The presence of “functionally equivalent” topologies enables the embedding engine to favor compute-light or bandwidth-light alternatives as dictated by system state, providing up to $80\%$ lower rejection relative to the best single-topology embedding (Kolosov et al., 14 May 2025).

In RL, AltNet provides a robust solution to plasticity loss without incurring safety-hazardous drops in agent return. Across DeepMind Control Suite tasks (Cheetah-run, Hopper-hop, Quadruped-run, Walker-walk), AltNet yields higher normalized area under the learning curve (AUC), outperforming SAC ($\approx 38\%$), Standard Resets ($\approx 12\%$), and reset-ensemble RDE ($\approx 6\%$), with no post-reset collapse. Sample efficiency is substantially improved; e.g., in fixed-budget rollouts, returns exceed the best SAC variant by $52\times$ (at $100$k steps), $1.8\times$ (at $300$k), and $1.3\times$ (at $500$k). In on-policy settings, such as PPO on MuJoCo Ant, AltNet nearly doubles the peak return and prevents decline due to plasticity loss (Maheshwari et al., 30 Nov 2025).

6. Practical Implementation and Considerations

In VNEAP, the practical application involves:

  • Maintaining a manageable set of application topology alternatives, either through manual domain specification or, potentially, through automatic generation by an application compiler.
  • Deploying TANTO for offline or periodic online global optimization, exploiting its capacity for batch request aggregation.
  • Preserving the full flexibility of the embedding layer by making all alternatives available at selection time; even a small set yields large improvements.

For AltNet in RL:

  • Employ exactly two actor–critic networks; increasing beyond two or equivalently scaling parameter count shows no further gain.
  • Use a reset interval matched to half the total gradient budget (e.g., $200$k updates for $U = 1$M), and never allow a freshly reset network to act before sufficient off-policy training.
  • Buffer size is critical: $1\times 10^6$ transitions (never flushed), with even moderate truncation causing marked performance degradation.
  • Architecture: For SAC, two hidden layers of $1024$ ReLU units; for PPO, MLP $64$–$64$.
  • Update both networks in parallel from the shared buffer at every step.

Operational caveats include never exposing a reset network to the environment until trained, maintaining strict update parity across twins, and avoiding buffer flushes during resets (Maheshwari et al., 30 Nov 2025).
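A hypothetical configuration sketch collecting the hyperparameters listed above; the field names are illustrative and the values are taken from this section, not from a released codebase:

```python
# Sketch: AltNet (SAC variant) hyperparameters as reported in this article.
from dataclasses import dataclass

@dataclass
class AltNetSACConfig:
    num_twins: int = 2                   # exactly two actor-critic networks
    total_grad_updates: int = 1_000_000  # U, total gradient budget
    reset_freq_grad: int = 500_000       # half the gradient budget (U/2)
    buffer_size: int = 1_000_000         # shared replay buffer, never flushed
    hidden_sizes: tuple = (1024, 1024)   # two hidden layers of 1024 ReLU units
    parallel_twin_updates: bool = True   # both twins update at every step
    # For the PPO variant the article reports a 64-64 MLP instead.
```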

7. Impact, Extensions, and Open Research Questions

AltNet, through the introduction of alternatives, has notable implications for both network virtualization and continual/distributed RL agent design.

  • For operators, alternatives provide resilience against substrate heterogeneity, burst load, and unpredictable bottlenecks, yielding greater cost control, lower service rejection, and increased systemic efficiency.
  • Extensions proposed for VNEAP include latency-aware and energy-aware embedding, automated generation of alternatives, fast online reoptimization, and extension to general (non-tree) virtual graphs using tree decomposition.
  • In lifelong RL, AltNet points towards lightweight, robust wrappers applicable atop both off-policy and on-policy actor–critic algorithms, removing the need for complex resets or ensemble methods, and suggesting applications to broader classes of nonstationary learning problems.

Further research directions encompass robust online VNEAP, automated alternative synthesis, and cross-domain (multi-provider) embedding in virtualization. In RL, examining how alternative-based architectures interact with neural pretraining, representation learning, and constraints imposed by real-world safety requirements remains a rich domain for exploration.

References:

  • Kolosov et al., 14 May 2025.
  • Maheshwari et al., 30 Nov 2025.