Routing the Lottery in Networks and Deep Learning
- Routing the Lottery (RTL) names two frameworks: lottery-based resource allocation in networks and adaptive subnetwork pruning in deep learning, each aimed at improving performance under shared capacity.
- In networks, it employs lottery mechanisms based on cumulative prospect theory (CPT) together with combinatorial optimization to match outcome rankings with network constraints, ensuring feasible and efficient allocations.
- In deep learning, RTL extends the Lottery Ticket Hypothesis by routing data to specialized sparse subnetworks, achieving higher accuracy with fewer parameters.
Routing the Lottery (RTL) encompasses two independent, foundational theories in network science and deep learning. In resource allocation, RTL refers to the design of optimal lottery-based mechanisms that maximize aggregate utility under cumulative prospect theory (CPT), exploiting users’ probabilistic preferences within network capacity constraints. In deep learning, Routing the Lottery defines a framework that generalizes the Lottery Ticket Hypothesis by constructing specialized sparse subnetworks ("adaptive tickets") tailored to distinct data subsets—thus aligning model capacity with data heterogeneity. Both perspectives instantiate “routing” as a principled mapping: in networks, from lottery outcomes to feasible resource allocations; in neural networks, from input subsets to specialized subnetworks via binary masks—each rigorously formulated and algorithmically constructed.
1. Fundamentals of Lottery-Based Network Resource Allocation
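Lottery-based resource allocation addresses networks with multiple users $i \in \{1, \dots, n\}$ and links $j \in \{1, \dots, m\}$, where each link has capacity $c_j > 0$, and each user has a fixed route $J_i \subseteq \{1, \dots, m\}$. Canonical routing matrices $A \in \{0,1\}^{m \times n}$, with $A_{ji} = 1$ exactly when $j \in J_i$, encode which links each user utilizes. Deterministic feasible allocations $x \in \mathbb{R}^n_{\ge 0}$ satisfy $Ax \le c$, defining the allocation polytope $\mathcal{F} = \{x \ge 0 : Ax \le c\}$.
Instead of deterministic $x$, lottery-based mechanisms assign each user a prospect $L_i$: a $k$-outcome discrete distribution, each outcome occurring with probability $1/k$. For outcome $l \in \{1, \dots, k\}$, the allocation $y(l) = (y_1(l), \dots, y_n(l))$ must be network-feasible, that is, $y(l) \in \mathcal{F}$. User preferences are modeled via CPT, with value function $v_i$, probability weighting $w_i$, sorted outcome profile $z_i(1) \ge z_i(2) \ge \dots \ge z_i(k) \ge 0$, and incremental weights $h_i(l) = w_i(l/k) - w_i((l-1)/k)$. The ex-ante CPT utility is $V_i(L_i) = \sum_{l=1}^{k} h_i(l)\, v_i(z_i(l))$.
The system’s optimization selects both the outcome assignments $z_i$ and a permutation $\pi_i$ over outcomes for each user, with $y_i(l) = z_i(\pi_i(l))$, maximizing total CPT utility while ensuring per-outcome link feasibility:

$$\max_{z,\, \pi} \ \sum_{i=1}^{n} \sum_{l=1}^{k} h_i(l)\, v_i(z_i(l)) \quad \text{subject to} \quad \sum_{i=1}^{n} A_{ji}\, z_i(\pi_i(l)) \le c_j \ \ \text{for all } j, l, \qquad z_i(1) \ge \dots \ge z_i(k) \ge 0.$$

The two-layer structure involves: (1) permutation selection (which matches outcome ranks across users); (2) convex allocation, given permutations (Phade et al., 2018).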
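The definitions above can be made concrete with a small numerical sketch. It is illustrative only: the Prelec weighting function, the power value function with exponent 0.88, and the three-user, two-link network are assumptions chosen for the example, not taken from the source.

```python
import numpy as np

def prelec_weight(p, gamma=0.65):
    """Assumed inverse-S weighting: w(p) = exp(-|ln p|^gamma), with w(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    pos = p > 0
    out[pos] = np.exp(-np.abs(np.log(p[pos])) ** gamma)
    return out

def incremental_weights(k, weight=prelec_weight):
    """h(l) = w(l/k) - w((l-1)/k) for ranks l = 1..k (rank 1 is the best outcome)."""
    return np.diff(weight(np.arange(k + 1) / k))

def cpt_utility(outcomes, value=lambda x: x ** 0.88, weight=prelec_weight):
    """Ex-ante CPT utility of k equiprobable outcomes: sum_l h(l) * v(z(l))."""
    z = np.sort(np.asarray(outcomes, dtype=float))[::-1]  # sorted profile, best first
    return float(np.sum(incremental_weights(len(z), weight) * value(z)))

def feasible(A, c, y):
    """y has shape (n_users, k); column l is the allocation realized in outcome l."""
    return bool(np.all(A @ y <= c[:, None] + 1e-9))

# Hypothetical network: link 0 carries users 0 and 1, link 1 carries users 1 and 2.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
c = np.array([2.0, 2.0])

# Every user holds the same lottery {1.5, 0.5}; only the outcome ordering differs.
y_staggered = np.array([[1.5, 0.5], [0.5, 1.5], [1.5, 0.5]])
y_aligned = np.array([[1.5, 0.5], [1.5, 0.5], [1.5, 0.5]])

print(feasible(A, c, y_staggered), feasible(A, c, y_aligned))
print(cpt_utility(y_staggered[0]), cpt_utility(y_staggered[1]))
```

Both orderings give each user the same CPT utility, since utility depends only on the sorted profile, but only the staggered ordering respects link capacity in every outcome. That is the role of the permutation layer.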
2. Duality, Algorithmic Structure, and Complexity
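For fixed permutations $\pi$, the optimization over the outcome values $z$ is convex; introducing Lagrange multipliers for link-outcome and monotonicity constraints yields the Lagrangian and KKT-based decomposition. User-side, the problem becomes maximizing

$$\sum_{l=1}^{k} h_i(l)\, v_i\!\left(\frac{m_i(l)}{r_i(l)}\right) - \sum_{l=1}^{k} m_i(l)$$

where $h_i(l)$ and $v_i$ are the CPT incremental weights and value function, $m_i(l) \ge 0$ are “budgets,” $r_i(l)$ are aggregated marginal prices, and $z_i(l) = m_i(l)/r_i(l)$. The network-side solves an Eisenberg-Gale convex program in the per-outcome allocations $y$.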
Permuting outcome ranks (“roulette allocation”) is combinatorial and introduces non-convexities. The overall system problem may therefore exhibit a strict duality gap; strong duality is recovered only after relaxing permutations to doubly stochastic matrices (elements of the Birkhoff polytope, which introduce marginal probability allocations) (Phade et al., 2018). The primal is NP-hard, via reduction from the SUBSET-PARTITION problem, even for two outcomes.
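To see why the permutation layer is combinatorial, the following sketch enumerates every permutation profile for a toy instance and keeps the feasible ones. The function name `feasible_permutation_profiles` and the single-link instance are hypothetical; exhaustive search visits $(k!)^n$ profiles and is practical only for very small $n$ and $k$.

```python
from itertools import permutations, product
import numpy as np

def feasible_permutation_profiles(A, c, z):
    """Brute-force search; z[i] is user i's sorted outcome vector of length k."""
    n, k = z.shape
    found = []
    for profile in product(permutations(range(k)), repeat=n):
        # y[i, l] = z[i, pi_i(l)]: what user i receives when outcome l is drawn
        y = np.stack([z[i, list(pi)] for i, pi in enumerate(profile)])
        if np.all(A @ y <= c[:, None] + 1e-9):
            found.append(profile)
    return found

# Hypothetical instance: two users share one link of capacity 4,
# and each holds the two-outcome profile (3, 1).
A = np.array([[1, 1]])
c = np.array([4.0])
z = np.array([[3.0, 1.0],
              [3.0, 1.0]])

for profile in feasible_permutation_profiles(A, c, z):
    print(profile)
```

Only the two profiles that pair one user's high outcome with the other's low outcome survive. The average allocation (2 per user) fits the capacity either way, which is the gap the relaxation to doubly stochastic matrices closes.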
3. Relaxed Problem and Canonical Lottery Structure
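The relaxed system problem replaces permutations with expectations, leading to the “average capacity” constraint:

$$\sum_{i=1}^{n} A_{ji}\, \frac{1}{k} \sum_{l=1}^{k} z_i(l) \le c_j \quad \text{for all } j,$$

where $A_{ji} = 1$ when user $i$'s route uses link $j$ and $0$ otherwise. For each user, the optimal CPT utility as a function of average allocation, $V_i^{*}(\bar{z}_i)$ with $\bar{z}_i = \frac{1}{k} \sum_{l=1}^{k} z_i(l)$, is concave and strictly increasing. Under the standard inverse‐S property for $w_i$, the structure of the optimal lottery profile is “jackpot plus base pay”: a small number of high prizes (jackpots), remainder at a base value, determined by the threshold $p_i^{*}$ where the concave envelope $\hat{w}_i$ of $w_i$ departs from linearity.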
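A quick numerical check illustrates why a "jackpot plus base pay" profile can beat a deterministic allocation with the same average. This is a sketch under assumed functional forms (Prelec weighting with parameter 0.65, power value function with exponent 0.88); the helper `jackpot_lottery` is hypothetical and the lottery shown is not claimed to be optimal.

```python
import numpy as np

def cpt_utility(z, gamma=0.65, alpha=0.88):
    """CPT utility of k equiprobable outcomes under assumed Prelec/power forms."""
    z = np.sort(np.asarray(z, dtype=float))[::-1]          # best outcome first
    k = len(z)
    p = np.arange(1, k + 1) / k
    w = np.concatenate(([0.0], np.exp(-np.abs(np.log(p)) ** gamma)))
    h = np.diff(w)                                          # h(l) = w(l/k) - w((l-1)/k)
    return float(h @ z ** alpha)

def jackpot_lottery(mean, k, n_jackpots, base):
    """k equiprobable outcomes with the given mean: n_jackpots prizes, the rest at base."""
    jackpot = (k * mean - (k - n_jackpots) * base) / n_jackpots
    return np.array([jackpot] * n_jackpots + [base] * (k - n_jackpots))

mean, k = 1.0, 10
deterministic = np.full(k, mean)
lottery = jackpot_lottery(mean, k, n_jackpots=1, base=0.5)

print("average allocation:", deterministic.mean(), lottery.mean())
print("CPT utility:", cpt_utility(deterministic), cpt_utility(lottery))
```

Both profiles consume the same average capacity, yet the lottery scores higher because the assumed weighting function overweights the rare best outcome. With a linear weighting function and the same concave value function, the ranking would reverse.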
4. Routing the Lottery in Adaptive Subnetwork Pruning
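The deep learning formulation of RTL generalizes the single-mask Lottery Ticket Hypothesis (LTH). A network $f(x; \theta)$ is trained on a dataset $\mathcal{D}$ partitioned into $K$ subsets $\mathcal{D}_1, \dots, \mathcal{D}_K$ (by class, cluster, or environment), and learns a specialized binary mask $m_k \in \{0,1\}^{|\theta|}$ for each:

$$f_k(x) = f(x;\, m_k \odot \theta), \qquad k = 1, \dots, K.$$

The joint objective is

$$\min_{\theta,\, \{m_k\}} \ \sum_{k=1}^{K} \mathbb{E}_{(x, y) \sim \mathcal{D}_k} \big[ \ell\big(f(x;\, m_k \odot \theta),\, y\big) \big].$$

All masks act over a shared backbone $\theta$, with data routed by a trivial function (e.g., class label) to the correct mask. Parameter count is dominated by the backbone, with mask storage negligible compared to $K$ independent models.
RTL learning comprises (A) per-subset pruning in the style of iterative magnitude pruning (IMP) to extract adaptive tickets, and (B) joint retraining using balanced mini-batches to optimize all $m_k \odot \theta$ over their respective $\mathcal{D}_k$, interleaving gradient updates to preserve specialization (Stefanski et al., 29 Jan 2026).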
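The mask mechanics can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the backbone is a single weight matrix, the "trained" weights per subset are simulated with random perturbations in place of real training, and the names `magnitude_prune` and `routed_forward` are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """One pruning round: drop the given fraction of surviving weights with smallest magnitude."""
    mask = np.asarray(mask, dtype=bool)
    active = np.flatnonzero(mask)                      # flat indices of surviving weights
    n_prune = int(fraction * active.size)
    order = np.argsort(np.abs(weights.ravel()[active]))
    new_mask = mask.ravel().copy()
    new_mask[active[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)

def routed_forward(x, subset_id, theta, masks):
    """Route the input to its subset's mask and apply the shared backbone theta."""
    return x @ (masks[subset_id] * theta)

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))                        # shared backbone (toy linear layer)

# Stand-in for training on each subset: perturbed copies of theta (assumption).
trained = [theta + rng.normal(scale=0.5, size=theta.shape) for _ in range(2)]

# One pruning round per subset, starting from a dense mask.
masks = [magnitude_prune(w, np.ones(theta.shape, dtype=bool), 0.25) for w in trained]

x = rng.normal(size=4)
out = routed_forward(x, subset_id=1, theta=theta, masks=masks)
print([int(m.sum()) for m in masks], out.shape)
```

In an iterative scheme, `magnitude_prune` would be called repeatedly on each subset's current mask, with retraining between rounds. Storage grows by one binary mask per subset rather than by one full model.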
5. Subnetwork Collapse and Mask Similarity Score
Under aggressive pruning, specialization may vanish as all $m_k$ converge (“collapse”) to nearly identical masks; accuracy declines sharply. To detect this, the mask similarity score (Jaccard index) between binary masks $m_a$ and $m_b$ is

$$\mathrm{sim}(m_a, m_b) = \frac{|m_a \cap m_b|}{|m_a \cup m_b|},$$

where each mask is identified with the set of weights it keeps. The mean similarity $\overline{\mathrm{sim}}_a$ of subnetwork $a$ over other subnetworks, as a function of sparsity $s$, diagnoses over-pruning: empirically, $\overline{\mathrm{sim}}_a$ increases sharply at the critical sparsity threshold where balanced accuracy collapses. This provides a label-free early stopping criterion (Stefanski et al., 29 Jan 2026).
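The similarity score is straightforward to compute from the masks alone. The sketch below uses hypothetical function names, and returning 1.0 for two empty masks is a convention assumed here, not one stated in the source.

```python
import numpy as np

def mask_jaccard(m_a, m_b):
    """Jaccard index |a AND b| / |a OR b| between two binary masks."""
    a = np.asarray(m_a, dtype=bool)
    b = np.asarray(m_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks count as identical
    return float(np.logical_and(a, b).sum() / union)

def mean_similarity_to_others(masks):
    """For each mask, the average Jaccard index against every other mask."""
    K = len(masks)
    return [
        float(np.mean([mask_jaccard(masks[a], masks[b]) for b in range(K) if b != a]))
        for a in range(K)
    ]

masks = [
    np.array([1, 1, 0, 0]),
    np.array([1, 0, 1, 0]),
    np.array([1, 1, 0, 0]),
]
print(mean_similarity_to_others(masks))
```

Tracking these values after each pruning round and stopping when they jump requires no labels, which matches the early stopping use described above. The jump threshold itself would have to be chosen empirically.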
6. Empirical Performance and Comparative Evaluation
RTL has been validated on CIFAR-10 (class-specific), CIFAR-100 (semantic clusters via CLIP+UMAP+HDBSCAN), ADE20K for INRs, and speech enhancement domains with three acoustic scenes. Baseline comparisons include:
- IMP (single model): universal mask.
- IMP (multiple models): $K$ independent tickets, one per subset.
At typical 25% sparsity (see original tables), RTL attains:
| Task (metric) | RTL | IMP-single | IMP-multi | RTL #Params | IMP-single #Params | IMP-multi #Params |
|---|---|---|---|---|---|---|
| CIFAR-10 (balanced accuracy) | 0.781 | 0.711 | 0.712 | 103K | 94K | 944K |
| CIFAR-100 (balanced accuracy) | 0.765 | 0.722 | 0.712 | 108K | 94K | 944K |
| ADE20K INR (PSNR, dB) | 18.86 | 15.94 | – | 48K | 40.5K | – |
| Speech (SI-SNRi, dB) | 7.25 | 6.89 | 5.29 | 32K | 28K | 84.1K |
RTL consistently matches or outperforms baselines in balanced accuracy, recall, PSNR, and SI-SNRi, while using up to 10 times fewer parameters than the multi-model alternative. Ablations confirm performance gains are robust to the number of subsets $K$ and degrade gracefully with noisier clustering (Stefanski et al., 29 Jan 2026).
7. Synthesis and Theoretical Perspective
Both resource allocation and neural network pruning formulations of Routing the Lottery instantiate a modular, context-aware mechanism for aligning resources (network capacity or neural weights) with inherent heterogeneity (users or data subsets). In the lottery-based network context, RTL realizes strict improvements in ex-ante CPT utility by constructing optimal randomized allocations, outperforming deterministic and uniform strategies—especially pronounced when user psychology overweights rare outcomes (Phade et al., 2018). In deep networks, RTL transforms pruning from a static compression method into a dynamic, specialization-driven approach, achieving high parameter efficiency and interpretability without additional routers or gating structures (Stefanski et al., 29 Jan 2026).
A plausible implication is that, across domains, “routing the lottery” surmounts the tradeoff between universal solutions and full replication by leveraging specialization on a shared substrate—a principle with broad implications for modular, scalable design in networked and learning systems.