Architecture–algorithm co-design questions for reinforced MoE and hardware-aware objectives

Identify and solve key architecture–algorithm co-design challenges for reinforced mixture-of-experts and hardware-aware objectives, including: (i) designing robust multi-objective reward functions that avoid trivial solutions such as all-expert sparsity; (ii) achieving stable credit assignment when architectural actions change network topology; and (iii) amortizing architecture policy learning across prompts, tasks, and deployment scales.
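Challenge (i) can be made concrete with a small sketch. The reward below combines task quality with a penalty on the fraction of active experts (discouraging the trivial "activate everything" solution) and a penalty for exceeding a latency budget. All function names, weights, and the linear penalty form are illustrative assumptions, not a design proposed in the survey.

```python
def architecture_reward(task_score, experts_active, experts_total,
                        latency_ms, latency_budget_ms,
                        w_sparsity=0.5, w_latency=0.5):
    """Hypothetical multi-objective reward for an architectural action.

    task_score        : scalar task-quality signal in [0, 1]
    experts_active    : number of experts the action activates
    experts_total     : total experts available
    latency_ms        : measured inference latency for this configuration
    latency_budget_ms : hardware-aware latency budget

    The sparsity penalty grows with the fraction of active experts, so
    the degenerate all-expert configuration is never reward-optimal
    unless it buys enough task quality to pay for itself.
    """
    sparsity_penalty = experts_active / experts_total            # in [0, 1]
    latency_penalty = max(0.0, latency_ms / latency_budget_ms - 1.0)
    return (task_score
            - w_sparsity * sparsity_penalty
            - w_latency * latency_penalty)
```

With equal task quality, a 2-of-8 expert configuration scores higher than an 8-of-8 one, which is exactly the pressure a non-trivial reward needs to exert; real designs would also have to guard against the opposite collapse (too few experts starving capability).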

Background

The authors argue for treating architecture as a first-class action space in reinforcement learning, so that efficiency and capability can be co-optimized rather than traded off after the fact. They enumerate several unresolved issues that must be addressed before such architecture–algorithm co-optimization is practical, particularly around MoE routing, sparsity, and deployment constraints.
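When architectural choices are discrete actions (e.g., how many experts to activate), one standard route to the stable credit assignment named in challenge (ii) is a score-function (REINFORCE) update with a baseline to reduce gradient variance. The sketch below is a minimal illustration of that generic technique, not the survey's method; the logit parameterization and learning rate are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE step on a categorical policy over architectural actions.

    logits   : unnormalized scores, one per candidate architecture
    action   : index of the architecture that was sampled and evaluated
    reward   : scalar reward observed for that architecture
    baseline : running estimate of the average reward (variance reduction)

    Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs,
    scaled by the advantage (reward - baseline).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [l + lr * advantage * ((1.0 if i == action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]
```

A positive advantage shifts probability mass toward the sampled architecture; the baseline keeps updates centered so that topology changes with noisy returns do not swamp the policy, which is the basic stability lever the open question asks to extend to full architectural action spaces.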

References

Key open questions include designing robust multi-objective reward functions that avoid trivial solutions (e.g., all-expert sparsity), achieving stable credit assignment when architectural actions modify network topology, and amortizing architecture policy learning across prompts, tasks, and deployment scales.

Zhang et al., "A Survey of Reinforcement Learning for Large Reasoning Models" (arXiv:2509.08827, 10 Sep 2025), Section 7.7: RL for Architecture–Algorithm Co-Design.