
Parameter-Sharing & Curriculum Learning

Updated 3 October 2025
  • Parameter-sharing and curriculum learning are strategies that, respectively, reuse network weights and sequence task complexity to enhance learning efficiency and transfer in control systems.
  • They reduce computational costs and memory use by leveraging shared features across multiple outputs, tasks, or agents in high-dimensional environments.
  • Combining these approaches fosters faster convergence, robust generalization, and improved coordination in applications like robotics, multi-agent systems, and offline RL.

Parameter-sharing policies and curriculum learning jointly address the challenges of sample efficiency, generalization, and transfer in high-dimensional control and reinforcement learning systems. Parameter sharing exploits structural commonalities across control dimensions, tasks, or agents, typically via neural network architectures that reuse weights or learned features. Curriculum learning organizes the training process as a sequence of increasingly complex tasks or subspaces, guiding the learner to master simpler behaviors before tackling coupled, high-interaction, or hard-to-explore aspects. When combined, these strategies support scalable policy learning in settings marked by high dimensionality, multi-agent coordination, changing environments, and diverse skill acquisition.

1. Foundations and Motivations

Parameter sharing is a network design or RL training principle in which a single set of network weights is reused for multiple outputs—across control parameters (multi-dimensional action spaces), tasks (multi-task learning), or agents (multi-agent RL). The resulting architectures reduce memory and computation, facilitate knowledge transfer, and enable the network to exploit correlations or shared structure among diverse control or task facets (Murali et al., 2017, Shao et al., 2018, Sun et al., 2022, Zhang et al., 2023).
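As a concrete illustration of this pattern, the following PyTorch-style sketch pairs one shared trunk with several output heads; the layer sizes, class name, and head-indexing scheme are illustrative assumptions, not the architecture of any cited paper.

```python
import torch
import torch.nn as nn

class SharedBackbonePolicy(nn.Module):
    """Generic multi-head policy: one shared trunk, one head per
    control dimension, task, or agent (illustrative sketch)."""
    def __init__(self, obs_dim: int, act_dim: int, n_heads: int, hidden: int = 128):
        super().__init__()
        # A single set of weights reused for every output head.
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Head-specific layers: one small output map per task/agent/dimension.
        self.heads = nn.ModuleList(nn.Linear(hidden, act_dim) for _ in range(n_heads))

    def forward(self, obs: torch.Tensor, head_id: int) -> torch.Tensor:
        features = self.backbone(obs)         # shared feature extraction
        return self.heads[head_id](features)  # specialized output mapping

policy = SharedBackbonePolicy(obs_dim=16, act_dim=4, n_heads=3)
logits = policy(torch.randn(1, 16), head_id=1)
```

Because gradients from every head flow into the same backbone, experience gathered for one output shapes the features available to all others, which is the mechanism behind the memory savings and transfer described above.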

Curriculum learning, by contrast, modulates environmental or task complexity, the sequence of training data, or the set of control dimensions explored at a given stage. Instead of presenting the hardest problem first, the agent is led through a tailored sequence—from simple to complex—either by hand-designed schedules or by automatic methods driven by sensitivity analysis, regret, transfer metrics, or policy progress (Murali et al., 2017, Narvekar et al., 2018, Foglino et al., 2019, Portelas et al., 2020).
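A minimal hand-designed schedule makes this sequencing concrete; the mastery threshold and the `tasks`, `train_one_epoch`, and `evaluate` callables below are placeholders for illustration, not the interface of any cited method.

```python
def run_curriculum(tasks, train_one_epoch, evaluate, threshold=0.8, max_epochs=100):
    """Train on each task (assumed pre-sorted from simple to complex) until a
    mastery criterion is met, then promote the learner to the next stage."""
    for stage, task in enumerate(tasks):
        for epoch in range(max_epochs):
            train_one_epoch(task)
            if evaluate(task) >= threshold:  # mastery reached for this stage
                break                        # promote to the next, harder task
        print(f"stage {stage}: advanced after {epoch + 1} epochs")
```

Automatic curricula replace this fixed ordering and threshold with signals such as sensitivity, regret, or learning progress, as discussed in Section 3.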

The synergy of these two approaches becomes especially important in domains with high-dimensional or coupled control, non-stationary environments, sparse reward signals, or where the cost of failed exploration is high (e.g., robotics, distributed control, complex simulation).

2. Architectural Approaches to Parameter Sharing

Parameter sharing in policy learning is instantiated through several neural and RL design strategies:

  • Layer and Feature Sharing in High-Dimensional Control: In CASSL (Murali et al., 2017), convolutional and early fully connected layers ("fc6") process raw visual input for all control parameters, while output heads (fc7 and later) are parameter-specific. This structure enables the shared layers to extract common visual features, while the branches learn dimension-specific mappings.
  • Shared Policy Networks in Multi-Agent RL: PS-MAGDS (Shao et al., 2018) and subsequent works employ a single policy network whose parameters are updated from the experiences of all agents, promoting cooperative policies without maintaining distinct networks per agent.
  • Multi-Head or Modular Architectures: A backbone network encodes observations (or features), and each head specializes for a particular task or control output (e.g., MoE in IMC (Blessing et al., 2023)), skill (Zentner et al., 2021), or agent (with tagging) (Terry et al., 2020).
  • Dynamic or Compositional Sharing: DynaShare (Rahimian et al., 2023) introduces hierarchical gating mechanisms: task-level gates select layers to activate, while instance-level gates, conditioned on input features, decide module utilization at inference time. PaCo (Sun et al., 2022) decomposes policy parameters into a basis matrix for sharing and task-specific coefficients for composition, supporting interpolation and flexible sharing (a minimal sketch of this decomposition follows this list). In one-shot NAS, CLOSENet (Zhou et al., 2022) applies a curriculum to the extent of parameter sharing, gradually reducing sharing (increasing specialization) through lattice-like assignment of parameter blocks.
  • Recurrent and Latent Structure Sharing: In parameterized recurrent CNNs (Savarese et al., 2019), convolutional layers are expressed as combinations of global templates; layers with matching coefficients effectively induce recurrences, and the induced hybrid CNN-RNN structures naturally adapt to increasing data complexity in curriculum settings.
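To ground the compositional-sharing idea, here is a minimal sketch of a PaCo-style decomposition in which each task's flat parameter vector is composed from a shared basis and task-specific coefficients; the plain linear policy and all dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ComposedLinearPolicy(nn.Module):
    """PaCo-style composition sketch: theta_t = Phi @ w_t, with Phi shared
    across tasks and w_t task-specific (illustrative, not the paper's code)."""
    def __init__(self, obs_dim: int, act_dim: int, n_basis: int, n_tasks: int):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        # Shared basis Phi: columns are reusable parameter "atoms".
        self.basis = nn.Parameter(0.1 * torch.randn(act_dim * obs_dim, n_basis))
        # Task-specific composition coefficients w_t (one column per task).
        self.coeffs = nn.Parameter(torch.randn(n_basis, n_tasks))

    def forward(self, obs: torch.Tensor, task_id: int) -> torch.Tensor:
        theta = self.basis @ self.coeffs[:, task_id]     # compose task weights
        weight = theta.view(self.act_dim, self.obs_dim)  # reshape to a linear map
        return obs @ weight.T                            # action preferences

policy = ComposedLinearPolicy(obs_dim=6, act_dim=2, n_basis=4, n_tasks=3)
out = policy(torch.randn(5, 6), task_id=0)  # batch of 5 observations
```

Interpolating or re-learning only the coefficient vector yields new task policies without touching the shared basis, which is what makes this form of sharing flexible.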

3. Curriculum Design and Sequencing Mechanics

Curriculum design is a central challenge, as its efficacy hinges on correctly identifying what should be learned first and how complexity should be increased. Different methodologies have emerged:

  • Sensitivity-Driven Curricula: CASSL (Murali et al., 2017) applies variance-based global sensitivity analysis (Sobol indices) to the action/control space. Control dimensions are ordered so that those with the highest first-order sensitivity and lowest interaction terms are learned first, informed by solving

$$\min_{\Psi} E(\Psi) = \sum_{i\in\Psi}\left(S_i^{(T)} - S_i^{(1)}\right) + \sum_{i\in\Psi}\sum_{j\in(\Omega-\Psi)}\left|S_{ij}^{(2)}\right|$$

for the curriculum order $\Psi$, where $S_i^{(1)}$, $S_i^{(T)}$, and $S_{ij}^{(2)}$ denote the first-order, total-order, and second-order Sobol indices, respectively (a toy greedy implementation of this ordering follows this list).

  • CMDP (Curriculum MDP) Formulation: Curriculum sequencing is formalized as an MDP over the learner's parameter vector $\theta$, with task selection as actions and transfer-accelerated policy evolution as transitions (Narvekar et al., 2018). The curriculum policy is itself trained via RL, e.g., Sarsa($\lambda$), using empirical improvements as rewards.
  • Meta-Learning of Curricula: Meta-ACL (Portelas et al., 2020) generalizes curriculum construction by clustering learners in a "knowledge component" space, matching new learners to expert curricula learned from previous agents of similar competence. The AGAIN algorithm formalizes this sharing by mining curriculum parameters (e.g., ALP-GMMs) from past experience and composing them for new students.
  • Optimization Over Curriculum Objectives: Frameworks specify objective functions—jumpstart, regret, max-return, time-to-threshold—and seek task sequences maximizing these metrics (Foglino et al., 2019). Parameter sharing amplifies the benefits of objective-driven curricula, as transfer via shared weights is sensitive to the ordering chosen.
  • Regret and Factored Curricula: In factored state MDPs, curricula can focus exploration on variables of highest regret (those most likely to degrade performance when varied), allowing the agent to efficiently acquire robust policies under distributional shifts (Panayiotou et al., 13 Sep 2024).
  • Trajectory-Valued Curricula in Offline RL: In CLTV (Abolfazli et al., 2 Feb 2025), transitions are valued by their similarity to the target policy (via KL divergence), and high-value trajectories form the training curriculum, guiding offline RL to focus on target-like behaviors even when data are mixed across domains.
  • Skill and Diversity-Oriented Curricula: Recent approaches such as "trajectory-first" (Braun et al., 2 Jun 2025) first search the trajectory space for diverse, near-optimal behaviors (via constrained novelty search), then use off-policy RL to distill these into parameter-sharing policies, maintaining diversity throughout.
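As an illustration of the sensitivity-driven ordering above, the following toy greedy heuristic selects, at each step, the control dimension contributing least to $E(\Psi)$: low interaction (small $S_i^{(T)} - S_i^{(1)}$) and weak second-order coupling to the dimensions not yet scheduled. It is a simplified stand-in, not CASSL's exact optimization.

```python
import numpy as np

def sensitivity_curriculum(S1, ST, S2):
    """Greedy curriculum order over control dimensions from Sobol indices.
    S1, ST: (d,) first- and total-order indices; S2: (d, d) second-order."""
    remaining = set(range(len(S1)))
    order = []
    while remaining:
        def cost(i):
            interaction = ST[i] - S1[i]  # total- minus first-order sensitivity
            coupling = sum(abs(S2[i, j]) for j in remaining - {i})
            return interaction + coupling
        nxt = min(remaining, key=cost)   # dimension adding least to E(Psi)
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy example with three control dimensions.
S1 = np.array([0.50, 0.20, 0.30])
ST = np.array([0.55, 0.50, 0.40])
S2 = np.array([[0.00, 0.02, 0.03],
               [0.02, 0.00, 0.20],
               [0.03, 0.20, 0.00]])
print(sensitivity_curriculum(S1, ST, S2))  # -> [0, 2, 1]
```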

4. Empirical Results and Comparative Performance

Parameter-sharing policies, when paired with effective curriculum learning, consistently achieve the following outcomes:

  • Increased Sample Efficiency: In CASSL, an 8%-14% boost in generalization and grasp accuracy (on novel objects) over staged and random baselines is reported, attributable to both sensitivity-guided curricula and the shared feature backbone (Murali et al., 2017).
  • Improved Generalization and Long-Term Robustness: In distributed multi-robot policies, a trajectory-length curriculum (short-to-long) results in higher coordination accuracy and lower position errors, especially as team size and measurement noise increase (Roche et al., 29 Sep 2025). For changing environments, factored representations combined with targeted curricula yield robust policies able to generalize over stochastic environment compositions (Panayiotou et al., 13 Sep 2024).
  • Faster Convergence and Curriculum-Dependent Transfer: In multi-agent RL and StarCraft micromanagement, transfer learning with curricula (from simpler to harder multi-agent scenarios) allows shared-parameter policies to reach convergence in fewer episodes and with higher win rates than end-to-end or non-curriculum approaches (Shao et al., 2018).
  • Diversity and Avoidance of Mode Collapse: In diversity optimization, a trajectory-first curriculum enables parameter-sharing policies to realize significantly higher skill diversity than skill-encoding-only or random-exploration baselines, as measured by visitation statistics and state entropy (Braun et al., 2 Jun 2025).
  • Mitigation of Domain Mismatch in Offline RL: Transition scoring and curriculum trajectory selection (CLTV) achieve up to 95% normalized score gains (Ant, CQL) versus the best competing approaches, by filtering for high-quality, target-like experience (Abolfazli et al., 2 Feb 2025).

5. Theoretical Insights and Transfer Dynamics

The effectiveness of parameter sharing depends on the compatibility between tasks (or control dimensions), the quality of shared representations, and the structure of the curriculum. Theoretical analyses provide several guarantees and heuristics:

  • Sharing-Curriculum Coupling: The CMDP and optimization frameworks demonstrate that when transfer is performed via parameter sharing—either as direct value function transfer or as composition in policy space—the ordering of tasks (the curriculum) strongly influences convergence speed and end-task performance (Narvekar et al., 2018, Foglino et al., 2019). Parameter-sharing facilitates effective transfer especially when curricula are tailored to minimize regret or maximize "jumpstart".
  • Sensitivity Analysis for Curriculum Ordering: Quantifying first- and total-order sensitivities in the control space allows curriculum designs that learn independent (or least interactive) aspects first, reducing interference in parameter-sharing networks.
  • Curricula for Heterogeneity: In multi-agent parameter sharing, agent indication (observation tagging and output padding) ensures convergence to differentiated policies even in heterogeneous spaces and enables staged increases in agent/task diversity, supporting curriculum learning over agent population complexity (Terry et al., 2020); a minimal observation-tagging sketch follows this list.
  • Regularization and Catastrophic Forgetting: Parameter sharing across curricula aids in knowledge retention but poses interference risks; modularity (as in PaCo) and curriculum selection (e.g., via transfer cost spanning trees (Zentner et al., 2021)) help mitigate catastrophic forgetting and ill-posed transfer, sometimes resulting in "hard-first" curricula.
  • Recurrent Curriculum and Adaptation: In soft parameter-sharing CNNs, recurrent patterns discovered via curriculum learning enable efficient adaptation to tasks of increasing complexity, suggesting an architectural bias beneficial in non-stationary environments (Savarese et al., 2019).
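For concreteness, the agent-indication mechanism mentioned above reduces to appending a one-hot agent identifier to each observation before the shared network; the network and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def tag_observation(obs: torch.Tensor, agent_id: int, n_agents: int) -> torch.Tensor:
    """Agent indication: append a one-hot agent ID so a single
    shared-parameter policy can still express per-agent behavior."""
    one_hot = torch.zeros(n_agents)
    one_hot[agent_id] = 1.0
    return torch.cat([obs, one_hot])

obs_dim, n_agents, act_dim = 8, 3, 4
shared_policy = nn.Sequential(              # one network serves every agent
    nn.Linear(obs_dim + n_agents, 64), nn.ReLU(), nn.Linear(64, act_dim)
)
obs = torch.randn(obs_dim)
logits = [shared_policy(tag_observation(obs, i, n_agents)) for i in range(n_agents)]
```

Output padding, noted above, extends the same trick to agents with heterogeneous action spaces by sizing the shared head to the largest space and masking unused entries.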

6. Practical Applications and Case Studies

The integration of parameter sharing and curriculum learning underpins recent advances in:

  • Robotic Grasping and Manipulation: Efficient training in high-dimensional spaces, generalization to novel objects, and reduction in labeling costs, using factored curricula and shared feature extractors (Murali et al., 2017, Kang et al., 2021).
  • Multi-Agent and Distributed Control: Scalable distributed controllers for swarms and teams, robust to partial observability and sensor noise, learned by curriculum imitation of global demonstrations and local perception estimation (Roche et al., 29 Sep 2025).
  • Neural Architecture Search and Model Selection: Curriculum learning over sharing extent enables efficient supernet training and improved architecture ranking, demonstrating that the granularity of parameter sharing can form an effective curriculum variable (Zhou et al., 2022).
  • Offline RL and Data-Driven Generalization: Curriculum trajectory valuation and transition scoring frameworks order large, mixed datasets by relevance, providing shared initialization and fine-tuned updates for general policies in varying or sparse data domains (Abolfazli et al., 2 Feb 2025, Liu et al., 2021).
  • Imitation and Diversity Learning: Mixture-of-experts and trajectory-first curricula allow efficient parameter sharing while capturing multimodal skill demonstrations, avoiding degeneracy and increasing behavioral repertoire (Blessing et al., 2023, Braun et al., 2 Jun 2025).

7. Outlook and Future Research Directions

Key open directions in parameter sharing and curriculum learning include:

  • Dynamic, Adaptive Curriculum Policies: Beyond static or sensitivity-based orderings, research is moving toward learned curriculum policies that adapt on the fly to agent progress, observed data complexities, and performance metrics (Narvekar et al., 2018, Portelas et al., 2020).
  • Hierarchical and Compositional Parameter Sharing: Dynamically compositional frameworks (e.g., PaCo, DynaShare) and meta-learned architectures may further increase the flexibility and efficacy of policy representations, with potential for real-time curriculum adaptation (Sun et al., 2022, Rahimian et al., 2023).
  • Curricula for Robustness and Non-Stationarity: Methods emphasizing regret-based variable selection, factored state representations, and adversarial or latent curriculum generation (e.g., CLUTR (Azad et al., 2022)) are critical for agents in dynamic, multi-modal, or non-stationary environments.
  • Scale and Generalization Across Domains and Agents: As tasks, domains, and agent teams scale, the theory and empirics suggest that parameter sharing complemented by curriculum learning becomes essential—not only for learning speed and generalization but also for controlling catastrophic forgetting and efficiently acquiring diverse skills.
  • Integration With Transfer and Lifelong Learning: Parameter sharing forms a backbone for transfer and continual learning, particularly where skill policies or subnetworks are gradually extended or adapted by curriculum selection, as in minimum-spanning-tree curriculum ordering or with accumulated skill libraries (Zentner et al., 2021).

A plausible implication for future work is that advances will focus on automated curriculum synthesis (potentially meta-learned), adaptive parameter sharing strategies, and principled mechanisms for identifying curriculum variables (e.g., via sensitivity, regret, or dynamics mismatch) best suited to the structure of the domain, agent population, or task acquisition pipeline.
