
Disentangled Multi-Context Meta-Learning: Unlocking robust and Generalized Task Learning (2509.01297v1)

Published 1 Sep 2025 in cs.RO

Abstract: In meta-learning and its downstream tasks, many methods rely on implicit adaptation to task variations, where multiple factors are mixed together in a single entangled representation. This makes it difficult to interpret which factors drive performance and can hinder generalization. In this work, we introduce a disentangled multi-context meta-learning framework that explicitly assigns each task factor to a distinct context vector. By decoupling these variations, our approach improves robustness through deeper task understanding and enhances generalization by enabling context vector sharing across tasks with shared factors. We evaluate our approach in two domains. First, on a sinusoidal regression task, our model outperforms baselines on out-of-distribution tasks and generalizes to unseen sine functions by sharing context vectors associated with shared amplitudes or phase shifts. Second, in a quadruped robot locomotion task, we disentangle the robot-specific properties and the characteristics of the terrain in the robot dynamics model. By transferring disentangled context vectors acquired from the dynamics model into reinforcement learning, the resulting policy achieves improved robustness under out-of-distribution conditions, surpassing the baselines that rely on a single unified context. Furthermore, by effectively sharing context, our model enables successful sim-to-real policy transfer to challenging terrains with out-of-distribution robot-specific properties, using just 20 seconds of real data from flat terrain, a result not achievable with single-task adaptation.

Summary

  • The paper presents DMCM which disentangles task-specific factors into separate context vectors, enabling selective and robust adaptation.
  • It employs a controlled update mechanism during inner-loop adaptation, significantly enhancing zero-shot generalization in tasks like sine regression.
  • DMCM demonstrates superior sim-to-real transfer in quadrupedal robot locomotion, achieving high performance under out-of-distribution conditions.

Disentangled Multi-Context Meta-Learning: Robust and Generalized Task Adaptation

Introduction and Motivation

The paper introduces Disentangled Multi-Context Meta-Learning (DMCM), a meta-learning framework designed to address the limitations of conventional gradient-based meta-learning (GBML) methods, such as MAML and CAVIA, which typically entangle multiple task factors into a single context representation. This entanglement impedes interpretability, generalization, and robustness, especially in domains with compositional task variations, such as robotics and regression tasks. DMCM explicitly assigns each factor of variation to a distinct context vector, enabling selective adaptation and context sharing across tasks with overlapping factors.

Figure 1: DMCM adapts by disentangling task-specific factors into separate context vectors, which can be reused across tasks with overlapping factors, enabling generalization.

DMCM Framework and Algorithmic Details

DMCM extends the CAVIA meta-learning paradigm by introducing K context vectors {ϕ^1, …, ϕ^K}, each corresponding to a distinct task factor (e.g., amplitude, phase, terrain, robot-specific property). During inner-loop adaptation, only the context vector associated with the changed factor is updated, while the others remain fixed. This selective update is orchestrated via regulated data sequencing and explicit task labeling, ensuring that each context vector specializes in its assigned factor.

The outer-loop meta-gradient step updates the shared model parameters θ after a warm-up phase, allowing the context vectors to accumulate meaningful information. An optional recombination loop enables zero-shot generalization by updating θ using context vectors that were not jointly adapted, thus encouraging the model to operate effectively with independently learned contexts.

Pseudocode Overview

for task in batch:
    for s in range(K):
        # Inner loop: phi^s is adapted only on the portion of D_train where
        # factor s is the factor that varies; the other contexts stay fixed
        phi_i[s] -= alpha * grad_phi_s(Loss(f(phi_i[1], ..., phi_i[K], theta), D_train))
# Outer loop: meta-gradient update of the shared parameters theta
theta -= beta * grad_theta(Loss(f(phi_i[1], ..., phi_i[K], theta), D_test))
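To make the selective inner-loop update concrete, the following is a minimal NumPy sketch. The analytic model `f`, the finite-difference gradients, and the `amp`/`phase` context names are illustrative stand-ins for the paper's context-conditioned network, not its implementation.

```python
import numpy as np

# Toy stand-in for the context-conditioned model: two scalar context
# "vectors" (amplitude and phase) feed an analytic function instead of
# a neural network, so the selective-update mechanics stay visible.
def f(x, phi_amp, phi_phase):
    return phi_amp * np.sin(x + phi_phase)

def mse(phi_amp, phi_phase, x, y):
    return np.mean((f(x, phi_amp, phi_phase) - y) ** 2)

def num_grad(loss, p, eps=1e-5):
    # central finite-difference gradient w.r.t. a scalar context
    return (loss(p + eps) - loss(p - eps)) / (2 * eps)

def inner_adapt(phi, changed, x, y, alpha=0.05, steps=200):
    """Update only the context assigned to the changed factor."""
    phi = dict(phi)
    for _ in range(steps):
        if changed == "amp":
            g = num_grad(lambda a: mse(a, phi["phase"], x, y), phi["amp"])
            phi["amp"] -= alpha * g
        else:
            g = num_grad(lambda p: mse(phi["amp"], p, x, y), phi["phase"])
            phi["phase"] -= alpha * g
    return phi

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, 64)
y = 2.0 * np.sin(x + 0.3)          # new task: amplitude 2.0, phase 0.3
phi0 = {"amp": 1.0, "phase": 0.3}  # phase context already correct
phi1 = inner_adapt(phi0, "amp", x, y)
print(round(phi1["amp"], 2))       # amplitude context approaches 2.0
```

Only the amplitude context moves; the phase context is untouched, which is the disentanglement property the framework relies on.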

Sine Regression: Robustness and Zero-Shot Generalization

DMCM is evaluated on the sine regression benchmark, where tasks are parameterized by amplitude and phase shift. The model disentangles these factors into separate context vectors, enabling robust adaptation and generalization under out-of-distribution (OOD) conditions.

Empirical results demonstrate that DMCM maintains low loss and variance even when 40–80% of amplitude-phase combinations are excluded during training, outperforming MAML, CAVIA, and ANIL in OOD settings. Notably, DMCM supports zero-shot prediction by recombining context vectors learned from different tasks, achieving accurate predictions for unseen sine functions without further adaptation.
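The recombination idea can be sketched in a few lines of NumPy. The analytic model `f` below is a toy stand-in for the context-conditioned network (illustration only, not the paper's implementation): an amplitude context adapted on one task and a phase context adapted on another are combined to predict a sine function never seen during adaptation.

```python
import numpy as np

def f(x, amp, phase):
    # toy stand-in for the context-conditioned network
    return amp * np.sin(x + phase)

def adapt(loss, p, alpha=0.05, steps=300, eps=1e-5):
    # gradient descent on a single scalar context (finite differences)
    for _ in range(steps):
        g = (loss(p + eps) - loss(p - eps)) / (2 * eps)
        p -= alpha * g
    return p

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, 64)

# Task A (amplitude 2.5, phase 0.0): adapt only the amplitude context
yA = 2.5 * np.sin(x)
amp_ctx = adapt(lambda a: np.mean((f(x, a, 0.0) - yA) ** 2), 1.0)

# Task B (amplitude 1.0, phase 0.7): adapt only the phase context
yB = np.sin(x + 0.7)
phase_ctx = adapt(lambda p: np.mean((f(x, 1.0, p) - yB) ** 2), 0.0)

# Zero-shot: recombine contexts for the unseen task (amp 2.5, phase 0.7)
y_unseen = 2.5 * np.sin(x + 0.7)
err = np.mean((f(x, amp_ctx, phase_ctx) - y_unseen) ** 2)
print(err < 1e-3)  # recombined contexts predict the unseen combination
```

Because each context was adapted in isolation, the two can be composed without any gradient steps on the unseen amplitude-phase pairing.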

Figure 2: DMCM learning curve for sine regression with recombination loop, showing robust adaptation and zero-shot generalization.

Figure 3: Zero-shot predictions from DMCM with two contexts; red dots indicate predictions using shared context vectors.

Figure 4: Zero-shot predictions from DMCM with three contexts (amplitude, phase-shift, y-shift), demonstrating compositional generalization.

The number of context vectors is critical: aligning the context count with the true number of task factors yields optimal robustness, while excessive contexts slow adaptation and reduce stability.

Quadrupedal Robot Locomotion: Sim-to-Real Transfer and Robustness

DMCM is applied to quadrupedal robot locomotion, where tasks vary along terrain and robot-specific properties (e.g., payload, control gains). The dynamics model is trained to predict robot states under diverse conditions, with context vectors disentangling terrain and robot-specific factors.

Figure 5: DMCM learning procedure for quadrupedal robot locomotion, illustrating context extraction and transfer to RL policy.

Figure 6: Evaluation of DMCM context separation on real-world datasets; context-aligned vectors yield lower loss than unrelated contexts.

Context vectors extracted from the dynamics model are transferred to reinforcement learning (RL) policies. Three policies are compared: vanilla (no context), single-CAVIA (unified context), and multi-DMCM (disentangled contexts). DMCM consistently achieves superior robustness and success rates under OOD terrain and robot-specific property conditions, including payloads and control gains outside the training distribution.
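The interface between the dynamics model and the policy can be sketched as follows. The names and dimensions (`obs_dim`, `ctx_dim`, the two context variables) are illustrative assumptions, not values from the paper; the point is only how the three policy variants differ in their inputs.

```python
import numpy as np

obs_dim, ctx_dim = 48, 8                 # hypothetical sizes
obs = np.zeros(obs_dim)                  # proprioceptive observation
phi_terrain = np.zeros(ctx_dim)          # terrain context (from dynamics model)
phi_robot = np.zeros(ctx_dim)            # robot-specific context (ditto)

# multi-DMCM policy: concatenate all disentangled context vectors
dmcm_input = np.concatenate([obs, phi_terrain, phi_robot])

# single-CAVIA baseline: one entangled, unified context vector
unified = np.zeros(ctx_dim)
cavia_input = np.concatenate([obs, unified])

# vanilla baseline: raw observation only, no context conditioning
vanilla_input = obs

print(dmcm_input.shape, cavia_input.shape, vanilla_input.shape)
```

Because the terrain and robot contexts enter the policy separately, one can be swapped (e.g., a terrain context shared from another task) while the other is kept, which is what enables the sim-to-real transfer described above.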

Strong numerical results include:

  • 80% success rate for DMCM in real-world stair climbing under OOD conditions (reduced K_p gain, added payload), using only 20 seconds of flat-terrain data for adaptation.
  • Vanilla and single-CAVIA policies fail completely under the same OOD conditions.
  • DMCM maintains high reward and lifespan under increasing payloads, while other policies degrade sharply.

Figure 7: Dynamics model loss for CAVIA, DMCM with self-adaptation, and DMCM with recombination (zero-shot with shared context vectors).

Figure 8: Highest level stair terrain used for OOD evaluation.

Figure 9: Go1 robot with water bottles attached at three legs, demonstrating DMCM robustness under asymmetric payloads.

Implementation Considerations and Trade-offs

Computational Requirements

  • DMCM introduces additional memory overhead for storing K context vectors and recombination loop buffers.
  • Training time per meta-gradient decreases with more context vectors (due to smaller update sizes), but adaptation time increases as each context must be updated separately.
  • For K = 3, the recombination loop requires six extra context vectors in memory.

Hyperparameter Selection

  • The number of context vectors should match the true number of task factors for optimal robustness.
  • Context vector dimensionality must be sufficient to encode the assigned factor; underparameterization impedes expressiveness, while overparameterization reduces stability.

Deployment Strategies

  • DMCM supports context sharing for zero-shot adaptation, enabling rapid deployment to novel tasks with overlapping factors.
  • Real-world deployment requires pre-collected data for context extraction; integrating fast, online adaptation remains an open challenge.

Limitations and Future Directions

  • Manual context labeling is required; automating context discovery is a key area for future research.
  • DMCM has been validated on regression and RL tasks; extension to classification and other domains is needed.
  • Real-time adaptation with selective context updates is not yet fully realized.

Theoretical and Practical Implications

DMCM advances the interpretability and compositionality of meta-learning by disentangling task factors, enabling robust adaptation and generalization in complex, multi-factor environments. The framework is particularly suited for domains with compositional structure, such as robotics, where sim-to-real transfer and OOD robustness are critical. The explicit separation of context vectors facilitates analysis of task variation and supports modular policy design.

Conclusion

Disentangled Multi-Context Meta-Learning (DMCM) provides a principled approach to robust and generalized task adaptation by learning multiple context vectors, each corresponding to a distinct factor of variation. Empirical results in sine regression and quadrupedal robot locomotion demonstrate superior OOD robustness, zero-shot generalization, and sim-to-real transfer capabilities compared to entangled context baselines. DMCM's compositional structure and context sharing mechanisms offer significant advantages for interpretable, scalable, and modular meta-learning, with promising implications for future research in automatic context discovery, broader domain applicability, and real-time adaptation.
