Fractional Transfer Learning (FTL)
- Fractional Transfer Learning is a method that blends pretrained network weights with random initialization using a tunable fraction ω to control knowledge transfer.
- It employs a convex combination mechanism that balances retaining useful prior information while avoiding negative transfer in new tasks.
- Empirical studies in deep model-based RL with Dreamer demonstrate improved sample efficiency and asymptotic returns on various continuous-control tasks.
Fractional Transfer Learning (FTL) is a method for parameter-based transfer learning that mixes a fraction of a source network’s pretrained weights with a random initialization to seed learning in a new task. Rather than defaulting to full parameter transfer (ω = 1) or pure random initialization (ω = 0), FTL “blends” source parameters with a tunable fractional coefficient ω. This approach enables explicit control over the amount of knowledge reused from prior tasks, mitigating the information loss of full randomization while avoiding the negative transfer that can result from indiscriminate full reuse. FTL has been specifically evaluated in the context of deep model-based reinforcement learning using the Dreamer algorithm, demonstrating substantial improvements in sample efficiency and learning performance across multi-source visual continuous-control tasks (Sasso et al., 2021).
1. Formal Definition and Mechanism
Fractional Transfer Learning operates by initializing each target network layer as a convex combination of the corresponding source layer’s pretrained weights and a new random initialization. If W_S denotes the pretrained source weights, W_R a freshly generated random tensor of equal shape, and ω ∈ [0, 1] the transfer fraction, the FTL initialization is

W_init = ω · W_S + (1 − ω) · W_R

This formulation recovers the statistical properties of random initialization (when ω = 0) and exact parameter reuse (when ω = 1), while intermediate values tune how much prior knowledge is retained. The technique is directly compatible with standard initialization schemes such as Glorot or Kaiming for W_R.
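The blend above can be sketched directly in NumPy. This is a minimal illustration, not the reference implementation; the Glorot-style bound used for the random component is one possible choice:

```python
import numpy as np

def ftl_init(w_source: np.ndarray, omega: float, rng=None) -> np.ndarray:
    """Blend pretrained source weights with a fresh random tensor.

    omega = 0 recovers pure random initialization; omega = 1 recovers
    full parameter transfer; intermediate values interpolate linearly.
    """
    rng = np.random.default_rng(rng)
    # Glorot/Xavier-style fan-based bound for the random component
    # (an illustrative choice of initialization scheme).
    fan_in, fan_out = w_source.shape[0], w_source.shape[-1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    w_random = rng.uniform(-limit, limit, size=w_source.shape)
    return omega * w_source + (1.0 - omega) * w_random

# Sanity checks at the endpoints:
w_src = np.ones((4, 3))
assert np.allclose(ftl_init(w_src, 1.0), w_src)          # full transfer
assert np.abs(ftl_init(w_src, 0.0, rng=0)).max() < 1.0   # pure random, Glorot-bounded
```

Because the operation is a per-element linear interpolation, it applies unchanged to weight matrices, convolutional kernels, and bias vectors of any shape.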
2. Motivation: Balancing Knowledge Retention and Flexibility
Traditional parameter transfer strategies in neural networks, particularly in reinforcement learning (RL), often treat transfer as all-or-nothing: parameters are either fully reused or entirely discarded. This dichotomy leads to two primary drawbacks:
- Loss of Useful Information: Pure randomization discards all structure acquired by the source network, eliminating potential accelerants for early-stage optimization, especially where partial feature reuse would be beneficial.
- Overfitting and Interference: Full transfer of parameters (notably in output layers such as reward and value heads) can codify task-specific biases. Adaptation to new, divergent reward functions or value structures may then be hindered by the optimizer’s need to “unlearn” this bias.
FTL provides a principled compromise, preserving prior knowledge in proportion to ω, thereby aiding sample efficiency while offering a safeguard against interference from incompatible representations. Empirically, FTL has been shown to enhance both initial “jumpstart” performance and asymptotic returns in tasks with shared partial structure (Sasso et al., 2021).
3. Application within Dreamer and Component-wise Strategy
Dreamer comprises (i) a variational encoder/decoder (CNN-based VAE), (ii) a recurrent state-space model (RSSM) for dynamics, (iii) a reward predictor, (iv) an actor network, and (v) a value network. Integration of FTL into Dreamer proceeds on a per-layer, per-component basis, guided by task and architectural compatibility:
| Component | Transfer Strategy | Rationale |
|---|---|---|
| Encoder/Decoder CNNs (VAE) | Full transfer (ω = 1) | Latent representations likely generalize across related visual tasks |
| RSSM transition model | Full transfer (ω = 1) | Core dynamics benefit from reuse when physical laws are similar |
| Reward model (last layer) | Fractional (0 < ω < 1) | Reward mapping is task-dependent; blending preserves flexibility |
| Value model (last layer) | Fractional (0 < ω < 1) | Value head is sensitive to the new reward structure; blending advisable |
| Preceding layers (reward/value) | Full transfer (ω = 1) | Shared “feature extraction” layers are more generalizable |
| Actor last layer, input-to-RSSM | Pure random (ω = 0) | Task dimension misalignment requires fresh initialization |
All parameters are made fully trainable post-initialization; FTL does not enforce any freezing. The initialization and training sequence for FTL-Dreamer is detailed in Algorithm 1 of (Sasso et al., 2021).
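The per-component strategy in the table can be expressed as a configuration map from component name to transfer fraction. A minimal NumPy sketch; the component names and the ω = 0.2 value for the fractional heads are illustrative, following the setting used in the reported experiments:

```python
import numpy as np

# Hypothetical per-component transfer fractions mirroring the table above.
TRANSFER_PLAN = {
    "encoder":          1.0,   # full transfer: visual features generalize
    "rssm_transition":  1.0,   # full transfer: shared dynamics
    "reward_head_last": 0.2,   # fractional: task-dependent reward mapping
    "value_head_last":  0.2,   # fractional: sensitive to new reward structure
    "actor_last":       0.0,   # pure random: task dimension misalignment
}

def blend(source: np.ndarray, random: np.ndarray, omega: float) -> np.ndarray:
    """Convex combination of source and random weights."""
    return omega * source + (1.0 - omega) * random

def init_target(source_params: dict, random_params: dict,
                plan: dict = TRANSFER_PLAN) -> dict:
    """Build target-network parameters component by component."""
    return {name: blend(source_params[name], random_params[name], plan[name])
            for name in plan}
```

All blended parameters remain fully trainable afterwards; the plan only controls initialization, not freezing.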
4. Hyperparameterization of the Fractional Coefficient
The fractional transfer coefficient ω is treated as a hyperparameter, analogous to a learning rate or dropout rate. Selection is component-specific but global within a head: all last-layer parameters of a given head share the same ω. A grid search over candidate values of ω allows empirical tuning; small fractions (e.g., ω = 0.2, as used in the results reported below) performed well.
Potential extensions include layer-wise or adaptive schedules for ω (e.g., via meta-learning or sensitivity analysis), which could further mitigate negative transfer and optimize knowledge reuse.
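A grid search over ω can be sketched as follows; `train_and_evaluate` is a hypothetical stand-in for a full FTL-Dreamer training run that returns mean episode return, and the grid values are illustrative:

```python
def grid_search_omega(train_and_evaluate,
                      grid=(0.0, 0.1, 0.2, 0.3, 0.5, 1.0)):
    """Evaluate each candidate transfer fraction and return the best.

    `train_and_evaluate(omega)` is assumed to run FTL initialization
    with the given fraction, train to completion, and return a scalar
    performance measure (e.g., mean episode return).
    """
    results = {omega: train_and_evaluate(omega) for omega in grid}
    best = max(results, key=results.get)
    return best, results
```

In practice each evaluation is a full RL training run, so the grid is kept coarse; adaptive schemes would aim to avoid this cost.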
5. Experimental Protocol and Baseline Comparison
Empirical evaluation encompasses six PyBullet continuous-control tasks with visual inputs: HalfCheetah, Hopper, Walker2D, Ant, InvertedPendulum, and InvertedDoublePendulum. Transfer is multi-source: a Dreamer agent is pretrained jointly on two, three, or four source tasks, transferred to a target task via the FTL initialization (Algorithm 1), and then trained on the target task for a fixed budget of environment steps.
Baselines are:
- DREAMER-Scratch: Random initialization, identical architecture and hyperparameters.
- DREAMER-RandInitLast: Identical to FTL, but the last reward/value layers are purely random (ω = 0).
Performance is assessed by episode return early in training (jumpstart), mean return over the full training run, and mean return over the final training steps (asymptotic performance).
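These three metrics can be computed from a logged curve of episode returns. A minimal NumPy sketch; the window fractions for the jumpstart and asymptotic windows are assumptions for illustration, not values from the paper:

```python
import numpy as np

def summarize_returns(returns, jumpstart_frac=0.05, asymptotic_frac=0.1):
    """Summarize a training curve with the three protocol metrics.

    jumpstart: mean return over the first `jumpstart_frac` of training;
    mean:      mean return over the whole run;
    asymptotic: mean return over the final `asymptotic_frac` of training.
    Window fractions are illustrative assumptions.
    """
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    k_js = max(1, int(n * jumpstart_frac))
    k_as = max(1, int(n * asymptotic_frac))
    return {
        "jumpstart": returns[:k_js].mean(),
        "mean": returns.mean(),
        "asymptotic": returns[-k_as:].mean(),
    }
```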
6. Empirical Findings
The following tables summarize the effect of FTL (ω = 0.2, two source tasks) compared to DREAMER-Scratch. Reported values are episode returns (mean ± standard deviation):

Table 1: Average Episode Return (mean over training)

| Task | FTL (sources = 2, ω = 0.2) | Baseline Scratch |
|---|---|---|
| HalfCheetah | — | — |
| Hopper | — | — |
| Walker2D | — | — |
| InvertedPendulum | — | — |
| InvertedDoublePendulum | — | — |
| Ant | — | — |
Table 2: Asymptotic Episode Return (final training steps)

| Task | FTL (sources = 2, ω = 0.2) | Baseline Scratch |
|---|---|---|
| HalfCheetah | — | — |
| Hopper | — | — |
| Walker2D | — | — |
| InvertedPendulum | — | — |
| InvertedDoublePendulum | — | — |
| Ant | — | — |
FTL yields substantial gains in both overall and asymptotic performance on HalfCheetah, Hopper, Walker2D, and both pendulum tasks. Negative transfer is observed on Ant, consistent with the general finding that transfer benefit depends on the degree of task similarity. The random-init last-layer condition often improves over training from scratch but is consistently inferior to the fractional strategy (Sasso et al., 2021).
7. Limitations and Prospective Directions
Several limitations and potential extensions emerge:
- Negative Transfer: With highly dissimilar tasks, e.g., transferring to Ant from locomotion sources, even fractional reuse can reduce performance.
- Static Fraction Assignment: A single global ω per head may not capture the optimal transfer amount for every layer or target-task combination. Adaptive or layer-wise strategies could further reduce harmful transfer.
- Dynamics Model Transfer: The current implementation fully transfers the RSSM; however, selective or fractional transfer for dynamics weights may be beneficial for tasks with differing physical structure.
- Broader Applicability: While demonstrated in multi-source model-based RL, FTL’s underlying mechanism is applicable to single-source settings and may generalize to supervised learning transfer, motivating further study.
Fractional Transfer Learning provides a pragmatic and effective mechanism for leveraging partial task similarity in neural network-based RL, positioned between complete reuse and full re-randomization, with measurable benefits in data efficiency and policy quality across representative continuous-control environments (Sasso et al., 2021).