
Transfer Learning without Dataset-Specific Parameters

Updated 20 January 2026
  • Transfer learning without dataset-specific parameters is a paradigm that leverages meta-learning and global model components to enable cross-domain transfer without custom tuning.
  • It employs techniques like bias correction, universal metric learning with frozen backbones, and surrogate-based hyperparameter transfer to achieve competitive performance.
  • This approach minimizes computational overhead and memory demands while delivering robust results in scenarios such as class-incremental learning and heterogeneous domain adaptation.

Transfer learning without dataset-specific parameters refers to a family of approaches in which models, hyperparameters, or optimization procedures can be reused or applied across datasets or domains without requiring custom parameter tuning, adaptation, or retraining for each new target dataset. This paradigm eliminates or minimizes the need for memory, validation data, or per-dataset hyperparameters, and aims at scalable, robust, and deployable transfer—especially in settings where dataset-specific adjustment is computationally or practically infeasible.

1. Key Concepts and Frameworks

Transfer learning typically assumes either (i) shared model weights or representation between source and target, (ii) transfer of meta-learned surrogates or hyperparameters, or (iii) construction of architectures or ensembles whose selection and prediction mechanisms generalize across datasets without any dataset-specific fine-tuning or validation.

Representative frameworks include:

  • Bias correction parameter transfer for class-incremental learning: Prediction bias correction parameters are meta-learned on reference datasets with small validation splits, then transferred in fixed form to new target datasets without any memory or target-side adaptation (Slim et al., 2021).
  • Parameter-efficient universal metric learning: A large frozen backbone, combined with a small set of lightweight, domain-conditional adapters and prompts parameterized globally rather than per dataset, allows effective transfer learning across heterogeneous distributions without training separate models (Kim et al., 2023).
  • Hyperparameter transfer using meta-learned surrogates: Surrogate models for performance and cost are meta-learned on past datasets and then applied directly to new tasks using only meta-features, avoiding any dataset-specific black-box optimization (Strangmann et al., 2024).
  • Ensemble DNN methods for user-agnostic BCI decoding: A globally pre-trained, participant-fine-tuned DNN ensemble predicts for new users via instance-based selection and fusion, requiring no calibration or user-specific weight update (Guney et al., 2022).
  • Hyperparameter-free Bayesian optimization ensembles: Base surrogate models trained on many datasets are combined via agnostic ensemble weighting without dataset-specific meta-features or manual task weighting (Feurer et al., 2018).
  • Partial parameter reuse in high-dimensional learning: Theoretical analysis delineates the regimes in which inherited subsets of parameters generalize directly, without per-dataset adaptation (Yuan et al., 26 Sep 2025); in $\ell_1$-regularized regression, sharp asymptotics establish practical rules for ignoring certain hyperparameters entirely across datasets (Okajima et al., 2024).

2. Theoretical and Algorithmic Foundations

Bias Correction Transfer

For class-incremental learning, bias correction parameters $(\alpha_s^k, \beta_s^k)$ are learned offline using reference datasets with small, held-out validation memory. After learning, these parameters are transferred directly to new tasks and used only at inference. For each state $s$ and group $k$:

$$\widetilde o_s^k = \alpha_s^k\, o_s^k + \beta_s^k \mathbf{1}$$

$$q_s = \mathrm{softmax}\big([\widetilde o_s^1; \ldots; \widetilde o_s^s]\big)$$

The method does not require any further optimization, tuning, or memory on the target dataset. The parameters can be averaged across reference datasets and the overhead is negligible. This approach demonstrably delivers up to +16 Top-1 points on class-incremental benchmarks versus untuned baselines (Slim et al., 2021).
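
To make the inference-time mechanics concrete, below is a minimal NumPy sketch of applying transferred bias-correction parameters to per-group logits. The function name and the numbers are illustrative stand-ins, not the authors' implementation; the $(\alpha_s^k, \beta_s^k)$ pairs are assumed to have been meta-learned offline and averaged across reference datasets, as described above.

```python
import numpy as np

def rectify_logits(logits_per_group, alphas, betas):
    """Apply transferred bias-correction parameters at inference.

    logits_per_group: one array of raw logits o_s^k per past group k.
    alphas, betas: transferred scalars (alpha_s^k, beta_s^k), one pair per
        group, meta-learned offline on reference datasets -- nothing here
        is tuned on the target dataset.
    """
    corrected = [a * o + b for o, a, b in zip(logits_per_group, alphas, betas)]
    scores = np.concatenate(corrected)      # [o~_s^1; ...; o~_s^s]
    exp = np.exp(scores - scores.max())     # numerically stable softmax
    return exp / exp.sum()

# Illustrative use: two incremental states; recent-group logits get damped.
probs = rectify_logits(
    [np.array([2.0, 1.5]), np.array([3.5, 3.1])],  # raw per-group logits
    alphas=[1.0, 0.8],
    betas=[0.0, -0.4],
)
```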

Universal Metric Learning without Dataset-Specific Fine-Tuning

The Parameter-efficient Unified Metric leArning (PUMA) framework freezes all backbone weights and introduces two global, lightweight adaptation mechanisms: stochastic adapters (layerwise, domain-agnostic) and a prompt pool from which prompts are dynamically selected per instance via meta-features. No per-dataset parameters or side networks are required, and training is joint across all data. At inference, prompt selection and Bernoulli adapter activation are data-driven but parameterized globally, not per dataset. PUMA achieves a unified retrieval embedding across eight diverse datasets, surpassing both per-dataset and universal fully fine-tuned baselines with only 11.5% as many trainable weights (Kim et al., 2023).
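
The PyTorch sketch below illustrates the two globally shared mechanisms in schematic form: a Bernoulli-gated bottleneck adapter sitting on top of a frozen layer's output, and instance-conditioned selection from a shared prompt pool. Module names, dimensions, and the inference-time gating behavior are illustrative assumptions, not the PUMA codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticAdapter(nn.Module):
    """Lightweight bottleneck adapter, activated with probability p during
    training (Bernoulli gate). Parameters are shared globally, never
    specialized per dataset; the backbone it wraps stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 64, p: float = 0.5):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.p = p

    def forward(self, x):
        # Sample the gate while training; use the expected gate at inference
        # (one plausible convention, analogous to dropout scaling).
        gate = float(torch.rand(()) < self.p) if self.training else self.p
        return x + gate * self.up(F.relu(self.down(x)))

class PromptPool(nn.Module):
    """Global pool of prompts; each instance picks its top-k prompts by
    similarity between its feature and learned prompt keys, so selection
    is data-driven but the parameters are dataset-agnostic."""
    def __init__(self, pool_size: int = 20, length: int = 4,
                 dim: int = 768, k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, length, dim))
        self.k = k

    def forward(self, feat):                       # feat: (B, dim)
        sim = F.normalize(feat, dim=-1) @ F.normalize(self.keys, dim=-1).T
        idx = sim.topk(self.k, dim=-1).indices     # (B, k) per-instance choice
        return self.prompts[idx].flatten(1, 2)     # (B, k*length, dim)
```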

Surrogate and Hyperparameter Transfer

Meta-learning of performance and cost surrogates enables selection of fine-tuning configurations (hyperparameters, optimization strategies) on new datasets solely via meta-features, without running any hyperparameter sweep or dataset-specific Bayesian optimization. A pair of frozen Gaussian process surrogates, fit to past runs, yields acquisition functions of the form

$$\theta^* = \arg\max_{\theta \in \Theta} \mathcal{A}\big(s_{\mathrm{perf}}(\theta, \phi_D),\, s_{\mathrm{cost}}(\theta, \phi_D)\big)$$

where $\phi_D$ denotes the meta-features of dataset $D$. This method can outperform adaptive per-dataset Bayesian optimization for LLM fine-tuning on synthetic QA tasks and eliminates inner loops of hyperparameter adaptation (Strangmann et al., 2024).
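
The sketch below shows what transfer-only configuration selection can look like, with scikit-learn Gaussian processes standing in for the meta-learned surrogates. The meta-feature layout, the candidate grid, and the particular acquisition $\mathcal{A}$ (predicted performance minus weighted predicted cost) are illustrative assumptions, not Quick-Tune's actual design.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Surrogates are fit once on past runs, i.e. rows [theta | phi_D] paired
# with observed performance/cost, then frozen. Random data stands in for
# the meta-dataset of past runs here.
rng = np.random.default_rng(0)
history = rng.random((200, 8))          # 5 config dims + 3 meta-feature dims
s_perf = GaussianProcessRegressor().fit(history, rng.random(200))
s_cost = GaussianProcessRegressor().fit(history, rng.random(200))

def select_config(candidates, phi_D, cost_weight=0.1):
    """theta* = argmax A(s_perf, s_cost): score candidate configs for a new
    dataset using only its meta-features phi_D -- no target-side sweep."""
    X = np.hstack([candidates, np.tile(phi_D, (len(candidates), 1))])
    acq = s_perf.predict(X) - cost_weight * s_cost.predict(X)
    return candidates[np.argmax(acq)]

theta_star = select_config(rng.random((64, 5)), phi_D=np.array([0.2, 0.7, 0.1]))
```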

3. Empirical Performance and Comparative Analyses

Representative Empirical Results

| Method / Domain | Datasets | Transfer Approach | Key Result (Representative) |
| --- | --- | --- | --- |
| adBiC + transfer for class-incremental learning | CIFAR-100, etc. | Fixed bias corrections | +1 to +16 Top-1 points vs. raw method (Slim et al., 2021) |
| PUMA universal metric learning | 8 retrieval datasets | Frozen ViT + adapters | +3–7 pp unified Recall@1 over full FT baselines (Kim et al., 2023) |
| Quick-Tune (transfer-only) for LLMs | 8 synthetic QA sets | Fixed meta-learned surrogates | Top accuracy, 0.63 ± 0.004, no target BO (Strangmann et al., 2024) |
| Ensemble-DNN SSVEP classifier | 2 EEG datasets | Pool of user-finetuned DNNs | 155.51 bits/min ITR, SOTA, no calibration (Guney et al., 2022) |
| Transfer Lasso with fixed hyperparameters | IMDb, MNIST | Set $\Delta\lambda = 0$ (or $\kappa = 0$) | ≤2% generalization gap vs. grid-optimized tuning (Okajima et al., 2024) |

Parameter overhead is typically minimal or negligible. Empirically, approaches that eliminate per-dataset adaptation achieve strong or state-of-the-art performance (e.g., +7.0 pp on the Aircraft dataset for metric learning, and the highest ITR in the EEG BCI setting).

Negative Transfer Scenarios

Theory indicates that such transfer can be deleterious when the shared signal between source and target is small compared to task-specific structure. In these regimes, e.g., when the norm $\|u\|$ of the shared component is small or when class supports have little overlap, the transferred parameters may amplify noise and degrade generalization relative to training from scratch (Yuan et al., 26 Sep 2025).
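
The toy NumPy simulation below (illustrative only, under simplified i.i.d. Gaussian assumptions rather than the cited paper's setting) exhibits this regime dependence: inheriting source parameters and fitting only the residual shift wins when the shared component dominates, but loses to fitting the target from scratch when it is small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 40                              # more parameters than target samples

def squared_errors(shared_scale):
    u = shared_scale * rng.normal(size=d)  # shared signal; ||u|| varies
    beta_src = u + rng.normal(size=d)      # source task parameters
    beta_tgt = u + rng.normal(size=d)      # target task parameters
    X = rng.normal(size=(n, d))
    y = X @ beta_tgt + 0.5 * rng.normal(size=n)
    ridge = lambda A, b: np.linalg.solve(A.T @ A + np.eye(d), A.T @ b)
    scratch = ridge(X, y)                               # target-only fit
    transfer = beta_src + ridge(X, y - X @ beta_src)    # inherit, fit shift
    return (np.sum((scratch - beta_tgt) ** 2),
            np.sum((transfer - beta_tgt) ** 2))

for s in (0.1, 1.0, 5.0):
    print(f"||u|| scale {s}: scratch vs transfer error = {squared_errors(s)}")
```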

4. Strategies for Parameter-Free and Hyperparameter-Free Transfer

  • Meta-learning and global parameterization: Averaging or jointly optimizing parameters on reference datasets before transfer.
  • Surrogate-driven selection: Learning surrogates on past tasks and applying the learned mappings or ensembles to new datasets without any adjustment or per-task validation.
  • Instance-based ensemble selection and dynamic voting: Instance-adaptive but parameter-set-invariant combination rules for model ensembles, relying on pre-computed similarity measures or uncertainty proxies rather than retraining (Guney et al., 2022); a sketch follows this list.
  • Sharp asymptotic analysis for hyperparameter tuning reduction: Theoretical results allow practitioners to drop some transfer-related tuning knobs (e.g., $\Delta\lambda$) while incurring at most a 2–10% generalization penalty (Okajima et al., 2024).
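
As referenced in the list above, here is a minimal sketch of instance-based selection and fusion over a fixed pool of pre-trained models. Cosine similarity against precomputed per-model prototype features is one assumed instantiation of the similarity measure, not the exact rule of Guney et al. (2022).

```python
import numpy as np

def ensemble_predict(x, models, prototypes, k=5):
    """Pick the k pool members whose stored prototypes are most similar to
    the incoming instance x, then average their class-probability outputs.
    No model weights are updated and nothing is calibrated per user."""
    sims = prototypes @ x / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(x) + 1e-12)
    chosen = np.argsort(sims)[-k:]                 # most similar members
    probs = np.mean([models[i](x) for i in chosen], axis=0)
    return probs.argmax(), probs

# Dummy pool: three "models" emitting fixed one-hot probabilities.
models = [lambda x, i=i: np.eye(3)[i] for i in range(3)]
prototypes = np.random.rand(3, 4)
label, probs = ensemble_predict(np.random.rand(4), models, prototypes, k=2)
```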

A plausible implication is that when a universal mapping, a meta-learned acquisition rule, or a robust measure of signal overlap is available, most per-dataset hyperparameters and retraining procedures become superfluous.

5. Limitations, Open Problems, and Robustness Guarantees

Approaches that forgo dataset-specific parameters assume either that the source and target are sufficiently well aligned or that the meta-learned or pooled statistics generalize. If a new dataset is out-of-distribution relative to the meta-dataset, transfer-only surrogates may underperform. In universal metric learning, fixed, globally parameterized adapters and prompts may not capture idiosyncratic domain features for rare or small datasets (Strangmann et al., 2024, Kim et al., 2023).

Several works provide theoretical "no-harm" guarantees: worst-case regret bounds show that BO ensembles converge with at most a multiplicative slowdown compared to vanilla per-task optimization (Feurer et al., 2018). In high-dimensional regression, fixing certain hyperparameters yields generalization error within a factor of $1+o(1)$ of the optimal full search (Okajima et al., 2024).

Extending zero-parameter transfer to streaming, open-domain, or adversarial shift remains open. Hybrid schemes, where transfer-only initialization is permitted and a minimal number of fine-tuning steps are performed post hoc, can further improve robustness and fairness (Strangmann et al., 2024).

6. Practical and Domain-Specific Instantiations

  • Class-incremental learning: adBiC parameters are computed offline and transferred for all future tasks without memory or validation.
  • Metric learning across domains: PUMA demonstrates robust, high-performing embeddings across eight domains with zero per-domain fine-tuning.
  • Hyperparameter search for model selection: Both meta-learned surrogates and hyperparameter-free ensemble schemes eliminate any need for dataset-specific feature engineering or adaptation in Bayesian optimization (Strangmann et al., 2024, Feurer et al., 2018, Ilievski et al., 2016).
  • Regression with structured sparsity transfer: Trans-Lasso and Pretraining Lasso can be safely simplified to single-hyperparameter variants globally across tasks (Okajima et al., 2024); a minimal sketch follows this list.
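
A minimal sketch of one such single-hyperparameter reduction, as referenced in the list above: assuming the reparameterization $\beta = \beta_{\mathrm{src}} + \delta$ with the $\ell_1$ penalty placed on the deviation $\delta$ alone, the target fit collapses to an ordinary Lasso on the source model's residual, leaving a single regularization knob. This is an illustrative reading of the fixed-knob regime, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def transfer_lasso_fixed(X, y, beta_src, lam=0.1):
    """Fixed-knob variant: fit delta = beta - beta_src with a single l1
    penalty lam by running a plain Lasso on the source model's residual."""
    delta = Lasso(alpha=lam).fit(X, y - X @ beta_src).coef_
    return beta_src + delta

# Toy use: source coefficients reused directly; only lam remains.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_src = np.zeros(20); beta_src[:3] = 1.0
y = X @ (beta_src + 0.1 * rng.normal(size=20)) + 0.1 * rng.normal(size=100)
beta_hat = transfer_lasso_fixed(X, y, beta_src, lam=0.05)
```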

Such meta-free, tuning-free approaches are applicable where run-time adaptation is precluded or undesirable, or where new tasks must be handled with no labeled data or validation budget.


In summary, transfer learning without dataset-specific parameters is enabled by a combination of meta-learning, theoretical analysis, and parameter-efficient architecture design. It achieves robust performance across domains and tasks, often matching or surpassing adaptation-based methods in both efficiency and accuracy, so long as universal or sufficiently generalizable signals are present in the transfer base (Slim et al., 2021, Kim et al., 2023, Strangmann et al., 2024, Okajima et al., 2024, Yuan et al., 26 Sep 2025).
