
Transfer Learning without Dataset-Specific Parameters

Updated 20 January 2026
  • Transfer learning without dataset-specific parameters is a paradigm that leverages meta-learning and global model components to enable cross-domain transfer without custom tuning.
  • It employs techniques like bias correction, universal metric learning with frozen backbones, and surrogate-based hyperparameter transfer to achieve competitive performance.
  • This approach minimizes computational overhead and memory demands while delivering robust results in scenarios such as class-incremental learning and heterogeneous domain adaptation.

Transfer learning without dataset-specific parameters refers to a family of approaches in which models, hyperparameters, or optimization procedures can be reused or applied across datasets or domains without requiring custom parameter tuning, adaptation, or retraining for each new target dataset. This paradigm eliminates or minimizes the need for memory, validation data, or per-dataset hyperparameters, and aims at scalable, robust, and deployable transfer—especially in settings where dataset-specific adjustment is computationally or practically infeasible.

1. Key Concepts and Frameworks

Transfer learning typically assumes either (i) shared model weights or representation between source and target, (ii) transfer of meta-learned surrogates or hyperparameters, or (iii) construction of architectures or ensembles whose selection and prediction mechanisms generalize across datasets without any dataset-specific fine-tuning or validation.

Representative frameworks include:

  • Bias correction parameter transfer for class-incremental learning: Prediction bias correction parameters are meta-learned on reference datasets with small validation splits, then transferred in fixed form to new target datasets without any memory or target-side adaptation (Slim et al., 2021).
  • Parameter-efficient universal metric learning: A large frozen backbone, combined with a small set of lightweight, domain-conditional adapters and prompts parameterized globally rather than per dataset, allows effective transfer learning across heterogeneous distributions without training separate models (Kim et al., 2023).
  • Hyperparameter transfer using meta-learned surrogates: Surrogate models for performance and cost are meta-learned on past datasets and then applied directly to new tasks using only meta-features, avoiding any dataset-specific black-box optimization (Strangmann et al., 2024).
  • Ensemble DNN methods for user-agnostic BCI decoding: A globally pre-trained, participant-fine-tuned DNN ensemble predicts for new users via instance-based selection and fusion, requiring no calibration or user-specific weight update (Guney et al., 2022).
  • Hyperparameter-free Bayesian optimization ensembles: Base surrogate models trained on many datasets are combined via agnostic ensemble weighting without dataset-specific meta-features or manual task weighting (Feurer et al., 2018).
  • Partial parameter reuse in high-dimensional learning: Theoretical analysis delineates the regimes in which inherited subsets of parameters generalize directly, without per-dataset adaptation (Yuan et al., 26 Sep 2025); in $\ell_1$-regularized regression, sharp asymptotics establish practical rules for ignoring certain hyperparameters entirely across datasets (Okajima et al., 2024).

2. Theoretical and Algorithmic Foundations

Bias Correction Transfer

For class-incremental learning, bias correction parameters $(\alpha_s^k, \beta_s^k)$ are learned offline using reference datasets with small, held-out validation memory. After learning, these parameters are transferred directly to new tasks and used only at inference. For each state $s$ and group $k$:

$$\widetilde o_s^k = \alpha_s^k\, o_s^k + \beta_s^k \mathbf{1}$$

$$q_s = \mathrm{softmax}\big([\widetilde o_s^1; \ldots; \widetilde o_s^s]\big)$$

The method does not require any further optimization, tuning, or memory on the target dataset. The parameters can be averaged across reference datasets and the overhead is negligible. This approach demonstrably delivers up to +16 Top-1 points on class-incremental benchmarks versus untuned baselines (Slim et al., 2021).
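
To make the inference-time mechanics concrete, below is a minimal NumPy sketch of applying transferred bias-correction parameters to per-group logits. The function name and the numbers are illustrative stand-ins, not the authors' implementation; the $(\alpha_s^k, \beta_s^k)$ pairs are assumed to have been meta-learned offline and averaged across reference datasets, as described above.

```python
import numpy as np

def rectify_logits(logits_per_group, alphas, betas):
    """Apply transferred bias-correction parameters at inference.

    logits_per_group: one array of raw logits o_s^k per past group k.
    alphas, betas: transferred scalars (alpha_s^k, beta_s^k), one pair per
        group, meta-learned offline on reference datasets -- nothing here
        is tuned on the target dataset.
    """
    corrected = [a * o + b for o, a, b in zip(logits_per_group, alphas, betas)]
    scores = np.concatenate(corrected)      # [o~_s^1; ...; o~_s^s]
    exp = np.exp(scores - scores.max())     # numerically stable softmax
    return exp / exp.sum()

# Illustrative use: two incremental states; recent-group logits get damped.
probs = rectify_logits(
    [np.array([2.0, 1.5]), np.array([3.5, 3.1])],  # raw per-group logits
    alphas=[1.0, 0.8],
    betas=[0.0, -0.4],
)
```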

Universal Metric Learning without Dataset-Specific Fine-Tuning

The Parameter-efficient Unified Metric leArning (PUMA) framework freezes all backbone weights and introduces two global, lightweight adaptation mechanisms: stochastic adapters (layerwise, domain-agnostic) and a prompt pool from which prompts are dynamically selected per instance via meta-features. No per-dataset parameters or side networks are required, and training is joint across all data. At inference, prompt selection and Bernoulli adapter activation are data-driven but parameterized globally, not per dataset. PUMA achieves a unified retrieval embedding across eight diverse datasets, surpassing both per-dataset and universal fully fine-tuned baselines with only 11.5% as many trainable weights (Kim et al., 2023).
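
The PyTorch sketch below illustrates the two globally shared mechanisms in schematic form: a Bernoulli-gated bottleneck adapter sitting on top of a frozen layer's output, and instance-conditioned selection from a shared prompt pool. Module names, dimensions, and the inference-time gating behavior are illustrative assumptions, not the PUMA codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticAdapter(nn.Module):
    """Lightweight bottleneck adapter, activated with probability p during
    training (Bernoulli gate). Parameters are shared globally, never
    specialized per dataset; the backbone it wraps stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 64, p: float = 0.5):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.p = p

    def forward(self, x):
        # Sample the gate while training; use the expected gate at inference
        # (one plausible convention, analogous to dropout scaling).
        gate = float(torch.rand(()) < self.p) if self.training else self.p
        return x + gate * self.up(F.relu(self.down(x)))

class PromptPool(nn.Module):
    """Global pool of prompts; each instance picks its top-k prompts by
    similarity between its feature and learned prompt keys, so selection
    is data-driven but the parameters are dataset-agnostic."""
    def __init__(self, pool_size: int = 20, length: int = 4,
                 dim: int = 768, k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, length, dim))
        self.k = k

    def forward(self, feat):                       # feat: (B, dim)
        sim = F.normalize(feat, dim=-1) @ F.normalize(self.keys, dim=-1).T
        idx = sim.topk(self.k, dim=-1).indices     # (B, k) per-instance choice
        return self.prompts[idx].flatten(1, 2)     # (B, k*length, dim)
```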

Surrogate and Hyperparameter Transfer

Meta-learning of performance and cost surrogates enables selection of fine-tuning configurations (hyperparameters, optimization strategies) on new datasets solely via meta-features, without running any hyperparameter sweep or dataset-specific Bayesian optimization. A pair of frozen Gaussian process surrogates, fit to past runs, yields acquisition functions of the form

$$\theta^* = \arg\max_{\theta \in \Theta} \mathcal{A}\big(s_{\mathrm{perf}}(\theta, \phi_D),\, s_{\mathrm{cost}}(\theta, \phi_D)\big)$$

where $\phi_D$ denotes the meta-features of dataset $D$. This method can outperform adaptive per-dataset Bayesian optimization for LLM fine-tuning on synthetic QA tasks and eliminates inner loops of hyperparameter adaptation (Strangmann et al., 2024).
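
The sketch below shows what transfer-only configuration selection can look like, with scikit-learn Gaussian processes standing in for the meta-learned surrogates. The meta-feature layout, the candidate grid, and the particular acquisition $\mathcal{A}$ (predicted performance minus weighted predicted cost) are illustrative assumptions, not Quick-Tune's actual design.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Surrogates are fit once on past runs, i.e. rows [theta | phi_D] paired
# with observed performance/cost, then frozen. Random data stands in for
# the meta-dataset of past runs here.
rng = np.random.default_rng(0)
history = rng.random((200, 8))          # 5 config dims + 3 meta-feature dims
s_perf = GaussianProcessRegressor().fit(history, rng.random(200))
s_cost = GaussianProcessRegressor().fit(history, rng.random(200))

def select_config(candidates, phi_D, cost_weight=0.1):
    """theta* = argmax A(s_perf, s_cost): score candidate configs for a new
    dataset using only its meta-features phi_D -- no target-side sweep."""
    X = np.hstack([candidates, np.tile(phi_D, (len(candidates), 1))])
    acq = s_perf.predict(X) - cost_weight * s_cost.predict(X)
    return candidates[np.argmax(acq)]

theta_star = select_config(rng.random((64, 5)), phi_D=np.array([0.2, 0.7, 0.1]))
```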

3. Empirical Performance and Comparative Analyses

Representative Empirical Results

| Method / Domain | Datasets | Transfer Approach | Key Result (Representative) |
| --- | --- | --- | --- |
| adBiC + transfer for class-incremental learning | CIFAR-100, etc. | Fixed bias corrections | +1 to +16 Top-1 points vs. raw method (Slim et al., 2021) |
| PUMA universal metric learning | 8 retrieval datasets | Frozen ViT + adapters | +3–7 pp unified Recall@1 over full FT baselines (Kim et al., 2023) |
| Quick-Tune (transfer-only) for LLMs | 8 synthetic QA sets | Fixed meta-learned surrogates | Top accuracy, 0.63 ± 0.004, no target BO (Strangmann et al., 2024) |
| Ensemble-DNN SSVEP classifier | 2 EEG datasets | Pool of user-finetuned DNNs | 155.51 bits/min ITR, SOTA, no calibration (Guney et al., 2022) |
| Transfer Lasso with fixed hyperparameters | IMDb, MNIST | Set $\Delta\lambda = 0$ (or $\kappa = 0$) | ≤2% generalization gap vs. grid-optimized tuning (Okajima et al., 2024) |

Parameter overhead is typically minimal or negligible. Empirically, approaches that eliminate per-dataset adaptation achieve strong or state-of-the-art performance (e.g., +7.0 pp on the Aircraft dataset for metric learning, and the highest ITR in the EEG BCI setting).

Negative Transfer Scenarios

Theory indicates that such transfer can be deleterious when the shared signal between source and target is small compared to task-specific structure. In these regimes, e.g., when the norm $\|u\|$ of the shared component is small or when class supports have little overlap, the transferred parameters may amplify noise and degrade generalization relative to training from scratch (Yuan et al., 26 Sep 2025).
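
The toy NumPy simulation below (illustrative only, under simplified i.i.d. Gaussian assumptions rather than the cited paper's setting) exhibits this regime dependence: inheriting source parameters and fitting only the residual shift wins when the shared component dominates, but loses to fitting the target from scratch when it is small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 40                              # more parameters than target samples

def squared_errors(shared_scale):
    u = shared_scale * rng.normal(size=d)  # shared signal; ||u|| varies
    beta_src = u + rng.normal(size=d)      # source task parameters
    beta_tgt = u + rng.normal(size=d)      # target task parameters
    X = rng.normal(size=(n, d))
    y = X @ beta_tgt + 0.5 * rng.normal(size=n)
    ridge = lambda A, b: np.linalg.solve(A.T @ A + np.eye(d), A.T @ b)
    scratch = ridge(X, y)                               # target-only fit
    transfer = beta_src + ridge(X, y - X @ beta_src)    # inherit, fit shift
    return (np.sum((scratch - beta_tgt) ** 2),
            np.sum((transfer - beta_tgt) ** 2))

for s in (0.1, 1.0, 5.0):
    print(f"||u|| scale {s}: scratch vs transfer error = {squared_errors(s)}")
```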

4. Strategies for Parameter-Free and Hyperparameter-Free Transfer

  • Meta-learning and global parameterization: Averaging or jointly optimizing parameters on reference datasets before transfer.
  • Surrogate-driven selection: Learning surrogates on past tasks and applying the learned mappings or ensembles to new datasets without any adjustment or per-task validation.
  • Instance-based ensemble selection and dynamic voting: Instance-adaptive but parameter-set-invariant combination rules for model ensembles, relying on pre-computed similarity measures or uncertainty proxies rather than retraining (Guney et al., 2022); a sketch follows this list.
  • Sharp asymptotic analysis for hyperparameter tuning reduction: Theoretical results allow practitioners to drop some transfer-related tuning knobs (e.g., $\Delta\lambda$) while incurring at most a 2–10% generalization penalty (Okajima et al., 2024).
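
As referenced in the list above, here is a minimal sketch of instance-based selection and fusion over a fixed pool of pre-trained models. Cosine similarity against precomputed per-model prototype features is one assumed instantiation of the similarity measure, not the exact rule of Guney et al. (2022).

```python
import numpy as np

def ensemble_predict(x, models, prototypes, k=5):
    """Pick the k pool members whose stored prototypes are most similar to
    the incoming instance x, then average their class-probability outputs.
    No model weights are updated and nothing is calibrated per user."""
    sims = prototypes @ x / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(x) + 1e-12)
    chosen = np.argsort(sims)[-k:]                 # most similar members
    probs = np.mean([models[i](x) for i in chosen], axis=0)
    return probs.argmax(), probs

# Dummy pool: three "models" emitting fixed one-hot probabilities.
models = [lambda x, i=i: np.eye(3)[i] for i in range(3)]
prototypes = np.random.rand(3, 4)
label, probs = ensemble_predict(np.random.rand(4), models, prototypes, k=2)
```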

A plausible implication is that when a universal mapping, a meta-learned acquisition rule, or a robust measure of signal overlap is available, most per-dataset hyperparameters and retraining procedures become superfluous.

5. Limitations, Open Problems, and Robustness Guarantees

Approaches that forgo dataset-specific parameters assume either that the source and target are sufficiently well aligned or that the meta-learned or pooled statistics generalize. If a new dataset is out-of-distribution relative to the meta-dataset, transfer-only surrogates may underperform. In universal metric learning, fixed, globally parameterized adapters and prompts may not capture idiosyncratic domain features for rare or small datasets (Strangmann et al., 2024, Kim et al., 2023).

Several works provide theoretical "no-harm" guarantees: worst-case regret bounds show that BO ensembles converge with at most a multiplicative slowdown compared to vanilla per-task optimization (Feurer et al., 2018). In high-dimensional regression, fixing certain hyperparameters yields generalization error within a factor of $1+o(1)$ of the optimal full search (Okajima et al., 2024).

Extending zero-parameter transfer to streaming, open-domain, or adversarial shift remains open. Hybrid schemes, where transfer-only initialization is permitted and a minimal number of fine-tuning steps are performed post hoc, can further improve robustness and fairness (Strangmann et al., 2024).

6. Practical and Domain-Specific Instantiations

  • Class-incremental learning: adBiC parameters are computed offline and transferred for all future tasks without memory or validation.
  • Metric learning across domains: PUMA demonstrates robust, high-performing embeddings across eight domains with zero per-domain fine-tuning.
  • Hyperparameter search for model selection: Both meta-learned surrogates and hyperparameter-free ensemble schemes eliminate any need for dataset-specific feature engineering or adaptation in Bayesian optimization (Strangmann et al., 2024, Feurer et al., 2018, Ilievski et al., 2016).
  • Regression with structured sparsity transfer: Trans-Lasso and Pretraining Lasso can be safely simplified to single-hyperparameter variants globally across tasks (Okajima et al., 2024); a minimal sketch follows this list.
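
A minimal sketch of one such single-hyperparameter reduction, as referenced in the list above: assuming the reparameterization $\beta = \beta_{\mathrm{src}} + \delta$ with the $\ell_1$ penalty placed on the deviation $\delta$ alone, the target fit collapses to an ordinary Lasso on the source model's residual, leaving a single regularization knob. This is an illustrative reading of the fixed-knob regime, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def transfer_lasso_fixed(X, y, beta_src, lam=0.1):
    """Fixed-knob variant: fit delta = beta - beta_src with a single l1
    penalty lam by running a plain Lasso on the source model's residual."""
    delta = Lasso(alpha=lam).fit(X, y - X @ beta_src).coef_
    return beta_src + delta

# Toy use: source coefficients reused directly; only lam remains.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_src = np.zeros(20); beta_src[:3] = 1.0
y = X @ (beta_src + 0.1 * rng.normal(size=20)) + 0.1 * rng.normal(size=100)
beta_hat = transfer_lasso_fixed(X, y, beta_src, lam=0.05)
```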

Such meta-free, tuning-free approaches are applicable where run-time adaptation is precluded or undesirable, or where new tasks must be handled with no labeled data or validation budget.


In summary, transfer learning without dataset-specific parameters is enabled by a combination of meta-learning, theoretical analysis, and parameter-efficient architecture design. It achieves robust performance across domains and tasks, often matching or surpassing adaptation-based methods in both efficiency and accuracy, so long as universal or sufficiently generalizable signals are present in the transfer base (Slim et al., 2021, Kim et al., 2023, Strangmann et al., 2024, Okajima et al., 2024, Yuan et al., 26 Sep 2025).
