
Cross-Dataset Hyperparameter Transfer

Updated 30 December 2025
  • Representative work demonstrates that leveraging ensemble surrogate models and Bayesian optimization can cut target evaluations by up to 5×.
  • Cross-dataset hyperparameter transfer is defined as reusing empirical hyperparameter mappings across datasets to boost search efficiency and model performance.
  • Practical methods include meta-feature extraction, surrogate alignment, and portfolio selection to robustly address multi-source covariate shift and continual learning.

Cross-dataset hyperparameter transfer denotes the practice of leveraging hyperparameter optimization or search results from related datasets to accelerate, warm-start, or otherwise improve hyperparameter selection on a new dataset. This paradigm is fundamental across transfer learning, AutoML, and meta-learning, wherein prior computational effort or empirical knowledge is adaptively reused or encoded. Research spans principled transfer in Bayesian optimization, neural/surrogate-based alignment, copula models, kernel embeddings, meta-feature-driven surrogates, combinatorial portfolio selection, multi-source covariate shift, and ordered optimization in continual settings.

1. Problem Formulation and Conceptual Foundations

Cross-dataset hyperparameter transfer is rigorously defined in terms of transferring knowledge (surrogate models, priors, optimal configurations, meta-features, or empirical mappings) from a set of “source” tasks (datasets) to a new “target” task. The core problem is to estimate, for the target dataset $D^t$, the mapping from hyperparameter configuration $x \in \mathcal{X}$ to objective value $y = f^t(x)$, using available data $\{(x^{s}_i, y^{s}_i)\}_{i,s}$ from other datasets.

A simple formalization, as in ensemble Bayesian optimization (Feurer et al., 2018), is:

  • Given $T$ source tasks, each with evaluations $\{\mathcal{D}^{(s)}\}_{s=1}^T$, learn a surrogate or set of priors $M^s$.
  • For a new target, combine $M^s$ with an online-updated $M^t$ to form transfer recommendations or surrogates driving acquisition functions (a minimal combination sketch follows below).
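
As a rough illustration of this combination step (not the exact RGPE formulation of Feurer et al., 2018), the sketch below fits one Gaussian process per task and blends their predictions with task weights; the data, kernel, and uniform weights are placeholders.

```python
# Minimal sketch: weighted ensemble of per-task GP surrogates (placeholder data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Source-task evaluations {(x_i^s, y_i^s)} and a few target observations.
source_data = [(rng.uniform(0, 1, (20, 2)), rng.normal(size=20)) for _ in range(3)]
X_target = rng.uniform(0, 1, (5, 2))
y_target = rng.normal(size=5)

# Fit one GP surrogate per source task and one on the target.
def fit_gp(X, y):
    return GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

source_models = [fit_gp(X, y) for X, y in source_data]
target_model = fit_gp(X_target, y_target)

# Combine predictions with task weights (uniform here; RGPE would derive them
# from ranking agreement with the target observations).
models = source_models + [target_model]
weights = np.ones(len(models)) / len(models)

def ensemble_predict(X_candidates):
    means = np.stack([m.predict(X_candidates) for m in models])
    return weights @ means

print(ensemble_predict(rng.uniform(0, 1, (4, 2))))
```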

Within multi-objective or multi-fidelity settings (cf. Terragni et al., 2022; Winkelmolen et al., 2020), the objective may further include coherence, diversity, or multi-task performance metrics, and transfer solutions may be structured as portfolios, embedding-based surrogates, or ensembles.

2. Bayesian Optimization and Surrogate-Based Transfer

Bayesian optimization (BO) underpins much of the technical literature (e.g., Feurer et al., 2018; Law et al., 2018; Li et al., 2022; Salinas et al., 2019; Hellan et al., 2023). Transfer is enacted by one or more of the following:

  • Constructing ensemble surrogates, such as the ranking-weighted Gaussian process ensemble (RGPE) (Feurer et al., 2018), which aggregates GP predictions from each source task weighted by performance on the target.
  • Building kernel-augmented GPs, e.g., embedding datasets into an RKHS via mean-embedding representations and conditioning the BO surrogate on both hyperparameters and the dataset embedding (Law et al., 2018), or using learned meta-feature extractors (Jomaa et al., 2021).
  • Implementing parametric or semi-parametric copula models mapping source empirical quantiles to a latent normal, allowing robust pooling across tasks of different scales and variances (Salinas et al., 2019).

For task-weighting, RGPE uses ranking statistics and bootstrap aggregation, while TransBO optimizes source and target combination weights jointly through supervised ranking losses and cross-validation (Li et al., 2022).
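
The snippet below illustrates, in a deliberately simplified form, how ranking-based weights might be derived: each source surrogate is scored by the fraction of target observation pairs whose ordering it predicts correctly. RGPE's probabilistic ranking loss and bootstrap aggregation, and TransBO's supervised weight optimization, are omitted.

```python
# Simplified ranking-agreement weights for source surrogates (illustrative only).
import numpy as np

def ranking_agreement(y_pred, y_true):
    """Fraction of target observation pairs whose ordering the surrogate preserves."""
    n = len(y_true)
    agree, total = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (y_pred[i] < y_pred[j]) == (y_true[i] < y_true[j]):
                agree += 1
    return agree / max(total, 1)

def task_weights(source_models, X_target, y_target):
    """Normalize per-task ranking agreement into ensemble weights."""
    scores = np.array([
        ranking_agreement(m.predict(X_target), y_target) for m in source_models
    ])
    return scores / scores.sum() if scores.sum() > 0 else np.ones_like(scores) / len(scores)
```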

A general BO loop incorporating transfer typically initializes with transferred priors or configurations, fits joint or ensemble surrogates, proposes new hyperparameter candidates by maximizing an acquisition function (e.g., Expected Improvement), and updates the models with results evaluated on the target (Feurer et al., 2018).
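
A minimal sketch of such a loop, assuming a single GP surrogate on the target, warm-start configurations transferred from source tasks, and Expected Improvement maximized over random candidates; the objective function and search space are placeholders.

```python
# Sketch of a transfer-aware BO loop: warm start, fit surrogate, maximize EI.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
f_target = lambda x: np.sum((x - 0.3) ** 2)                       # placeholder objective
warm_starts = [np.array([0.25, 0.25]), np.array([0.7, 0.4])]      # e.g., best source configs

X_obs = list(warm_starts)
y_obs = [f_target(x) for x in X_obs]

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(X_obs), np.array(y_obs))
    candidates = rng.uniform(0, 1, (256, 2))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, min(y_obs)))]
    X_obs.append(x_next)
    y_obs.append(f_target(x_next))

print("best value found:", min(y_obs))
```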

3. Surrogate Alignment and Nonparametric Mapping Approaches

Methods such as surrogate alignment (HTS) (Ilievski et al., 2016) learn a direct nonlinear mapping $g:\mathcal{X}\to\mathcal{X}$ from source to target hyperparameter optima. HTS leverages surrogate models (radial-basis function regressors with polynomial tail) of the error landscapes from source and target, and trains a neural network to align them by minimizing a rank-correlation loss between surrogate predictions. This method is effective for DNNs, where exact modeling is impractical due to high training cost, and does not require meta-features: only $(x, f(x))$ pairs from both domains are needed.
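
The sketch below captures the alignment idea only loosely: cheap RBF surrogates of the two error landscapes and a simple affine map scored by Spearman rank correlation stand in for HTS's RBF-with-polynomial-tail surrogates and neural-network mapping; all data is placeholder.

```python
# Illustrative surrogate-alignment sketch: fit surrogates of the source and target
# error landscapes, then fit an affine map g maximizing rank agreement between
# the target landscape at x and the source landscape at g(x).
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Placeholder (x, f(x)) pairs from the source and target tasks.
X_src, y_src = rng.uniform(0, 1, (40, 2)), rng.normal(size=40)
X_tgt, y_tgt = rng.uniform(0, 1, (15, 2)), rng.normal(size=15)

src_surrogate = RBFInterpolator(X_src, y_src, kernel="thin_plate_spline")
tgt_surrogate = RBFInterpolator(X_tgt, y_tgt, kernel="thin_plate_spline")

probe = rng.uniform(0, 1, (64, 2))   # points at which the two landscapes are compared

def neg_rank_agreement(params):
    scale, shift = params[:2], params[2:]
    mapped = np.clip(probe * scale + shift, 0, 1)
    rho, _ = spearmanr(tgt_surrogate(probe), src_surrogate(mapped))
    return -rho

result = minimize(neg_rank_agreement, x0=np.array([1.0, 1.0, 0.0, 0.0]),
                  method="Nelder-Mead")
print("rank correlation after alignment:", -result.fun)
```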

Experiments confirm substantial reductions (3–5×) in required target evaluations compared to non-transfer surrogates (e.g., HORD).

4. Meta-Feature and Kernel Embedding Techniques

Recent approaches employ learned dataset representations or meta-features to enable cross-dataset transfer even across heterogeneous domains (Jomaa et al., 2021; Law et al., 2018). In DMFBS (Jomaa et al., 2021), a differentiable deep-set architecture extracts an embedding from each dataset; the embeddings are optimized jointly for response regression, a manifold-regularization term that encourages similar surrogate predictions for similar datasets, and an auxiliary dataset-identification task. The resulting embedding is concatenated with the hyperparameter configuration to condition the surrogate predictor, which drives acquisition and candidate ranking on new datasets.
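
A minimal deep-set style sketch (PyTorch) of this conditioning mechanism: a permutation-invariant encoder produces a dataset embedding that is concatenated with a hyperparameter configuration before the surrogate head. DMFBS's manifold regularization and dataset-identification losses are not shown, and all layer sizes are arbitrary.

```python
# Minimal deep-set dataset encoder conditioning a surrogate on (embedding, hyperparams).
import torch
import torch.nn as nn

class DatasetEncoder(nn.Module):
    def __init__(self, n_features, embed_dim=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 32))
        self.rho = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, embed_dim))

    def forward(self, instances):                  # instances: (n_rows, n_features)
        pooled = self.phi(instances).mean(dim=0)   # permutation-invariant mean pooling
        return self.rho(pooled)                    # dataset embedding

class Surrogate(nn.Module):
    def __init__(self, embed_dim, n_hparams):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(embed_dim + n_hparams, 64),
                                  nn.ReLU(), nn.Linear(64, 1))

    def forward(self, embedding, hparams):
        return self.head(torch.cat([embedding, hparams], dim=-1))

encoder, surrogate = DatasetEncoder(n_features=8), Surrogate(embed_dim=16, n_hparams=3)
dataset = torch.randn(100, 8)            # placeholder tabular dataset
hparams = torch.tensor([0.1, 0.5, 3.0])  # placeholder configuration
print(surrogate(encoder(dataset), hparams).item())
```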

Distributional transfer (Law et al., 2018) utilizes kernel mean embedding of the distributions $P_{X,Y}$ into an RKHS, and places a GP prior over $(\theta, \psi(D), s)$, facilitating transfer by relating hyperparameter performance across similar datasets in feature space.
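
As a rough analogue (not the exact construction of Law et al., 2018), a kernel mean embedding can be approximated with random Fourier features averaged over a dataset's rows and concatenated with hyperparameters as GP inputs; data and dimensions below are placeholders.

```python
# Approximate kernel mean embedding via random Fourier features, used as meta-features.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
feature_map = RBFSampler(gamma=1.0, n_components=50, random_state=0).fit(np.zeros((1, 4)))

def mean_embedding(data):
    """Average random-feature map over the rows of a dataset (approximate kernel mean)."""
    return feature_map.transform(data).mean(axis=0)

# Placeholder meta-data: (dataset, hyperparameter, observed score) triples from source tasks.
datasets = [rng.normal(size=(60, 4)) for _ in range(5)]
hparams = rng.uniform(0, 1, (5, 2))
scores = rng.normal(size=5)

inputs = np.array([np.concatenate([h, mean_embedding(d)])
                   for h, d in zip(hparams, datasets)])
gp = GaussianProcessRegressor().fit(inputs, scores)   # surrogate over (theta, psi(D))
```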

5. Portfolio Selection and Zero-Shot Cross-Dataset Transfer

Zero-shot HPO approaches (Winkelmolen et al., 2020; Rijn et al., 2017; Terragni et al., 2022) demonstrate that a small portfolio of hyperparameter configurations can "cover" a large set of future datasets: for any unseen dataset, at least one configuration performs near-optimally.

Portfolio selection algorithms solve a combinatorial, submodular minimization of mean regret over meta-datasets, employing greedy augmentation together with either surrogate modeling or multi-fidelity evaluations. The result is a lookup table of recommended default configurations.
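
A bare-bones version of the greedy step over a precomputed meta-dataset of losses (configurations × datasets) is sketched below; regret normalization, surrogate-based completion of the meta-table, and multi-fidelity evaluation are omitted.

```python
# Greedy portfolio selection over a meta-dataset of losses (rows: configs, cols: datasets).
import numpy as np

def greedy_portfolio(loss_table, k):
    """Pick k configs minimizing the mean (over datasets) of the best loss in the portfolio."""
    n_configs, n_datasets = loss_table.shape
    chosen, best_so_far = [], np.full(n_datasets, np.inf)
    for _ in range(k):
        # Marginal value of each remaining config: mean loss if it were added to the portfolio.
        scores = [np.minimum(best_so_far, loss_table[c]).mean()
                  if c not in chosen else np.inf for c in range(n_configs)]
        c_star = int(np.argmin(scores))
        chosen.append(c_star)
        best_so_far = np.minimum(best_so_far, loss_table[c_star])
    return chosen

rng = np.random.default_rng(0)
meta_losses = rng.uniform(size=(200, 40))   # placeholder meta-dataset
print(greedy_portfolio(meta_losses, k=5))
```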

Table: Portfolio Construction Summary

| Method | Portfolio Construction | Evaluation Approach |
|---|---|---|
| Greedy Submodular (Winkelmolen et al., 2020) | Greedy K-set minimizing mean meta-loss | Direct empirical or surrogate-based |
| Surrogate Adaptive Query (Winkelmolen et al., 2020) | Surrogate model learned over $(d,\theta)$ | Bayesian optimization on meta-table |
| Multi-Objective BO (Terragni et al., 2022) | Pareto front for coherence/diversity/classification | Multi-output random scalarization |

Empirical evidence attests that such portfolios, constructed from hundreds of datasets and tens of thousands of configurations, allow practitioners to skip or dramatically reduce hyperparameter search on new datasets by evaluating only the top-K recommended options.

6. Task and Dataset Similarity: Ordered and Distributional Transfer

OTHPO (Hellan et al., 2023) introduces ordered transfer for sequential tasks (e.g., increasing data sizes, continual learning), positing that more recent tasks are more strongly correlated with the target. The approach models transfer via GP surrogates over the joint space of hyperparameters and an ordered context variable (e.g., time, index, data fraction). Warm-start heuristics select the best configurations from immediately prior tasks, yielding improved "first-evaluation" regret.
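
A schematic rendering of the idea, assuming a plain GP over (hyperparameters, task index) and a warm start taken from the immediately preceding task; OTHPO's specific kernel and modeling choices over the ordered context are simplified away, and all data is placeholder.

```python
# Sketch of ordered transfer: GP over (hyperparams, task index), warm start from previous task.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Placeholder history: per-task (config, loss) evaluations for tasks t = 0..2.
history = {t: (rng.uniform(0, 1, (15, 2)), rng.normal(size=15)) for t in range(3)}

# Stack (x, t) as joint GP inputs so correlation can decay with task distance.
X_joint = np.vstack([np.hstack([X, np.full((len(X), 1), t)]) for t, (X, _) in history.items()])
y_joint = np.concatenate([y for _, y in history.values()])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_joint, y_joint)

# Warm start for the new task t = 3: best configuration observed on task t = 2.
X_prev, y_prev = history[2]
x_warm = X_prev[np.argmin(y_prev)]

# Score candidates for the new task with the joint surrogate at context t = 3.
candidates = rng.uniform(0, 1, (128, 2))
mu = gp.predict(np.hstack([candidates, np.full((len(candidates), 1), 3)]))
print("warm start:", x_warm, "best predicted candidate:", candidates[np.argmin(mu)])
```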

Distributional embeddings or meta-feature kernels generalize this approach for unordered, meta-feature-equipped cross-dataset scenarios (Law et al., 2018; Jomaa et al., 2021; Terragni et al., 2022).

7. Practical Guidelines and Empirical Impact

Effective cross-dataset transfer depends on principled weighting of sources, careful model aggregation, attention to distributional shift, and adaptive regularization:

  • Always validate transferred configurations or priors on a held-out split to account for outliers, semantic shift, or domain distance (Dube et al., 2018).
  • Graduated (layer-wise) hyperparameter schedules and regression-based prediction of transfer scales yield superior results in deep networks (Dube et al., 2018).
  • Embedding-based or copula-normalized surrogates avoid pitfalls from raw value pooling across tasks with disparate objective scales (Salinas et al., 2019); a minimal copula-transform sketch follows after this list.
  • Variance-minimizing importance weighting is critical under multi-source covariate shift (Nomura et al., 2020).
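
For example, the Gaussian-copula style transform behind such normalization maps each task's observed objective values through their empirical CDF to standard-normal quantiles, making tasks with different scales comparable; the snippet below sketches only that transform, with made-up values.

```python
# Gaussian copula transform: map per-task objective values to standard-normal quantiles
# via their empirical CDF, so tasks on different scales can be pooled (illustrative).
import numpy as np
from scipy.stats import norm, rankdata

def copula_transform(y):
    """Empirical CDF ranks mapped through the standard-normal quantile function."""
    n = len(y)
    ecdf = rankdata(y) / (n + 1)   # ranks scaled into the open interval (0, 1)
    return norm.ppf(ecdf)

task_a = np.array([0.91, 0.88, 0.95, 0.80])     # e.g., accuracies
task_b = np.array([120.0, 95.0, 210.0, 50.0])   # e.g., losses on a different scale
print(copula_transform(task_a))
print(copula_transform(task_b))
```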

Empirical studies consistently report 2–10× reductions in search time, substantial improvements in accuracy or regret over random search, and resilience to outlier tasks and distributional noise.

Cross-dataset hyperparameter transfer remains a cornerstone enabling scalable, sample-efficient, and adaptive automated model tuning in contemporary machine learning pipelines.
