Reluctant Transfer Learning for ITR Estimation

Updated 18 November 2025
  • RTL is a selective model adaptation strategy that updates source coefficients only when target data provides sufficient evidence of a local shift.
  • It employs a sparse shift penalty through adaptive Lasso to maintain a parsimonious and interpretable individualized treatment rule model.
  • Empirical evaluations demonstrate RTL’s efficiency and accuracy in achieving near-optimal outcomes with minimal false inclusions in precision medicine settings.

Reluctant Transfer Learning (RTL) is a model adaptation strategy designed for individualized treatment rule (ITR) estimation under effect heterogeneity, with a primary focus on precision medicine applications. The central objective is to enable efficient, interpretable transfer of model knowledge from a source dataset to a related, yet partially shifted, target scenario—without requiring access to individual-level source data. RTL implements the principle of "reluctant modeling," permitting changes to the source model only when target data provides sufficient evidence for a shift, yielding a parsimonious, generalizable adapted model (Oh et al., 11 Nov 2025).

1. Objective and Motivation

The estimation of ITRs aims to maximize the expected outcome $V(\pi) = E[Y \mid \pi(O)]$ over rules $\pi$ that map individual patient covariates $O$ to a treatment action $a$. In translational settings, well-trained source models may become suboptimal due to local shifts in treatment-covariate effects in a new target population, introducing effect heterogeneity. Reluctant Transfer Learning addresses the challenge of updating ITR models by adaptively incorporating only those modifications to the source model parameters that demonstrably improve predictive performance in the target population. This approach is distinct from standard transfer learning or domain adaptation in that it enforces selectivity and parsimony via an explicit sparse-shift penalty, reducing the risk of overfitting and unnecessary model complexity (Oh et al., 11 Nov 2025).
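Under the linear working model used throughout, the connection between this value objective and coefficient estimation is direct (a standard reduction, stated here for completeness): with $Q_t(o, a) = \Phi(o, a)^\top \beta^*_t$, the value-maximizing rule is the pointwise argmax

$$\pi_t^o(o) \in \arg\max_{a} Q_t(o, a) = \arg\max_{a} \Phi(o, a)^\top \beta^*_t,$$

so estimating $\beta^*_t$ accurately is sufficient to recover a near-optimal ITR.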

2. Mathematical Foundation

Let $(Y_s, O_s, A_s)$ denote the source data (size $n_s$), $(Y_t, O_t, A_t)$ the target data (size $n_t$, denoted $n$ when unambiguous), and $\Phi(O, A) \in \mathbb{R}^p$ a feature vector including main covariates and interactions. The source regression is $Q_s(\Phi) = \Phi^\top \beta^*_s$, with fitted coefficients $\hat{\beta}_s$. The true target regression is $Q_t(\Phi) = \Phi^\top \beta^*_t$, where $\beta^*_t = \beta^*_s + \theta^*$.

The estimation procedure involves:

  • Pre-trained Source Estimator: Obtain a root-$(n_s/p)$-consistent estimator $\hat{\beta}_s$ (using Lasso, elastic net, etc.).
  • Pseudo-outcome Construction: Compute $\tilde{Y}_{ti} = Y_{ti} - \Phi_{ti}^\top \hat{\beta}_s$ for $i = 1, \ldots, n$.
  • Sparse-Shift Estimation on Target: Minimize the penalized objective over $\theta \in \mathbb{R}^p$:

$$L_n(\theta) = n\, E_n\!\left[\big(\tilde{Y}_t - \Phi_t^\top \theta\big)^2\right] + \lambda_n \sum_{j=1}^p w_j |\theta_j|$$

where $E_n$ denotes the empirical mean on the target, $\lambda_n > 0$ is a regularization parameter, and $w_j$ are coordinate weights (often from adaptive Lasso).

  • Model Recovery and ITR Generation: Set $\hat{\beta}_t = \hat{\beta}_s + \hat{\theta}$ and define $\hat{\pi}_t(o) \in \arg\max_{a} \Phi(o, a)^\top \hat{\beta}_t$.

The equivalent form in the original outcome scale is:

$$\min_{\beta_t}\; n\, E_n\!\left[\big(Y_t - \Phi_t^\top \beta_t\big)^2\right] + \lambda_n \sum_{j=1}^p w_j \big|\beta_{tj} - \hat{\beta}_{sj}\big|$$

The $\ell_1$ penalty on $\theta$ operationalizes reluctance by encouraging sparsity, so only a subset of source coefficients are adapted to local target evidence.
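To see why the penalized fits on the two scales coincide, substitute $\beta_t = \hat{\beta}_s + \theta$ into the original-scale criterion (a one-line check, included here for completeness):

$$Y_t - \Phi_t^\top \beta_t = \big(Y_t - \Phi_t^\top \hat{\beta}_s\big) - \Phi_t^\top \theta = \tilde{Y}_t - \Phi_t^\top \theta, \qquad \big|\beta_{tj} - \hat{\beta}_{sj}\big| = |\theta_j|,$$

so minimizing over $\beta_t$ is the same as minimizing $L_n(\theta)$ over $\theta$ and then setting $\hat{\beta}_t = \hat{\beta}_s + \hat{\theta}$.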

3. Stepwise Algorithmic Procedure

The RTL procedure is as follows:

  1. Input: Source coefficients $\hat{\beta}_s$, target data $\{(Y_{ti}, \Phi_{ti})\}_{i=1}^n$, penalty weights $w$, penalty grid $\Lambda$.
  2. Pseudo-outcome Generation: $\tilde{Y}_{ti} = Y_{ti} - \Phi_{ti}^\top \hat{\beta}_s$.
  3. Shift Estimation: For each $\lambda \in \Lambda$, solve:

$$\hat{\theta}^{(\lambda)} = \arg\min_\theta \left[ n\, E_n\big(\tilde{Y}_t - \Phi_t^\top \theta\big)^2 + \lambda \sum_j w_j |\theta_j| \right]$$

Typically fitted with coordinate descent (e.g., glmnet).

  4. Parameter Selection: Identify $\hat{\lambda}$ minimizing $K$-fold cross-validation error on the target pseudo-outcomes.
  5. Final Model: $\hat{\theta} = \hat{\theta}^{(\hat{\lambda})}$, $\hat{\beta}_t = \hat{\beta}_s + \hat{\theta}$.
  6. Final ITR: $\hat{\pi}_t(o) = \arg\max_{a} \Phi(o, a)^\top \hat{\beta}_t$.

Key characteristics of RTL include selective transfer (local adaptation of coefficients), no requirement for individual-level source data once $\hat{\beta}_s$ is available, and computational efficiency (a single $\ell_1$-penalized regression on the target dataset) (Oh et al., 11 Nov 2025). A minimal implementation sketch of these steps is given below.
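The sketch below illustrates steps 1-6 in Python with scikit-learn; the paper itself fits the shift with glmnet-style coordinate descent, so the use of `LassoCV` (with its internally generated $\lambda$ grid), the column-rescaling trick for the per-coordinate weights, the `fit_intercept=False` convention, and the helper names `rtl_fit` and `estimate_itr` are assumptions of this illustration rather than the authors' implementation.

```python
# Minimal sketch of the RTL procedure (illustrative only; the paper uses
# glmnet-style coordinate descent in R).  Per-coordinate adaptive-Lasso
# weights are emulated by rescaling columns, since scikit-learn's Lasso
# exposes a single global penalty.
import numpy as np
from sklearn.linear_model import LassoCV

def rtl_fit(beta_s_hat, Phi_t, Y_t, weights, n_folds=5):
    """Return beta_t_hat = beta_s_hat + theta_hat from a weighted-Lasso shift fit."""
    # Step 2: pseudo-outcomes  Y_tilde = Y_t - Phi_t beta_s_hat.
    Y_tilde = Y_t - Phi_t @ beta_s_hat

    # A weighted L1 penalty sum_j w_j |theta_j| is equivalent to an ordinary
    # Lasso on rescaled columns Phi_t[:, j] / w_j with coefficients w_j * theta_j.
    Phi_scaled = Phi_t / weights[np.newaxis, :]

    # Steps 3-4: coordinate descent over LassoCV's internal lambda grid,
    # selecting lambda by K-fold cross-validation on the pseudo-outcomes.
    # fit_intercept=False assumes any intercept is already a column of Phi.
    cv_fit = LassoCV(cv=n_folds, fit_intercept=False)
    cv_fit.fit(Phi_scaled, Y_tilde)

    # Undo the rescaling to recover theta_hat on the original scale.
    theta_hat = cv_fit.coef_ / weights

    # Step 5: adapted target coefficients.
    return beta_s_hat + theta_hat

def estimate_itr(beta_t_hat, feature_map, o, actions):
    """Step 6: the ITR selects the action maximizing the fitted Q-value at o."""
    q_values = [feature_map(o, a) @ beta_t_hat for a in actions]
    return actions[int(np.argmax(q_values))]
```

In practice the explicit geometric $\lambda$ grid and adaptive weights described in Section 5 would replace the defaults used here.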

4. Theoretical Properties and Guarantees

RTL presumes the following for target data:

  • (A1) $Y_t = \Phi_t^\top \beta^*_t + \epsilon_t$, with $E[\epsilon_t \mid \Phi_t] = 0$ and $\mathrm{Var}(\epsilon_t) < \infty$.
  • (A2) $\|\Phi_t\|_\infty \leq M < \infty$.
  • (A3) $\mathrm{Cov}(\Phi_t)$ has eigenvalues in $[b, B]$.
  • (A4) $\|\hat{\Sigma} - \Sigma\|_F \to_p 0$, where $\hat{\Sigma} = E_n[\Phi_t \Phi_t^\top]$.
  • (A5) $\lambda_n = o(\sqrt{n})$.

Assuming also a margin condition (Qian & Murphy, 2011), namely that for some $C > 0$, $\eta \geq 0$, and all $\epsilon > 0$:

$$P\big(\Delta(Q_t^*; O_t) \leq \epsilon\big) \leq C \epsilon^\eta,$$

where $\Delta(Q_t^*; o)$ denotes the gap between the optimal and second-best actions.

Theorem (Value Convergence): Under the above, the difference between the maximal attainable value and the value of the estimated rule satisfies:

$$V(\pi_t^o) - V(\hat{\pi}_t) = O_p\!\left[\big(p / \min(n_s, n_t)\big)^{(1+\eta)/(2+\eta)}\right].$$

The proof proceeds by first establishing an $\ell_2$-error bound on the estimated coefficients and then applying a result of Qian & Murphy (2011) to translate this into a value bound (Oh et al., 11 Nov 2025); the argument is sketched below.
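Schematically, and assuming the standard form of the Qian & Murphy (2011) value lemma rather than the paper's exact constants, the two steps combine as:

$$E\big[(\hat{Q}_t - Q_t^*)^2\big] = E\Big[\big(\Phi_t^\top(\hat{\beta}_t - \beta^*_t)\big)^2\Big] \lesssim \|\hat{\beta}_t - \beta^*_t\|_2^2 = O_p\!\left(\frac{p}{\min(n_s, n_t)}\right),$$

$$V(\pi_t^o) - V(\hat{\pi}_t) \lesssim \Big(E\big[(\hat{Q}_t - Q_t^*)^2\big]\Big)^{\frac{1+\eta}{2+\eta}} = O_p\!\left[\Big(p / \min(n_s, n_t)\Big)^{\frac{1+\eta}{2+\eta}}\right],$$

where the first display bounds the design second moment using (A2)-(A3), and the second invokes the margin condition through the value lemma.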

5. Penalty Design, Tuning, and Computation

  • Source Estimator: Typically Lasso or elastic net is used for source model fitting.
  • Penalty Weights: Adaptive Lasso weights $w_j = 1/|\tilde{\theta}_j|^\gamma$ for $\gamma \in [0.5, 1]$, with $\tilde{\theta}_j$ from an initial pilot fit (e.g., elastic net). For structured transfer, group or sparse group-Lasso enables block-wise adaptation.
  • Hyperparameter Optimization: $K$-fold cross-validation (typically $K = 3$ or $5$) on the target selects $\lambda$. $\Lambda$ is a geometrically spaced grid from $\lambda_{\max}$ (yielding zero shift) down to $\lambda_{\min} = \epsilon \lambda_{\max}$ (e.g., $\epsilon \approx 10^{-4}$).
  • Computational Complexity: Dominated by coordinate descent over $n_t$ samples, $p$ predictors, and $|\Lambda|$ penalty values, with typical cost $O(|\Lambda|\, n_t\, p\, \mathrm{Iter})$ and $\mathrm{Iter} \approx 100$ iterations per penalty value (Oh et al., 11 Nov 2025). A sketch of the weight and grid construction follows this list.
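As a concrete illustration of this tuning recipe (a sketch only: the elastic-net pilot via `ElasticNetCV`, the floor guarding against zero pilot coefficients, and the grid length of 100 are assumptions, not values prescribed by the paper):

```python
# Illustrative construction of adaptive-Lasso weights and the lambda grid.
import numpy as np
from sklearn.linear_model import ElasticNetCV

def adaptive_weights(Phi_t, Y_tilde, gamma=1.0, floor=1e-6):
    """w_j = 1 / |theta_tilde_j|^gamma from an elastic-net pilot fit on the target."""
    pilot = ElasticNetCV(fit_intercept=False).fit(Phi_t, Y_tilde)
    # The floor keeps weights finite when a pilot coefficient is exactly zero.
    return 1.0 / np.maximum(np.abs(pilot.coef_), floor) ** gamma

def lambda_grid(Phi_t, Y_tilde, weights, eps=1e-4, n_lambda=100):
    """Geometric grid from lambda_max (all shifts zero) down to eps * lambda_max."""
    # KKT condition at theta = 0 for the objective
    #   sum_i (Y_tilde_i - Phi_i^T theta)^2 + lambda * sum_j w_j |theta_j|:
    # theta = 0 is optimal whenever lambda >= max_j 2 |Phi_j^T Y_tilde| / w_j.
    lam_max = (2.0 * np.abs(Phi_t.T @ Y_tilde) / weights).max()
    return np.geomspace(lam_max, eps * lam_max, num=n_lambda)
```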

6. Empirical Evaluation and Findings

Extensive simulation studies establish that RTL nearly attains the optimal value across diverse effect heterogeneity patterns. Signal structures (weak-dense and sparse) and shift scenarios (main and interaction effect shifts) were analyzed. For example, in a weak-dense scenario with $n_s = 50$ source samples and a shift in two main-plus-interaction coefficients, values achieved on the test set (scenario I) were:

| Method     | Value | C (True Zeros) | IC (False Inclusions) | RMSE |
|------------|-------|----------------|-----------------------|------|
| Optimal    | 6.61  | —              | —                     | —    |
| RTL        | 6.47  | 10             | 0                     | 1.49 |
| TargOnly   | 5.96  | —              | —                     | —    |
| ITL        | 4.05  | —              | —                     | —    |
| TransLasso | 6.30  | —              | —                     | —    |
RTL consistently maintained a low count of false inclusions (IC ≈ 0), achieved a nearly optimal test-set value, and produced the lowest or comparable RMSE across all signal/shift scenarios.

In the Best Apnea Interventions for Research (BestAIR) trial, treating Site 3 ($n_s = 68$) as the source and Site 1 ($n_t = 34$) as the target, with change in the Epworth Sleepiness Scale as the outcome, RTL yielded the highest estimated value (3.22) with only two shifted parameters, whereas alternatives either failed to adapt effectively or produced much larger, less interpretable policy models (Oh et al., 11 Nov 2025).

7. Practical Considerations, Limitations, and Extensions

RTL requires no sharing of individual-level source data once $\hat{\beta}_s$ has been obtained, which facilitates data use in regulated settings and simplifies deployment. Nonzero $\hat{\theta}_j$ directly indicate which effects are locally shifted, enhancing interpretability.

Limitations include:

  • Assumption of identical covariate and treatment space (no new target-only covariates).
  • Restriction to single-stage (static) ITRs; multi-stage dynamic regimes are not addressed.
  • No direct adaptation to settings where the target introduces new, unmatched covariates.

Potential Extensions include:

  • Group penalties for multi-armed or hierarchically structured models.
  • Online or sequential updating as new target data becomes available.
  • Kernelized or neural analogues of RTL for nonparametric effect heterogeneity.
  • Incorporation of multiple sources with separate shift vectors per source (Oh et al., 11 Nov 2025).

Reluctant Transfer Learning offers a selective, evidence-driven paradigm for model transfer in high-stakes, high-dimensional ITR estimation, balancing efficiency, interpretability, and adaptability within evolving data environments.
