Partially Linear Regression (PLR)

Updated 9 April 2026

Partially linear regression is a semiparametric framework that models some covariates linearly and others nonparametrically for greater flexibility.
It employs profile-kernel estimation, penalized regression, and machine learning techniques to achieve minimax optimal rates in high-dimensional contexts.
Its extensions and robust inference methods support applications in genomics, economics, and time series analysis, where they have been reported to outperform standard baselines.

Partially linear regression (PLR) is a central semiparametric modeling framework that combines a linear structure for some covariates with a nonparametric form for others. It provides interpretable effects for select predictors while retaining flexibility to accommodate complex nuisance or smooth effects, and has proven foundational in high-dimensional statistics, robust inference, modern penalization regimes, and semiparametric theory.

1. Model Definition and Variants

The canonical PLR model is built on observations $(Y_i, X_i, Z_i)$, $i = 1, \dots, n$, satisfying

$Y_i = X_i^T \beta + g(Z_i) + \epsilon_i,$

with:

$X_i \in \mathbb{R}^p$: linear covariates, with coefficient vector $\beta \in \mathbb{R}^p$ (often sparse and high-dimensional),
$g(\cdot)$: an unknown, typically smooth, nonparametric function on $\mathbb{R}^q$ (often $q = 1$), evaluated at $Z_i$,
$\epsilon_i$: mean-zero errors, often sub-Gaussian or with a specified dependence structure.
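To make the notation concrete, the following NumPy sketch draws synthetic data from the canonical model with $q = 1$. The function name `simulate_plr`, the choice $g(z) = \sin(2\pi z)$, the way $X$ depends on $Z$, and the Gaussian errors are all illustrative assumptions, not part of the model definition.

```python
import numpy as np

def simulate_plr(n=500, p=5, seed=0):
    """Draw (Y, X, Z) from Y_i = X_i^T beta + g(Z_i) + eps_i with q = 1.

    Illustrative assumptions (hypothetical, for demonstration only):
    Z ~ Uniform(0, 1), g(z) = sin(2*pi*z), X_ij = Z_i + N(0, 1) noise,
    and Gaussian errors with standard deviation 0.5. Because X is
    correlated with Z, regressing Y on X alone would be biased by g(Z).
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:2] = [1.0, -0.5]                    # sparse linear effect
    z = rng.uniform(0.0, 1.0, size=n)
    x = z[:, None] + rng.normal(size=(n, p))  # linear covariates depend on Z
    g = np.sin(2 * np.pi * z)                 # smooth nonparametric component
    eps = rng.normal(scale=0.5, size=n)       # mean-zero errors
    y = x @ beta + g + eps
    return y, x, z, beta, g

y, x, z, beta, g = simulate_plr()
```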

PLR generalizes several classic models and admits numerous extensions:

High-dimensional PLR: $p \gg n$, requiring regularization on $\beta$ (LASSO, SCAD, Elastic Net, etc.) (Lee et al., 2024; Li et al., 2015).
Panel and time series PLR: individual effects (fixed or random) or temporal dependence, e.g. $Y_{it} = X_{it}^T \beta + g(Z_{it}) + \alpha_i + \epsilon_{it}$, with the individual effects $\alpha_i$ and the error dependence allowed to vary across units or time (Liu et al., 2019; Li et al., 2022).
Partially linear additive models (PLAMs): the nonparametric part is a sum of univariate functions $g_j(Z_{ij})$, some of which are selected as linear (Boente et al., 2021; Martínez, 18 Feb 2025).
Latent factor–adjusted PLR: explicit modeling of factor structure within the high-dimensional covariates (Shi et al., 11 Jan 2025).
Semi-functional PLR: allows a Hilbert-space valued covariate with an unknown functional-linear/nonlinear effect (Feng et al., 2022).

2. Estimation and Computational Strategies

PLR estimation typically targets minimax-optimal error rates with computationally scalable algorithms, even when $p \gg n$. Methods fall into several main regimes:

(a) Robinson’s Profile–Kernel Estimator:

Residualizes both the response and the linear covariates by nonparametric regression on $Z$ (estimating $\mathbb{E}[Y \mid Z]$ and $\mathbb{E}[X \mid Z]$), regresses the $Y$-residuals on the $X$-residuals to estimate $\beta$, and then recovers $g$ through $\hat g(z) = \hat{\mathbb{E}}[Y \mid Z = z] - \hat{\mathbb{E}}[X \mid Z = z]^T \hat\beta$, which follows from $\mathbb{E}[Y \mid Z] = \mathbb{E}[X \mid Z]^T \beta + g(Z)$. This yields $\sqrt{n}$-consistency for $\hat\beta$ and optimal nonparametric rates for $\hat g$ (Li et al., 2022; Cui et al., 2014).
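A minimal NumPy sketch of the double-residual idea, assuming $q = 1$, a Gaussian-kernel Nadaraya-Watson smoother, and a fixed bandwidth `h` (all illustrative; in practice the bandwidth would be tuned, for example by cross-validation). The helper names `nw_smooth` and `robinson` are hypothetical.

```python
import numpy as np

def nw_smooth(z, v, h):
    """Nadaraya-Watson estimate of E[v | Z = z_i] at every sample point (Gaussian kernel)."""
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)  # n x n kernel weights
    w /= w.sum(axis=1, keepdims=True)                        # each row sums to one
    return w @ v                                             # works for vectors and matrices

def robinson(y, x, z, h=0.05):
    """Double-residual (profile) estimator for Y = X^T beta + g(Z) + eps, q = 1."""
    ey = nw_smooth(z, y, h)                   # estimate of E[Y | Z]
    ex = nw_smooth(z, x, h)                   # estimate of E[X | Z], column by column
    beta_hat, *_ = np.linalg.lstsq(x - ex, y - ey, rcond=None)  # OLS on residuals
    g_hat = ey - ex @ beta_hat                # g(Z) = E[Y | Z] - E[X | Z]^T beta
    return beta_hat, g_hat

# Usage on simulated data where the first covariate is correlated with Z.
rng = np.random.default_rng(1)
n = 400
z = rng.uniform(size=n)
x = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
y = x @ np.array([1.0, -0.5]) + np.sin(2 * np.pi * z) + 0.3 * rng.normal(size=n)
beta_hat, g_hat = robinson(y, x, z)
```

Residualizing $X$ as well as $Y$ is what removes the bias from the correlation between $X$ and $Z$; smoothing only $Y$ would not.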

(b) Penalized/Machine Learning Procedures:

$\ell_1$-penalized (LASSO), Elastic Net, or folded-concave (SCAD/MCP) penalties enforce sparsity or group selection on $\beta$ (Lee et al., 2024; Li et al., 2015; Martínez, 18 Feb 2025).
B-spline or smoothing-spline bases, or trend filtering with total-variation ($\ell_1$) penalties on discrete derivatives, estimate $g$; combining an $\ell_1$ penalty on $\beta$ with a trend-filtering penalty on $g$ gives doubly penalized least squares (PLTF), which adapts to spatially varying smoothness (Lee et al., 2024).
ML-based “outsourcing”: $g$ (or the nuisance regressions on $Z$) is estimated with arbitrary machine learning fits (random forests, boosting, deep nets), using sample splitting or cross-fitting to avoid overfitting bias in downstream inference (Shi et al., 2023).
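The cross-fitting step can be sketched as a partialling-out estimator in the spirit of double/debiased machine learning: the nuisance regressions of $Y$ and $X$ on $Z$ are fitted on $K-1$ folds and evaluated on the held-out fold, and $\beta$ is then estimated from the pooled out-of-fold residuals. To keep the sketch dependency-free, a kernel smoother stands in for the random forest, boosting, or deep-net learner; any regressor with the same call shape could be passed as `learner`. The names `nw_predict` and `crossfit_plr`, the fold count, and the bandwidth are hypothetical choices, and this is not presented as the procedure of the cited work.

```python
import numpy as np

def nw_predict(z_train, v_train, z_eval, h):
    """Nadaraya-Watson prediction at z_eval from training pairs (stand-in for any ML learner)."""
    w = np.exp(-0.5 * ((z_eval[:, None] - z_train[None, :]) / h) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v_train

def crossfit_plr(y, x, z, n_folds=5, h=0.05, seed=0, learner=nw_predict):
    """Partialling-out estimate of beta with K-fold cross-fitted nuisance regressions."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # balanced random folds
    ry = np.empty(n)
    rx = np.empty(x.shape)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances are fitted on the other folds and evaluated only on fold k.
        ry[test] = y[test] - learner(z[train], y[train], z[test], h)
        rx[test] = x[test] - learner(z[train], x[train], z[test], h)
    beta_hat, *_ = np.linalg.lstsq(rx, ry, rcond=None)
    return beta_hat, rx, ry

# Usage on simulated PLR data.
rng = np.random.default_rng(5)
n = 400
z = rng.uniform(size=n)
x = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
y = x @ np.array([1.0, -0.5]) + np.sin(2 * np.pi * z) + 0.3 * rng.normal(size=n)
beta_hat, rx, ry = crossfit_plr(y, x, z)
```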

(c) Specialized Algorithms:

Block-coordinate descent, alternating a LASSO update for $\beta$ with a univariate trend-filtering update for $g$ in each iteration; efficient in high-dimensional settings (Lee et al., 2024).
IRLS and MM algorithms for robust (including redescending) $\rho$-losses in the presence of outliers (Martínez, 18 Feb 2025; Boente et al., 2021).
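The IRLS idea can be illustrated on the linear step alone: each iteration solves a weighted least-squares problem whose weights $w_i = \psi(u_i)/u_i$ downweight large scaled residuals $u_i$. This sketch uses Huber's $\rho$ with the common tuning constant $c = 1.345$ and a MAD scale re-estimated at each step; it is not the MM procedure with a redescending loss from the cited papers, and in a PLR it would be applied to residualized data. `huber_irls` is a hypothetical helper.

```python
import numpy as np

def huber_irls(x, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of beta in y ~ x via iteratively reweighted least squares."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)               # OLS starting value
    for _ in range(n_iter):
        r = y - x @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale estimate
        u = np.abs(r) / max(scale, 1e-12)                      # scaled absolute residuals
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))          # Huber weights psi(u)/u
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Usage: 10% vertical outliers shift the OLS intercept but barely move the Huber fit.
rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
design = np.column_stack([np.ones(n), x1])
y = 2.0 * x1 + 0.5 * rng.normal(size=n)
bad = rng.choice(n, size=20, replace=False)
y[bad] += 20.0
beta_ols, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_huber = huber_irls(design, y)
```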

(d) Factor Adjustment and Principal Components:

When high-dimensional $X$ has latent factor structure, factor estimation (PCA) and projection techniques debias inference and separate sparse effects from dense correlation (Shi et al., 11 Jan 2025).
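A minimal sketch of PCA-based factor adjustment, assuming the covariates follow an approximate factor model $X = F B^T + U$ with a known number of factors $K$: the factors are estimated from the top-$K$ left singular vectors of the centered design, and the idiosyncratic part $\hat U = X - \hat F \hat B^T$ is what a sparsity penalty would then act on. The normalization $\hat F^T \hat F / n = I_K$ and the function name `factor_adjust` are illustrative assumptions; this is not necessarily the estimator used by Shi et al. (11 Jan 2025).

```python
import numpy as np

def factor_adjust(x, k):
    """Split centered X into estimated factors F_hat (n x k), loadings B_hat, and U_hat."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    left, _, _ = np.linalg.svd(xc, full_matrices=False)
    f_hat = np.sqrt(n) * left[:, :k]        # normalized so that F_hat^T F_hat / n = I_k
    b_hat = xc.T @ f_hat / n                # p x k loadings by least squares on F_hat
    u_hat = xc - f_hat @ b_hat.T            # idiosyncratic component
    return f_hat, b_hat, u_hat

# Usage: two strong common factors make the raw columns of X highly correlated.
rng = np.random.default_rng(2)
n, p, k = 300, 100, 2
f = rng.normal(size=(n, k))
b = rng.normal(size=(p, k))
u = rng.normal(size=(n, p))
x = f @ b.T + u
f_hat, b_hat, u_hat = factor_adjust(x, k)
```

By construction $\hat U$ is orthogonal to the estimated factors, so regressing on $(\hat F, \hat U)$ separates the dense low-rank signal from sparse idiosyncratic effects.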

3. Asymptotic Theory and Inference

PLR supports rigorous minimax-optimal rates and a rich theory for semiparametric inference:

Euclidean–Functional Rate Separation: $\hat\beta$ achieves $\sqrt{n}$-consistency (or the sparse-oracle LASSO rate $\sqrt{s \log p / n}$ for an $s$-sparse $\beta$ when $p \gg n$), and $\hat g$ achieves the minimax nonparametric rate, e.g. $n^{-m/(2m+1)}$ for a univariate $g$ of smoothness order $m$, under regularity conditions and correct penalty tuning (Lee et al., 2024; arXiv:1311.2628; arXiv:2212.10359).
Asymptotic Independence: The parametric and nonparametric estimators are asymptotically independent under mild conditions, simplifying joint confidence regions and likelihood-ratio testing (Cheng et al., 2013).
Oracle Properties and Selection Consistency: Adaptive penalties (adaptive LASSO, SCAD, etc.) combined with robust losses yield support recovery and asymptotic normality for the nonzero components of $\beta$ (Martínez, 18 Feb 2025).
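For reference, the classical fixed-$p$ limit for Robinson's estimator makes the $\sqrt{n}$ rate and its variance explicit. It is stated here under standard regularity conditions, with the notation $\tilde X$, $\Sigma$, $\Omega$ introduced only for this display; identification requires $\Sigma$ to be nonsingular, i.e. no linear combination of $X$ may be almost surely a function of $Z$.

```latex
\tilde X = X - \mathbb{E}[X \mid Z], \qquad
\Sigma = \mathbb{E}\big[\tilde X \tilde X^T\big], \qquad
\Omega = \mathbb{E}\big[\epsilon^2 \tilde X \tilde X^T\big],

\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} \mathcal{N}\big(0,\; \Sigma^{-1} \Omega \Sigma^{-1}\big),
\qquad \text{which reduces to } \sigma^2 \Sigma^{-1} \text{ when } \mathbb{E}[\epsilon^2 \mid X, Z] = \sigma^2 .
```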

Simultaneous Inference & Testing:

High-dimensional Gaussian multiplier bootstrap and debiasing techniques provide valid simultaneous confidence intervals for $\beta$ and confidence bands for $g$, even under temporal or other complex dependence (Li et al., 2022; Shi et al., 11 Jan 2025).
Likelihood ratio tests in joint semi-nonparametric models have Wilks-type limits, with the limiting chi-square law splitting into independent parametric and nonparametric contributions (Cheng et al., 2013).
Linear vs additive structure can be identified via solution-path approaches and folded-concave penalties in panel data (Liu et al., 2019).
Ultra-high-dimensional testing is possible using ML-estimated $g$ together with quadratic-form and power-enhanced statistics for global and sparse alternatives (Shi et al., 2023).
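The Gaussian multiplier bootstrap for simultaneous intervals can be sketched in the low-dimensional case, taking residualized data $\tilde X$, $\tilde Y$ as input (for example, from the cross-fitting step in Section 2). Multiplier-weighted sums of estimated influence functions approximate the law of $\max_j |t_j|$, and the $(1-\alpha)$ quantile of that maximum replaces the pointwise 1.96. This is a generic sketch with a hypothetical function name; the cited works use debiased, high-dimensional versions.

```python
import numpy as np

def multiplier_bootstrap_ci(rx, ry, alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous (1 - alpha) intervals for all coordinates of beta in ry ~ rx.

    rx, ry are residualized covariates and response (e.g. after partialling out Z).
    """
    n, p = rx.shape
    gram = rx.T @ rx / n
    beta_hat = np.linalg.solve(gram, rx.T @ ry / n)
    u = ry - rx @ beta_hat
    psi = (rx * u[:, None]) @ np.linalg.inv(gram)   # row i: estimated influence function
    psi -= psi.mean(axis=0)                         # center (already ~0 for OLS residuals)
    sd = psi.std(axis=0, ddof=1)                    # sqrt(n) times the standard error
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n_boot, n))            # Gaussian multipliers
    t_boot = np.abs(e @ psi) / (np.sqrt(n) * sd)    # bootstrap |t|-statistics, n_boot x p
    crit = np.quantile(t_boot.max(axis=1), 1 - alpha)   # critical value for max_j |t_j|
    half = crit * sd / np.sqrt(n)
    return beta_hat, beta_hat - half, beta_hat + half, crit

# Usage with ten coordinates: crit lands well above the pointwise 1.96.
rng = np.random.default_rng(3)
n, p = 500, 10
rx = rng.normal(size=(n, p))
beta = np.linspace(-1.0, 1.0, p)
ry = rx @ beta + 0.5 * rng.normal(size=n)
beta_hat, lower, upper, crit = multiplier_bootstrap_ci(rx, ry)
```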

4. Robustness, Regularization, and Practical Implementation

PLR estimation must address contamination and leverage effects, as least squares can be highly sensitive to outliers:

Robust $\rho$-functions: Huber’s loss and redescending losses such as Tukey’s bisquare deliver outlier-resistant M- or MM-type estimators for both the parametric and nonparametric parts (Boente et al., 2021; Martínez, 18 Feb 2025).
Penalization: SCAD, MCP, Elastic Net, and Adaptive LASSO control selection, shrinkage, and group effects, which is critical when $X$ is correlated or high-dimensional (Li et al., 2015; Martínez, 18 Feb 2025).
Trend Filtering vs Splines: Trend filtering via TV penalties delivers locally adaptive recovery of $g$ with heterogeneous smoothness (e.g., kinks, flat and rough regions), whereas standard smoothing splines can oversmooth at boundaries or singularities (Lee et al., 2024).
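One way to write the doubly penalized criterion behind the trend-filtering approach is shown below. It assumes the $Z_i$ are sorted and evenly spaced, writes $\theta_i = g(Z_i)$, and uses $\lambda_1, \lambda_2 \ge 0$ and trend-filtering order $k$ as illustrative notation; the exact scaling in Lee et al. (2024) may differ.

```latex
(\hat\beta, \hat\theta) \in \arg\min_{\beta \in \mathbb{R}^p,\ \theta \in \mathbb{R}^n}
\; \frac{1}{2n}\,\lVert Y - X\beta - \theta \rVert_2^2
\;+\; \lambda_1 \lVert \beta \rVert_1
\;+\; \lambda_2 \lVert D^{(k+1)} \theta \rVert_1
```

Here $D^{(k+1)}$ is the $(k+1)$-th order discrete difference matrix. Replacing the last term by the squared penalty $\lambda_2 \lVert D^{(k+1)} \theta \rVert_2^2$ gives a discrete smoothing-spline-type criterion; the $\ell_1$ version sets many $(k+1)$-th differences exactly to zero, so the fit is piecewise polynomial of degree $k$ with adaptively placed knots.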

Table: Penalized Approaches in High-Dimensional PLR

Method	Linear Penalty	Nonparametric Penalty
LASSO–Splines	$\ell_1$ (LASSO)	B-spline (ridge/group)
Elastic Net	$\ell_1 + \ell_2$	Spline/ridge
Trend Filtering	$\ell_1$ (LASSO)	TV ($\ell_1$ on discrete derivatives)
SCAD/Adaptive LASSO	Folded-concave/weighted $\ell_1$	Group SCAD

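PLTF (partially linear trend filtering) remains computationally feasible, since each block-coordinate descent (BCD) iteration reduces to a LASSO step and a univariate trend-filtering step, and it adapts automatically to whichever of the sparse-parametric or nonparametric optimal rates dominates.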
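The block-coordinate-descent structure can be sketched in NumPy with one deliberate swap to keep it short: the $g$-block below uses a second-difference $\ell_2$ (Whittaker-type) smoother, which has a closed-form update, instead of the TV/trend-filtering step, which needs a dedicated solver. The LASSO block is one cycle of exact coordinate-wise soft-thresholding. The function name `plr_bcd`, the penalty levels, and the objective scaling are hypothetical choices.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def plr_bcd(y, x, z, lam1=0.05, lam2=1e4, n_sweeps=100, tol=1e-6):
    """BCD for (1/2n)||y - x beta - theta||^2 + lam1 ||beta||_1 + (lam2/2n)||D2 theta||^2.

    D2 takes second differences of theta in Z-sorted order (l2 stand-in for TV).
    """
    n, p = x.shape
    order = np.argsort(z)                           # smoother acts on Z-sorted positions
    d2 = np.diff(np.eye(n), n=2, axis=0)            # (n-2) x n second-difference matrix
    smoother = np.linalg.inv(np.eye(n) + lam2 * d2.T @ d2)
    col_ss = (x ** 2).sum(axis=0) / n
    beta, theta = np.zeros(p), np.zeros(n)          # theta[i] estimates g(z[i])
    for _ in range(n_sweeps):
        beta_old = beta.copy()
        # Block 1: one cycle of LASSO coordinate descent on the partial residual y - theta.
        r = y - theta - x @ beta
        for j in range(p):
            r += x[:, j] * beta[j]
            beta[j] = soft(x[:, j] @ r / n, lam1) / col_ss[j]
            r -= x[:, j] * beta[j]
        # Block 2: exact minimization over theta given beta (closed-form smoother).
        pr = y - x @ beta
        theta = np.empty(n)
        theta[order] = smoother @ pr[order]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta, theta

# Usage: sparse beta with two active coordinates and a smooth g.
rng = np.random.default_rng(4)
n, p = 300, 20
z = rng.uniform(size=n)
x = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:2] = [1.5, -1.0]
y = x @ beta + np.sin(2 * np.pi * z) + 0.3 * rng.normal(size=n)
beta_hat, theta_hat = plr_bcd(y, x, z)
```

Each block is minimized exactly given the other, which is what makes this a block-coordinate descent on a convex objective; swapping in a trend-filtering solver for block 2 would give the TV-penalized version.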

5. Extensions: Partial Additivity, Factors, Functional Covariates

PLR admits several modern extensions, each with bespoke estimation and inferential strategies:

PLAMs: Model the nonparametric part as a sum of univariate components $\sum_j g_j(Z_j)$, with simultaneous selection over the linear and additive regimes, enabling identification of additive, linear, or hybrid effect structures (Martínez, 18 Feb 2025; Boente et al., 2021).
PLR with Latent Factors: Factor-Adjusted PLR integrates low-rank and sparse effects in high-dimensional regimes; B-spline/penalized estimation with principal-component (PC) adjustment attains minimax rates, and debiased tests provide valid inference under dense covariance (Shi et al., 11 Jan 2025).
Panel and Time Series PLR: Incorporate fixed effects, autocorrelation, summary measures, and multi-way dependencies; simultaneous inference bands for $g$ via high-dimensional Gaussian approximation methods account for both the nonparametric and the dependent structure (Li et al., 2022; Liu et al., 2019).
Semi-Functional PLR: Models where the covariate $Z$ is infinite-dimensional (e.g., a curve), and the effect $g$ may be tested for linearity using projection-based Kolmogorov–Smirnov (KS) and Cramér–von Mises (CvM) tests, calibrated by a wild bootstrap (Feng et al., 2022).

6. Applications and Empirical Performance

PLR, PLTF, and their modern variants have demonstrated empirical utility across fields:

High-dimensional -omics: Identifying sparse metabolomics/proteomics features associated with continuous outcomes, as in the IDATA study, where PLTF consistently outperformed PLSS and LASSO in test MSE and biomarker variable selection (Lee et al., 2024).
Robust inference under contamination: Robust adaptive penalized estimators are less affected by both vertical and leverage outliers, retaining variable selection accuracy and function estimation stability under model contamination or heavy tails (Martínez, 18 Feb 2025; Boente et al., 2021).
Panel economics: Pathwise linearity detection in aggregate production and environmental Kuznets curve data reveals the set of linear and nonlinear economic relationships, with consistent recovery as predicted by theory (Liu et al., 2019).
Genomics and gene expression: Ultra-high-dimensional PLR tests, factor adjustment, and power-enhanced statistics enable principled inference even when $p \gg n$, with empirical superiority over the de-sparsified Lasso and classical approaches (Shi et al., 2023; Shi et al., 11 Jan 2025).
Functional data: Semi-functional PLR (SFPLR) linearity tests have been shown to detect nonlinearity where present and to retain linearity otherwise in benchmark spectroscopy and weather-station datasets (Feng et al., 2022).

7. Theoretical Innovations and Limitations

Rate Adaptivity and Minimaxity: Modern PLR estimators adapt to unknown smoothness and sparsity without knowing in advance whether the problem is “parametric-rate–dominated” or “nonparametric-rate–dominated” (Lee et al., 2024). The estimator tracks the larger of the sparse-parametric rate $\sqrt{s \log p / n}$ and the minimax nonparametric rate.
Optimality under Heterogeneous Smoothness: Trend filtering (PLTF) attains lower bias and avoids the boundary over- and undersmoothing common in $\ell_2$-penalized (smoothing-spline) methods, especially at “kinks” or other locally nonsmooth features (Lee et al., 2024).
Oracle and Semi-Nonparametric Wilks Phenomena: Likelihood-based tests divide the limiting chi-square law into independent contributions from the parametric and nonparametric part (Cheng et al., 2013).
Limitations: Classical approaches require Sobolev-type smoothness of $g$. Extending PLR to shape-constrained or cube-root-rate problems (e.g., monotone or convex $g$) remains non-trivial (Cheng et al., 2013). Further, identification and optimality results may require sub-Gaussian tails, restricted eigenvalue (RE) conditions, or precise penalty calibration.

Partially linear regression thus provides a unified and powerful framework for simultaneous sparse parametric estimation, nonparametric function recovery, and modular incorporation of robust, high-dimensional, time-dependent, or structured inferential challenges (Lee et al., 2024; Martínez, 18 Feb 2025; Shi et al., 11 Jan 2025; Li et al., 2022; Shi et al., 2023; Li et al., 2015; Liu et al., 2019).