Spline-based Multivariate Adaptive Regression Trees (SMART)

Updated 28 March 2026

SMART is a hybrid regression framework that fuses decision-tree partitioning with localized MARS fits to model both discontinuous and smooth structures.
The methodology employs a global MARS forward pass, recursive tree splitting, and leaf-wise pruning using GCV to optimize fit and reduce error.
In the reported benchmarks, SMART matches standard MARS on smooth problems and achieves markedly lower RMSE on datasets with abrupt regime changes, where tree splits isolate the discontinuities.

Spline-based Multivariate Adaptive Regression Trees (SMART) is a regression methodology that combines decision-tree-based recursive partitioning with localized Multivariate Adaptive Regression Splines (MARS) modeling. It targets a gap between the two approaches: classical decision trees approximate smooth structure poorly, while MARS, although effective at modeling continuous nonlinearities and interactions, cannot cleanly segment discontinuities. In SMART, feature-space partitioning isolates discontinuous subregions, and a separate MARS fit within each subregion captures the piecewise continuous relationship. This explicit separation of discontinuity detection from smooth modeling leverages the strengths of both frameworks, yielding a model well-suited to regression problems with heterogeneous smoothness and abrupt regime changes (Pattie et al., 2024).

1. Model Construction and Mathematical Specification

SMART produces a piecewise-defined function $f(X)$ , segmented by a binary tree whose leaves correspond to subregions with locally fitted MARS models. Given $n$ observations $\{(X_i,Y_i)\}_{i=1}^n$ with $X_i \in \mathbb{R}^m$ :

The tree tessellates $\mathbb{R}^m$ into $R$ disjoint, axis-aligned hyperrectangles $\{\mathcal{R}_r\}_{r=1}^R$ .
In each $\mathcal{R}_r$ , a local MARS expansion of up to $J_r$ terms is fit:

$f_r(X) = \beta_{r0} + \sum_{j=1}^{J_r} \beta_{rj} B_{rj}(X)$

where the $B_{rj}(X)$ are MARS basis functions: (1) linear terms $x_k$ , (2) hinge terms of the form $(x_k - t)_+$ or $(t - x_k)_+$ with knot $t$ , or (3) products of previously selected basis functions.

Globally, the SMART predictor is

$f(X) = \sum_{r=1}^R I\{X \in \mathcal{R}_r\} f_r(X)$
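The piecewise definition above can be made concrete with a small sketch. The `Leaf` class, the `smart_predict` function, and the two hand-written leaves are hypothetical illustrations, not part of the reference implementation; the coefficients are chosen by hand rather than fitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def hinge(x: float, t: float, sign: int = 1) -> float:
    """Return (x - t)_+ for sign=+1 and (t - x)_+ for sign=-1."""
    return max(sign * (x - t), 0.0)


@dataclass
class Leaf:
    """One axis-aligned region R_r with its local expansion f_r (hypothetical container)."""
    lower: Sequence[float]  # inclusive lower corner of the hyperrectangle
    upper: Sequence[float]  # exclusive upper corner of the hyperrectangle
    intercept: float        # beta_{r0}
    terms: List[Tuple[float, Callable[[Sequence[float]], float]]]  # (beta_{rj}, B_{rj})

    def contains(self, x: Sequence[float]) -> bool:
        return all(lo <= xi < hi for lo, xi, hi in zip(self.lower, x, self.upper))

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(beta * basis(x) for beta, basis in self.terms)


def smart_predict(leaves: List[Leaf], x: Sequence[float]) -> float:
    """f(X) = sum_r I{X in R_r} f_r(X); the regions are disjoint, so one term is active."""
    return sum(leaf.predict(x) for leaf in leaves if leaf.contains(x))


INF = float("inf")
# Two hand-made leaves on a single feature, with a jump at x = 0.
leaves = [
    Leaf([-INF], [0.0], 1.0, [(2.0, lambda x: hinge(x[0], -1.0))]),
    Leaf([0.0], [INF], 5.0, [(-3.0, lambda x: hinge(x[0], 0.5))]),
]
left_value = smart_predict(leaves, [-0.5])   # 1 + 2 * (-0.5 + 1)_+
right_value = smart_predict(leaves, [1.0])   # 5 - 3 * (1.0 - 0.5)_+
```

Each leaf is continuous on its own region, while the jump between regions is carried entirely by the partition.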

Coefficient estimation in each leaf is via ordinary least squares (OLS), with knot placement and term selection by MARS’s greedy forward algorithm to minimize residual sum of squares (RSS). Pruning (backward step) is guided by minimization of the Generalized Cross-Validation (GCV) risk:

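$\mathrm{GCV}(\lambda) = \frac{\sum_i (Y_i - f(X_i))^2}{(1 - M(\lambda)/n)^2}$

where $M(\lambda)$ is the effective number of parameters of the fit, which accounts for the retained terms and for the knots selected along the way.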
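As a worked instance of the GCV criterion in the form printed above, the following sketch evaluates it for a small set of fitted values. The function name `gcv` and the argument `effective_params` (standing in for $M(\lambda)$) are illustrative choices; some presentations additionally divide by $n$, which does not change which model minimizes the criterion for a fixed dataset.

```python
from typing import Sequence


def gcv(y: Sequence[float], y_hat: Sequence[float], effective_params: float) -> float:
    """Residual sum of squares inflated by the factor (1 - M/n)^(-2)."""
    n = len(y)
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample size")
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return rss / (1.0 - effective_params / n) ** 2


# RSS = 0.25 + 0.25 = 0.5 and (1 - 2/4)^2 = 0.25, so the criterion equals 2.0.
example = gcv([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0], effective_params=2)
```

The denominator shrinks as the model grows, so an extra term is only worthwhile if it lowers the RSS by more than the penalty inflates it.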

2. Training Algorithm and Computational Workflow

The SMART training pipeline consists of three sequential phases:

A. Global MARS Forward Pass

Initialize with the intercept-only model.
Iteratively, add the basis function (hinge, reflected-hinge, or interaction up to degree $D_\text{mars}$ ) that most reduces the RSS, or minimizes GCV, until $M$ terms are selected or no significant improvement remains.
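A heavily simplified sketch of a greedy forward pass is given below. It is not the reference implementation: it assumes a single feature, uses only one-sided hinges $(x - t)_+$ , ignores interactions, and solves least squares through the normal equations rather than a QR update. The names `solve`, `ols_rss`, `forward_pass`, and the stopping tolerance `min_gain` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def hinge_col(x: Sequence[float], t: float) -> List[float]:
    return [max(xi - t, 0.0) for xi in x]


def ols_rss(cols: List[List[float]], y: Sequence[float]) -> float:
    """Fit y on the given columns by least squares and return the RSS."""
    k = len(cols)
    gram = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(u * v for u, v in zip(cols[i], y)) for i in range(k)]
    beta = solve(gram, rhs)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_pass(x: Sequence[float], y: Sequence[float],
                 max_terms: int = 5, min_gain: float = 1e-9) -> Tuple[List[float], float]:
    """Greedily add the hinge knot that most reduces RSS; return (knots, final RSS)."""
    cols = [[1.0] * len(x)]              # start from the intercept-only model
    knots: List[float] = []
    rss = ols_rss(cols, y)
    candidates = sorted(set(x))[:-1]     # a hinge at the largest x is identically zero
    while len(cols) < max_terms:
        best = None
        for t in candidates:
            if t in knots:
                continue
            trial = ols_rss(cols + [hinge_col(x, t)], y)
            if best is None or trial < best[0]:
                best = (trial, t)
        if best is None or rss - best[0] <= min_gain:
            break                        # no worthwhile improvement remains
        rss = best[0]
        knots.append(best[1])
        cols.append(hinge_col(x, best[1]))
    return knots, rss


# Toy data: y = 1 + 2 * (x)_+ on a grid, so a single knot at 0 fits exactly.
x_demo = [i / 4 for i in range(-4, 5)]
y_demo = [1.0 + 2.0 * max(v, 0.0) for v in x_demo]
knots_demo, rss_demo = forward_pass(x_demo, y_demo)
```

On this toy input the search selects the knot at 0 and then stops, because no further hinge reduces the RSS by more than `min_gain`.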

B. Recursive Tree-Split Phase

For each node in the tree:
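1. Terminate if the node holds fewer than $n_\text{min}$ observations or the number of model parameters exceeds $|D|-1$ , where $|D|$ is the number of observations in the node.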
2. Partition the node’s data into fitting (70%) and validation (30%) subsets.
3. For each feature and split value, refit MARS models left and right, compute validation RSS, and select the split that minimizes it.
4. Use 5-fold cross-validated RSS to decide if the split yields sufficient improvement ( $\geq \delta$ threshold, typically 1% relative reduction). If so, recurse; otherwise, designate as a leaf.
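Steps 2 and 4 above can be sketched as two small helpers. How the source draws the 70/30 partition is not specified, so the seeded shuffle here is an assumption, as are the names `fit_validation_split` and `accept_split`; the RSS values passed to `accept_split` are assumed to have been computed elsewhere (for example by 5-fold cross-validation).

```python
import random
from typing import Iterable, List, Tuple


def fit_validation_split(indices: Iterable[int], fit_fraction: float = 0.7,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the node's row indices and cut them into fitting and validation parts."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(fit_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accept_split(rss_parent: float, rss_children: float, delta: float = 0.01) -> bool:
    """Accept the split when the relative RSS reduction is at least delta."""
    if rss_parent <= 0.0:
        return False  # a perfect parent fit cannot be improved
    return (rss_parent - rss_children) / rss_parent >= delta


fit_idx, val_idx = fit_validation_split(range(10))
two_percent = accept_split(100.0, 98.0)    # 2% reduction clears the 1% threshold
half_percent = accept_split(100.0, 99.5)   # 0.5% reduction does not
```

Expressing the threshold as a relative reduction keeps $\delta$ comparable across nodes whose responses differ in scale.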

Algorithm 1 (Best Split Finder, continuous features):

Input: D^F=(X^F,Y^F), D^V=(X^V,Y^V), current fit f^(0)
For d in 1..m:
  For s in unique(X^F_{·,d}):
    Partition at (d,s)
    Fit MARS left/right on L^F, R^F
    Calculate RSS_val = RSS(f_L;L^V) + RSS(f_R;R^V)
    Keep (d,s) with minimal RSS_val
Return (d*,s*, f_L*, f_R*)
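Algorithm 1 can be sketched in runnable form as follows. To stay self-contained, the sketch replaces the leaf-level MARS fit with a stand-in that predicts the training mean (`fit_mean`); any fitter with the same call signature could be passed through the hypothetical `fit` argument. The `min_leaf` guard and the convention that rows with value `<= s` go left are also assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]


def fit_mean(X: List[Row], y: List[float]) -> Model:
    """Stand-in leaf fitter (NOT MARS): always predicts the training mean."""
    mu = sum(y) / len(y)
    return lambda row: mu


def rss(model: Model, X: List[Row], y: List[float]) -> float:
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(X, y))


def best_split(XF: List[Row], yF: List[float], XV: List[Row], yV: List[float],
               fit: Callable[[List[Row], List[float]], Model] = fit_mean,
               min_leaf: int = 1) -> Optional[Tuple[float, int, float, Model, Model]]:
    """Return (validation RSS, feature d*, threshold s*, f_L*, f_R*), or None."""
    best = None
    for d in range(len(XF[0])):
        for s in sorted(set(row[d] for row in XF)):
            left = [i for i, row in enumerate(XF) if row[d] <= s]
            right = [i for i, row in enumerate(XF) if row[d] > s]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Fit each side on the fitting subset only.
            f_left = fit([XF[i] for i in left], [yF[i] for i in left])
            f_right = fit([XF[i] for i in right], [yF[i] for i in right])
            # Score the candidate on the held-out validation subset.
            v_left = [i for i, row in enumerate(XV) if row[d] <= s]
            v_right = [i for i, row in enumerate(XV) if row[d] > s]
            score = (rss(f_left, [XV[i] for i in v_left], [yV[i] for i in v_left])
                     + rss(f_right, [XV[i] for i in v_right], [yV[i] for i in v_right]))
            if best is None or score < best[0]:
                best = (score, d, s, f_left, f_right)
    return best


# The response jumps from 0 to 10 when feature 0 exceeds 2.
XF = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0], [5.0, 0.0]]
yF = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
XV = [[0.5, 1.0], [1.5, 4.0], [3.5, 2.0], [4.5, 3.0]]
yV = [0.0, 0.0, 10.0, 10.0]
score, d_star, s_star, f_left, f_right = best_split(XF, yF, XV, yV)
```

In the toy data, feature 1 also separates the fitting subset perfectly, but only the split on feature 0 generalizes to the validation rows, which is the reason for scoring candidates on held-out data.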

C. Leaf-wise MARS Pruning

Perform backward elimination on each leaf-wise MARS fit, removing at each step the term whose removal increases GCV the least, until further removal would hurt overall validation performance.

Algorithm 2 (Pruning):

For each leaf r:
   Prune-MARS(f_r)
Return pruned leaves
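One common form of backward elimination is sketched below: terms are dropped one at a time, always choosing the removal with the lowest GCV, and the best-scoring subset seen along the way is returned. This differs from the validation-based stopping rule described above and is only an illustration. The callable `rss_of`, which maps a set of term labels to the RSS of the refitted model, is a hypothetical interface, and the effective parameter count is taken to be the number of terms plus one for the intercept, with no knot penalty.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def gcv(rss: float, n_params: int, n: int) -> float:
    return rss / (1.0 - n_params / n) ** 2


def backward_prune(terms: Sequence[str],
                   rss_of: Callable[[FrozenSet[str]], float],
                   n: int) -> Tuple[List[str], float]:
    """Greedy backward elimination; returns the subset with the lowest GCV seen."""
    current = list(terms)
    best_terms = list(current)
    best_score = gcv(rss_of(frozenset(current)), len(current) + 1, n)
    while current:
        # After one removal the model has len(current) - 1 terms plus an intercept.
        scored = [
            (gcv(rss_of(frozenset(t for t in current if t != drop)), len(current), n), drop)
            for drop in current
        ]
        score, drop = min(scored)
        current.remove(drop)
        if score < best_score:
            best_score, best_terms = score, list(current)
    return best_terms, best_score


# Toy RSS table for n = 20 observations: term "c" barely helps the fit.
table = {
    frozenset("abc"): 10.0, frozenset("ab"): 10.1, frozenset("ac"): 30.0,
    frozenset("bc"): 40.0, frozenset("a"): 35.0, frozenset("b"): 45.0,
    frozenset("c"): 80.0, frozenset(): 90.0,
}
kept, kept_score = backward_prune(["a", "b", "c"], table.__getitem__, n=20)
```

Here dropping "c" raises the RSS only from 10.0 to 10.1 while lowering the parameter count, so the pruned model with terms "a" and "b" attains the lowest GCV.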

3. Algorithmic Complexity and Hyperparameters

Complexity
- MARS forward step (per leaf): Each forward addition $O(mN M)$ , total $O(mN M^2)$ .
- Split search (naive): $O(m N^2)$ with OLS per split; optimized via QR-update for $O(N M^2)$ per split value.
- Tree recursion: Each node’s split costs $O(m N M^2)$ ; the controls $n_\text{min}$ (minimum leaf size) and $\delta$ (split acceptance) regulate tree depth.
- Leafwise pruning: $O(M^3)$ per leaf in the worst case.

Key Hyperparameters
- $M$ (max terms per MARS fit): Typical value $M + 1 = 100$ .
- $D_\text{mars}$ (max interaction degree): 2 or 3.
- $\delta$ (split threshold): Default 1%.
- $n_\text{min}$ (minimum samples per leaf): $10 \times (M + 1)$ .
- Cross-validation: 5-fold for split-testing, 70/30 partitioning for candidate evaluation.
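The defaults listed above can be gathered into a small configuration object. The class name `SmartConfig` and its field names are hypothetical and do not reflect the reference implementation's interface; only the values come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartConfig:
    """Hypothetical container for the hyperparameters listed above."""
    max_terms: int = 100            # M + 1, the term budget per MARS fit
    max_degree: int = 2             # D_mars, maximum interaction degree
    split_threshold: float = 0.01   # delta, minimum relative RSS reduction
    cv_folds: int = 5               # folds used when testing a split
    fit_fraction: float = 0.7       # share of node data used for fitting

    @property
    def min_leaf_samples(self) -> int:
        """n_min = 10 * (M + 1)."""
        return 10 * self.max_terms


default_config = SmartConfig()
small_config = SmartConfig(max_terms=50, max_degree=3)
```

Deriving `min_leaf_samples` from `max_terms` keeps the two settings consistent: with the default budget of 100 terms, a node needs at least 1000 observations to be considered for splitting.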

4. Comparative Empirical Evaluation

SMART has been empirically benchmarked against Random Forests (RF), Local Linear Forests (LLF), XGBoost, and standard MARS on a variety of simulated and synthetic regression datasets. The following summarizes root mean squared error (RMSE) findings (Pattie et al., 2024):

Dataset	RF	LLF	XGBoost	MARS	SMART
Friedman 1	6.57	5.27	3.45	4.24	4.16
Friedman 2	9.66	3.71	7.84	7.92	3.69
Friedman 3	3.44	1.15	1.06	0.81	0.81
Piecewise cubics	6.17	5.21	3.08	3.17	1.71

On Friedman 1 and 3, SMART closely matched standard MARS (4.16 vs 4.24 and 0.81 vs 0.81; no splits triggered), indicating smooth structure amenable to global spline models. On Friedman 1, XGBoost (3.45) remained the strongest method. On Friedman 2 and the synthetic piecewise-cubic dataset, SMART detected discontinuities and delivered substantial error reductions (3.69 vs 7.92 for MARS on Friedman 2; 1.71 vs 3.17 on the piecewise cubics). In "Pruning Test" experiments with 20,000 points and a 4-feature splitting rule, SMART exactly recovered the correct tree and fitted close-to-ideal hinge functions in each leaf (RMSE 0.065, noise-free).
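The size of these gaps is easier to judge as a relative reduction in RMSE with respect to standard MARS. The short calculation below uses only the MARS and SMART columns of the table; the helper name `relative_reduction` is illustrative.

```python
# RMSE values for MARS and SMART, copied from the table above.
rmse = {
    "Friedman 1": {"MARS": 4.24, "SMART": 4.16},
    "Friedman 2": {"MARS": 7.92, "SMART": 3.69},
    "Friedman 3": {"MARS": 0.81, "SMART": 0.81},
    "Piecewise cubics": {"MARS": 3.17, "SMART": 1.71},
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` lowers the RMSE of `baseline`."""
    return (baseline - candidate) / baseline


reductions = {
    name: relative_reduction(row["MARS"], row["SMART"]) for name, row in rmse.items()
}

for name, value in reductions.items():
    print(f"{name}: {100 * value:.1f}% lower RMSE than MARS")
```

The reductions are roughly 53% on Friedman 2 and 46% on the piecewise cubics, against about 2% on Friedman 1 and none on Friedman 3.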

5. Relation to MARS and Theoretical Foundations

SMART’s backbone is the MARS algorithm, which is itself a greedy, forward-backward model selection procedure over a high-dimensional library of hinge-type basis functions. In MARS,

$h(x; t) = (x-t)_+$

and its multivariate extension considers products of such hinges across dimensions up to specified interaction degree (Ki et al., 2021). Recent results show that MARS basis expansion and fitting can also be recast as an infinite-dimensional lasso estimation problem (variation-norm penalized regression), which admits oracle inequalities and minimax rates of

$n^{-4/5} (\log n)^{c(s)}$

in which the interaction degree $s$ enters only through the exponent of the logarithmic factor, thus partially circumventing the curse of dimensionality (Ki et al., 2021). SMART retains the favorable interpretability and variable selection features of MARS, while the recursive tree structure provides an explicit method for localizing abrupt regime shifts.

6. Implementation and Practical Considerations

Open Source Availability: The reference implementation (Python + Cython) is available at https://github.com/fyre87/SMART. The interface mirrors scikit-learn estimator conventions and includes QR-based split search for MARS fits, grid search utilities, and reproducible experiment scripts (Pattie et al., 2024).
Hyperparameter Selection: Empirically, $M + 1$ in $[50, 150]$ suits many tabular problems. $D_\text{mars} = 2$ is advised for pairwise effects, $3$ for strong higher-order interactions. $\delta = 1\%$ – $2\%$ helps prevent overfitting while capturing discontinuities.
Computation: The main bottleneck is frequent leafwise MARS fits during split evaluation. Efficient QR-update schemes (not naive OLS) are critical. Categorical/binary features admit batched OLS strategies. Feature-wise split search can be trivially parallelized.
Model Selection: Competing methods’ hyperparameters were grid-tuned via 5-fold cross-validation. SMART’s settings can likewise be fine-tuned, but defaults were robust in published experiments.

7. Research Context and Potential Extensions

SMART addresses the structural weakness of standard MARS (inability to isolate strong non-smooth or discontinuous effects) and of tree ensembles (bias when modeling continuous effects). The piecewise spline-tree approach isolates heterogeneity in smoothness, separating abrupt changes (modeled by tree splits) from intricate, higher-order, smooth structure (captured by leafwise MARS expansion). Since SMART’s leaves fit full MARS models, unlike typical model tree approaches that use simple polynomials or constants, interaction and nonlinear modeling capacity are not restricted by the tree topology. A plausible implication is that SMART may support extensions involving non-Gaussian regression, regularized leaf MARS (e.g., LASSO-MARS (Ki et al., 2021)), or smoothness-adaptive splitting rules.

Further study may compare SMART against alternative local-adaptive smoothing ensembles or explore scalable variations leveraging distributed computation for massive tabular datasets.

References

Pattie et al. (2024). SMART: A Flexible Approach to Regression using Spline-Based Multivariate Adaptive Regression Trees.

Ki et al. (2021). MARS via LASSO.
