MorphBoost: Adaptive Gradient Boosting
- MorphBoost is a self-organizing gradient boosting framework that dynamically adjusts tree splits using evolving gradient distributions and problem fingerprinting.
- The framework blends gradient-based gains with information-theoretic metrics to optimize split criteria, resulting in improved accuracy and computational efficiency in various supervised tasks.
- Key features include vectorized prediction, interaction-aware feature importance, and a fast-mode option, which collectively boost performance and robustness.
MorphBoost is a self-organizing gradient boosting framework that introduces adaptive tree morphing to improve flexibility, robustness, and computational efficiency in supervised learning tasks. Unlike traditional gradient boosting implementations such as XGBoost and LightGBM, which are characterized by fixed-split static trees and unchanging split criteria throughout training, MorphBoost employs tree structures that dynamically adjust their splitting behavior based on evolving gradient distributions and the stage of the learning process. This adaptivity is guided by accumulated gradient statistics, information-theoretic metrics, and an automatic fingerprinting of the data complexity, enabling responsive parameter tuning across binary classification, multiclass, and regression tasks. MorphBoost is further distinguished by vectorized prediction, interaction-aware feature importance metrics, and a tunable fast-mode for computational efficiency (Kriuk, 17 Nov 2025).
1. Algorithmic Structure and Workflow
MorphBoost constructs an additive ensemble of trees in a sequential (stage-wise) manner. At each boosting round $t$, it computes sample-wise gradients $g_i$ and Hessians $h_i$ (or, for multiclass, their per-class analogues) based on the current model and loss. The critical workflow divergence from canonical boosting techniques arises during tree construction: rather than employing static split evaluation, MorphBoost interpolates between pure gradient-based gain and a time-evolving information-theoretic component as learning advances.
Prior to training, a problem fingerprint is computed to determine task type (binary, multiclass, regression), appropriate depth limits, regularization schedules, and evolution pressure parameters. Post-training, model inference leverages a breadth-first, vectorized algorithm, enabling efficient traversal of trees for large batches of data. Feature importance is scored with respect not only to marginal split gains but also to detected multiplicative interactions. An optional fast mode can be invoked for reduced compute cost by trading off some adaptivity and accuracy.
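The round structure above starts from sample-wise gradients and Hessians of the loss; for binary classification these take the familiar logistic form. A minimal sketch of one round's gradient computation (illustrative only, not the paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost_round_gradients(y, raw_pred):
    """Per-sample gradient and Hessian of the logistic loss,
    computed at the start of each boosting round."""
    p = sigmoid(raw_pred)
    grad = p - y            # dL/df
    hess = p * (1.0 - p)    # d2L/df2
    return grad, hess

# one round on toy data: the model starts at f(x) = 0
y = np.array([0.0, 1.0, 1.0, 0.0])
g, h = boost_round_gradients(y, np.zeros(4))
```

These per-sample statistics feed both the gradient-gain term and the running distribution statistics used by the morphing criterion.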
2. Morphing Split Criterion
Traditional boosting algorithms typically use split scores based on second-order gradient gain:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i\in L} g_i\right)^2}{\sum_{i\in L} h_i + \lambda} + \frac{\left(\sum_{i\in R} g_i\right)^2}{\sum_{i\in R} h_i + \lambda} - \frac{\left(\sum_{i\in L\cup R} g_i\right)^2}{\sum_{i\in L\cup R} h_i + \lambda}\right],$$

where $g_i$ and $h_i$ are the gradient and Hessian of the loss with respect to the prediction for instance $i$, $L$ and $R$ are the candidate left and right child sets, and $\lambda$ is a regularization hyperparameter.
MorphBoost augments this with a "morphing" criterion that depends on the training iteration $t$ and the statistical distribution of the gradients. The running mean $\mu_t$ and standard deviation $\sigma_t$ of the gradients are updated across iterations as

$$\mu_t = \beta\,\mu_{t-1} + (1-\beta)\,\bar{g}_t, \qquad \sigma_t = \beta\,\sigma_{t-1} + (1-\beta)\,\mathrm{std}(g_t),$$

with momentum coefficient $\beta \in (0,1)$. Each gradient is then normalized to yield $\tilde{g}_i = (g_i - \mu_t)/(\sigma_t + \epsilon)$.
An information-theoretic component $I_t$ is then defined over the normalized gradient distribution, modulated by an evolution pressure parameter $p$ and annealed over the total number of boosting rounds $T$.
The final morphing split score at iteration $t$ interpolates between the two components,

$$S_t = (1-\alpha_t)\,\mathrm{Gain}_{\text{grad}} + \alpha_t\, I_t,$$

where the mixing weight $\alpha_t$ grows as training advances.
For the initial rounds (small $t$), only the gradient-gain component is used to accelerate early fitting.
In multiclass tasks, the algorithm decomposes the objective into one-versus-rest subproblems, with per-class gradients and Hessians for each instance $i$ and class $k$:

$$g_{i,k} = p_{i,k} - \mathbb{1}[y_i = k], \qquad h_{i,k} = p_{i,k}\,(1 - p_{i,k}),$$

where $p_{i,k}$ is the predicted probability of class $k$ for instance $i$.
The morphing criterion is then applied per class channel.
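The per-class decomposition uses the standard softmax cross-entropy gradients; a minimal sketch (illustrative, not the paper's implementation):

```python
import numpy as np

def softmax_grad_hess(raw, y):
    """One-vs-rest gradients/Hessians for softmax cross-entropy:
    g_{ik} = p_{ik} - 1[y_i = k],  h_{ik} = p_{ik} (1 - p_{ik})."""
    z = raw - raw.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(raw.shape[1])[y]
    grad = p - onehot
    hess = p * (1.0 - p)
    return grad, hess

# uniform raw scores over 3 classes, true class 0
grad, hess = softmax_grad_hess(np.zeros((1, 3)), np.array([0]))
```

Each column of `grad`/`hess` forms one class channel, to which the morphing criterion is applied independently.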
3. Automatic Problem Fingerprinting
At initialization, MorphBoost extracts a fingerprint vector from the dataset to inform downstream hyperparameters:
- Complexity: an aggregate complexity score derived from the dataset.
- Non-linearity score: Computed as the ratio of linear to quadratic regression fits on random subsamples.
- Interaction strength: Estimated by multiplicative correlations among random pairs of features using small subsamples.
- Task type: Determined from the ratio of unique target values to the number of samples, with regression selected when this ratio is sufficiently high, and multiclass or binary otherwise.
The fingerprint directly configures the maximum tree depth (up to 10 in standard mode), the regularization strategy, and the evolution pressure parameter $p$. This mechanism gives MorphBoost intelligent, dataset-adaptive parameter configuration.
4. Vectorized Prediction Algorithm
Prediction in MorphBoost is achieved via breadth-first, batched tree traversal. Each tree is traversed level-wise using Boolean masks corresponding to the row indices in the data matrix, tracking which samples reside at which nodes. For each split, the algorithm computes new Boolean masks for all samples at a node, applying the left/right split simultaneously across the batch through vectorized operations. The pseudocode structure is:
```
Initialize queue Q with root node
mask[root] ← all-True (size N)
while Q not empty:
    node ← Q.pop()
    if node is leaf:
        output[mask[node]] += node.leaf_value
    else:
        feat, thresh ← node.split
        left_mask  ← mask[node] & (X[:, feat] ≤ thresh)
        right_mask ← mask[node] & (X[:, feat] > thresh)
        mask[left_child]  ← left_mask
        mask[right_child] ← right_mask
        Q.push(left_child); Q.push(right_child)
```
This replaces per-sample tree traversal with a small number of whole-batch mask operations per node, a notable improvement over conventional sample-at-a-time prediction. Empirical benchmarks report substantial speedups in Python/NumPy environments.
5. Interaction-Aware Feature Importance
MorphBoost computes post hoc feature importance by aggregating, across all split nodes, a depth-weighted product of each node's morph score and realized gain:

$$\mathrm{Imp}(f) = \sum_{n:\, f_n = f} 0.9^{\,d(n)}\; m_n\, G_n,$$

where $f_n$ is the feature used at split node $n$, $d(n)$ its depth, $m_n$ its morph score, and $G_n$ its realized gain. If a node was identified during training as supporting a multiplicative interaction, its importance receives a primary-feature bonus. The geometric decay of $0.9$ per depth level attenuates the influence of deeper splits, and final importance scores are normalized to sum to unity.
This interaction-aware scoring scheme increases the sensitivity of MorphBoost to non-additive feature effects, which traditional marginal-gain metrics may overlook.
6. Fast-Mode Optimization
For applications prioritizing inference and training throughput over maximal adaptivity, MorphBoost includes a fast mode that reduces computational overhead by replacing fingerprint and exhaustive threshold search with fixed heuristics:
- NonLinearity: set to $0.2$
- InteractionScore: set to $0.15$
- NoiseLevel: set to $0.1$
- Tree depth: limited to 8 (rather than up to 10 adaptively)
- Threshold sampling: if unique feature values exceed 64, 16 quantiles are sampled; if exceeding 256, 32 quantiles; otherwise, all midpoints are examined.
Fast mode typically achieves a 50–70% reduction in fingerprinting and split-finding computation, at the cost of a modest reduction in accuracy.
7. Empirical Benchmarking and Performance
MorphBoost was benchmarked on 10 datasets spanning binary, multiclass, and regression tasks of varying difficulty and size. Against established baselines (XGBoost, GradientBoosting, HistGradientBoosting), MorphBoost achieved a mean accuracy of $0.9009$, surpassing XGBoost's $0.8934$ by $0.0075$ (statistically significant). MorphBoost also exhibited the lowest variance ($0.0948$) and the highest minimum accuracy across all tested models.
| Model | Mean Acc. | Win Rate | Variance |
|---|---|---|---|
| MorphBoost | 0.9009 | 40% | 0.0948 |
| GradientBoosting | 0.8959 | 20% | 0.1012 |
| HistGradientBoosting | 0.8952 | 10% | 0.1087 |
| XGBoost | 0.8934 | 0% | 0.1234 |
On the hardest datasets (including high-dimensional noise and severely imbalanced multiclass problems), MorphBoost outperformed XGBoost by over 4% relative accuracy and achieved an order-of-magnitude reduction in output variance. The framework delivered first-place performance on 40% of datasets (4/10 wins) and top-3 finishes in 20% of all positions (6/30), indicating superior robustness and consistency, especially in challenging learning scenarios.
References
- "MorphBoost: Self-Organizing Universal Gradient Boosting with Adaptive Tree Morphing" (Kriuk, 17 Nov 2025)