RocketStack: Deep Recursive Ensemble
- RocketStack is a deep recursive ensemble learning framework that extends classical stacking to up to 10 levels with adaptive feature fusion and compression.
- It leverages dynamic, performance-based pruning with controlled randomization to preserve model diversity while keeping complexity in check.
- Empirical results across 33 datasets show significant accuracy gains and runtime improvements, underscoring its scalability and practical effectiveness.
RocketStack is a level-aware deep recursive ensemble learning framework that systematically addresses the limitations associated with classical stacking by extending the recursion depth, controlling complexity, and adaptively compressing features and model pools. Inspired by the multi-stage separation and propulsion analogy in aerospace ("prune, compress, propel"), RocketStack enables scalable deep ensembling to depths of up to ten recursive stacking levels while containing the model and feature proliferation that typically impedes such methods. Empirical evidence across diverse domains demonstrates that RocketStack's monotonic multi-level extensions deliver both substantial accuracy gains and significant efficiency improvements relative to classical and state-of-the-art single-level ensembles (Demirel, 20 Jun 2025).
1. Recursive Deep Stacking Architecture
RocketStack extends stacking to depths of up to $L = 10$ recursive levels, departing from the common practice of single-level or shallow stacking. The procedure is defined as follows:
- At level 0, an initial pool of base learners $\mathcal{M}_0$ is trained on the original features $X_0$.
- For each level $\ell = 1, \dots, L$:
  - Out-of-fold (OOF) probability vectors are generated for each model in $\mathcal{M}_{\ell-1}$ using 5-fold cross-validation.
  - Feature fusion is performed by concatenating the meta-predictions $P_\ell$ with the preceding features: $X_\ell = [X_{\ell-1} \,\|\, P_\ell]$.
  - Optional feature compression is applied according to the selected scheme.
  - Models from $\mathcal{M}_{\ell-1}$ are retrained on $X_\ell$ and evaluated on held-out folds.
  - Weak learners are pruned to create $\mathcal{M}_\ell$ for progression to the next level.
- The resulting features are aggregated into a global “stack-of-stacking” set.
After the final level $L$, all surviving models are re-trained on the concatenated meta-features across all levels to produce the final ensemble prediction (see the sketch below).
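The loop below is a minimal Python sketch of this procedure using scikit-learn. The base-learner pool, percentile, and noise level are illustrative stand-ins rather than the paper's configuration, and the compression step is left as a placeholder comment; only the overall fuse-prune-propel structure follows the description above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB


def rocketstack_sketch(X, y, levels=3, percentile=40, sigma=0.0, rng=None):
    """Illustrative deep recursive stacking loop for binary classification:
    fuse OOF predictions into the feature set, then prune weak learners."""
    rng = np.random.default_rng(rng)
    pool = [LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(max_depth=4),
            GaussianNB()]
    X_cur = X
    for level in range(1, levels + 1):
        # Out-of-fold positive-class probabilities for each model in the pool.
        oof = np.column_stack([
            cross_val_predict(clone(m), X_cur, y, cv=5,
                              method="predict_proba")[:, 1]
            for m in pool
        ])
        # Feature fusion: concatenate meta-predictions with preceding features.
        X_cur = np.hstack([X_cur, oof])
        # (Optional feature compression of X_cur would be applied here.)
        # Score each model on the fused features; optionally add Gaussian noise.
        scores = np.array([
            cross_val_score(clone(m), X_cur, y, cv=5, scoring="roc_auc").mean()
            for m in pool
        ])
        noisy = scores + rng.normal(0.0, sigma, size=scores.shape)
        # Prune: keep models at or above the chosen percentile of (noisy) scores.
        tau = np.percentile(noisy, percentile)
        pool = [m for m, s in zip(pool, noisy) if s >= tau] or pool
    # Final ensemble: refit the survivors on the accumulated meta-features.
    return [clone(m).fit(X_cur, y) for m in pool], X_cur
```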
The framework is modular, parameterizing both the compression scheme and the pruning schedule. The “prune, compress, propel” operational metaphor encapsulates how RocketStack manages recursive complexity, controls feature and model set growth, and enables deeper meta-ensembling layers without exponential cost escalation (Demirel, 20 Jun 2025).
2. Pruning Dynamics and Model Diversity
Model pool curation at each level is implemented via dynamic, performance-informed pruning:
- Strict OOF-based pruning computes individual model performance scores $s_i^{(\ell)}$ (ROC-AUC for binary, accuracy for multiclass) and retains only the models satisfying
$$s_i^{(\ell)} \geq \tau_\ell, \qquad \tau_\ell = \mathrm{P}_q\!\left(\{ s_j^{(\ell)} \}_{j \in \mathcal{M}_{\ell-1}}\right),$$
where the percentile $q$ adapts to the performance spread.
- Gaussian noise randomization injects controlled diversity by perturbing the OOF scores prior to pruning:
$$\tilde{s}_i^{(\ell)} = s_i^{(\ell)} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2).$$
Using $\tilde{s}_i^{(\ell)}$ for thresholding allows otherwise marginal models to survive and contribute ensemble diversity.
Strict pruning is fully deterministic but risks overfitting to a small subset of strong early models. Mild randomization (small $\sigma$) functions as a regularizer by stochastically retaining potentially valuable but underperforming learners, analogous to Dropout, and empirically raises late-level performance. Stronger noise (larger $\sigma$) further increases diversity, albeit at the risk of higher outcome variance. These dynamics support RocketStack's claim to balance recursive depth and diversity in the model pool (Demirel, 20 Jun 2025).
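As a concrete illustration of this pruning rule, the snippet below applies a percentile threshold to optionally noise-perturbed OOF scores; the example scores, percentile, and $\sigma$ values are arbitrary and purely illustrative.

```python
import numpy as np


def prune_by_percentile(scores, percentile=50, sigma=0.0, seed=0):
    """Keep indices of models whose (optionally noise-perturbed) OOF score
    meets or exceeds the given percentile threshold."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(scores, dtype=float) + rng.normal(0.0, sigma, len(scores))
    tau = np.percentile(noisy, percentile)
    return [i for i, s in enumerate(noisy) if s >= tau]


oof_scores = [0.91, 0.88, 0.74, 0.86, 0.69]        # example ROC-AUC scores
print(prune_by_percentile(oof_scores))              # strict: deterministic pruning
print(prune_by_percentile(oof_scores, sigma=0.05))  # mild noise: marginal models may survive
```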
3. Feature Fusion, Compression, and Dimensionality Control
Each stacking level appends meta-features to the running feature set (a small dimensionality sketch follows this list):
- In binary classification, each surviving model contributes a single positive-class probability, so the feature set grows by $|\mathcal{M}_\ell|$ columns per level.
- In multi-class classification with $K$ classes, each model contributes a full $K$-dimensional probability vector, so the feature set grows by $K \cdot |\mathcal{M}_\ell|$ columns per level.
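To make the dimensionality bookkeeping concrete, the sketch below tracks how many OOF probability columns accumulate per level for binary versus $K$-class problems; the starting dimension and pool sizes are assumptions chosen only for illustration, not figures from the paper.

```python
def fused_dim(d0, models_per_level, n_classes=2):
    """Feature dimensionality after each level when OOF probabilities are
    appended without compression: one column per model (binary) or
    n_classes columns per model (multiclass)."""
    cols_per_model = 1 if n_classes == 2 else n_classes
    dims, d = [], d0
    for m in models_per_level:
        d += m * cols_per_model
        dims.append(d)
    return dims


# Example: 20 original features, a pool shrinking from 8 to 4 models over 5 levels.
print(fused_dim(20, [8, 7, 6, 5, 4]))               # binary: [28, 35, 41, 46, 50]
print(fused_dim(20, [8, 7, 6, 5, 4], n_classes=5))  # 5-class: [60, 95, 125, 150, 170]
```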
RocketStack offers adaptive feature compression strategies:
- Simple, Fast, Efficient (SFE) filter: each candidate feature receives a utility score, and features are greedily selected in order of decreasing utility.
- Autoencoder (AE) compression (multiclass): an autoencoder with encoder $f_\phi$ and decoder $g_\theta$ projects $X_\ell$ to a bottleneck dimension and is trained to minimize the reconstruction loss
$$\mathcal{L}_{AE} = \frac{1}{n} \sum_{i=1}^{n} \left\lVert x_i - g_\theta\!\big(f_\phi(x_i)\big) \right\rVert_2^2.$$
Variants include 2-layer and 3-layer autoencoder architectures.
- Attention-based selection (multiclass): attention weights over the fused features are computed via a softmax,
$$\alpha_j = \frac{\exp(z_j)}{\sum_k \exp(z_k)},$$
where $z_j$ is the learned score for feature $j$, and only features whose weights exceed the chosen percentile are retained: $\{\, j : \alpha_j \geq \mathrm{P}_q(\alpha) \,\}$.
Compression is applied in two regimes: per-level (after every stacking iteration) or periodic (only at preset levels). Compression curbs the otherwise linear growth of the feature and model space, transforms the feature representation, and substantially reduces runtime, especially under periodic schedules (Demirel, 20 Jun 2025). The attention-based selection step is sketched below.
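The sketch below illustrates attention-style percentile selection in isolation, assuming a vector of per-feature scores (here randomly generated as a stand-in for learned attention logits) passed through a softmax; the paper's actual attention module may be parameterized differently.

```python
import numpy as np


def attention_select(X, scores, percentile=75):
    """Softmax the per-feature scores into attention weights and keep only
    the columns whose weight meets or exceeds the given percentile."""
    z = np.asarray(scores, dtype=float)
    alpha = np.exp(z - z.max())
    alpha /= alpha.sum()                      # softmax over feature scores
    keep = alpha >= np.percentile(alpha, percentile)
    return X[:, keep], np.flatnonzero(keep)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                # 40 fused meta-features
scores = rng.normal(size=40)                  # stand-in for learned attention logits
X_small, kept = attention_select(X, scores)
print(X_small.shape, kept[:5])                # roughly the top quartile of columns survives
```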
4. Empirical Performance and Benchmarking
Experiments on 33 datasets (23 binary, 10 multiclass) from domains including finance, healthcare, software defects, speech, and handwriting evaluate RocketStack using accuracy and runtime as central metrics.
- Accuracy trends: Linear mixed-model hypothesis tests confirm highly significant positive accuracy-depth trends for no-compression, periodic SFE (binary), and periodic attention (multiclass). Per-level compression, particularly with SFE or AE, does not generate consistent trends.
- Best observed configurations at level 10:
- Binary: Periodic SFE with mild randomization achieves 97.08% accuracy (5.14 percentage points above strict pruning) and cuts runtime by 10.5% versus no compression.
- Multiclass: Periodic attention with moderate randomization delivers 98.60% (6.11 points above best baseline), reducing runtime by 56.1% and feature dimensionality by 74% (145 to 38 features).
| Setting | Accuracy at $L = 10$ | Runtime reduction | Feature reduction |
|---|---|---|---|
| Binary, periodic SFE + mild randomization | 97.08% | 10.5% | 1777 → 36 |
| Multiclass, periodic attention + moderate randomization | 98.60% | 56.1% | 145 → 38 |
RocketStack’s top variants (except per-level SFE in multiclass) outperformed the strongest single-level ensembles (e.g., XGBoost, LightGBM) by clear monotonic margins, confirming the efficacy of recursive depth and dynamic compression (Demirel, 20 Jun 2025).
5. Mathematical and Algorithmic Formalism
Key mathematical definitions include:
- Pruning threshold: $\tau_\ell = \mathrm{P}_q\big(\{\tilde{s}_j^{(\ell)}\}_j\big)$, with $\tilde{s}_j^{(\ell)} = s_j^{(\ell)}$ in the strict (noise-free) case; a model is retained at level $\ell$ only if $\tilde{s}_i^{(\ell)} \geq \tau_\ell$.
- Autoencoder loss: $\mathcal{L}_{AE} = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - g_\theta(f_\phi(x_i)) \rVert_2^2$.
- Model and feature scaling:
  - Naive deep stacking costs on the order of $O(L \cdot M \cdot C)$ for $M$ models, $L$ levels, and per-model training cost $C$, with feature dimensionality growing roughly as $d_\ell = d_0 + \ell \cdot M \cdot K$ for $K$ classes.
  - RocketStack reduces this to approximately $O\big(\sum_{\ell=1}^{L} |\mathcal{M}_\ell| \cdot C\big)$, where $|\mathcal{M}_\ell| \leq M$, and both the surviving model count and the feature growth are curbed by periodic compression and pruning. Empirically, normalized feature counts and runtimes (0-1 scale) exhibit sublinear scaling due to the prune-compress operations.
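As a hedged numerical illustration of this scaling argument, the snippet below contrasts the number of model fits under naive stacking with a pool that shrinks by roughly 20% per level under pruning; the pool size, depth, and pruning rate are assumptions chosen only to show the trend, not measurements from the paper.

```python
M, L = 30, 10                       # illustrative pool size and stacking depth
naive_fits = M * L                  # naive stacking retrains every model at every level

pool, pruned_fits = M, 0
for level in range(L):
    pruned_fits += pool             # fits performed at this level
    pool = max(1, int(pool * 0.8))  # assume ~20% of models pruned per level

print(naive_fits)    # 300 fits: every model retrained at every level
print(pruned_fits)   # 128 fits when the pool shrinks by ~20% per level
```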
6. Significance and Implications
RocketStack demonstrates that genuinely deep recursive ensemble learning is feasible and beneficial when guided by adaptive pruning and compression. Its operational analog to multistage rocketry is reflected in the phase-wise culling of weak models (“prune”), dimensionality control (“compress”), and retention of the strongest combinations at each level (“propel”). Mild, carefully calibrated randomization in pruning is validated as an effective regularizer, comparable to Dropout for deep neural networks. The framework’s empirical monotonic accuracy gains, pronounced improvements in runtime and model footprint, and modular design position it as both a practical and theoretically motivated advance in ensemble methodology (Demirel, 20 Jun 2025).