RocketStack: Deep Recursive Ensemble
- RocketStack is a deep recursive ensemble learning framework that extends stacking architectures to depths of up to ten levels, integrating predictions level by level while applying adaptive model pruning and feature compression.
- The framework leverages recursive stacking, noise-perturbed pruning, and periodic feature compression methods (such as SFE, autoencoder, and attention-based selection) to control computational and feature complexity.
- Empirical results on binary and multi-class datasets show significant accuracy improvements (up to 6.11% gains) and substantial runtime and feature dimensionality reductions compared to traditional stacking methods.
RocketStack is a level-aware deep recursive ensemble learning framework designed to extend the depth of stacking architectures while controlling computational and feature complexity through adaptive model pruning, feature compression, and stochastic regularization. The methodology systematically advances beyond conventional horizontal diversity in ensemble learning by enabling recursive stacking up to ten levels, thus promoting deeper representational integration across base learners with tractable computational costs (Demirel, 20 Jun 2025).
1. Recursive Stacking Architecture
RocketStack generalizes traditional stacking by constructing a hierarchy of ensemble layers, each integrating predictions from the preceding level through meta-feature concatenation and selective pruning. Let $X^{(0)} \in \mathbb{R}^{n \times d}$ represent the original $n$-sample, $d$-feature training set, with $X^{(0)}_{\text{test}}$ as its hold-out counterpart. At stacking level $\ell$, the ensemble consists of $\mathcal{M}_\ell = \{m_1, \dots, m_{k_\ell}\}$, where $k_\ell$ is the number of models retained post-pruning from the previous level.
Each model $m_i$ undergoes $K$-fold cross-validation; its concatenated out-of-fold (OOF) predictions form $\hat{p}_i \in \mathbb{R}^{n}$. Aggregating predictions across all models yields $P_\ell = [\hat{p}_1 \mid \cdots \mid \hat{p}_{k_\ell}] \in \mathbb{R}^{n \times k_\ell}$. The iterative meta-feature expansion is defined as:

$$X^{(\ell+1)} = X^{(\ell)} \oplus P_\ell, \qquad X^{(\ell+1)}_{\text{test}} = X^{(\ell)}_{\text{test}} \oplus P^{\text{test}}_\ell,$$

where $\oplus$ denotes column-concatenation and $P^{\text{test}}_\ell$ the corresponding hold-out predictions.
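A minimal sketch of this expansion step, assuming scikit-learn base models and binary classification; the helper name `expand_meta_features` and the model choices are illustrative, not from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def expand_meta_features(X, y, models, n_folds=5):
    """One level of meta-feature expansion: X^(l+1) = X^(l) (+) P_l,
    where P_l column-stacks each model's K-fold OOF predictions."""
    oof_columns = []
    for model in models:
        # Out-of-fold class-1 probabilities: each sample is predicted
        # by a fold-model that never saw it during training.
        p_hat = cross_val_predict(model, X, y, cv=n_folds,
                                  method="predict_proba")[:, 1]
        oof_columns.append(p_hat.reshape(-1, 1))
    P = np.hstack(oof_columns)    # P_l has shape (n_samples, k_l)
    return np.hstack([X, P])      # column-concatenation X^(l) (+) P_l

# Toy usage: 10 original features + 2 OOF columns -> 12 meta-features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
models = [RandomForestClassifier(n_estimators=50, random_state=0),
          LogisticRegression(max_iter=1000)]
print(expand_meta_features(X, y, models).shape)   # (200, 12)
```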
Model pruning is performed at each level to ensure $k_{\ell+1} \le k_\ell$. Raw OOF performance scores $s_i$ (accuracy or AUC) are computed for each $m_i$, and a custom threshold $\tau_\ell$ is defined as the quantile at level $q$ of the scores $\tilde{s}_i$, where $\tilde{s}_i$ may be the raw or noise-perturbed score. Only models meeting $\tilde{s}_i \ge \tau_\ell$ are retained.
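A compact sketch of this quantile rule; the quantile level `q` is a free parameter here, and the paper's exact setting may differ:

```python
import numpy as np

def prune_models(models, scores, q=0.5):
    """Keep only models whose (possibly perturbed) score s~_i meets the
    q-quantile threshold tau_l computed over all current scores."""
    scores = np.asarray(scores, dtype=float)
    tau = np.quantile(scores, q)   # pruning threshold tau_l
    return [m for m, s in zip(models, scores) if s >= tau]
```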
2. Pruning Strategies and Feature Compression Mechanisms
A key innovation in RocketStack is the introduction of noise-perturbed pruning and adaptive feature compression, implemented as follows:
Noise-perturbed Pruning
Mild Gaussian noise is added to OOF scores prior to pruning to serve as a regularizer:

$$\tilde{s}_i = s_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),$$

where $\sigma$ controls the perturbation strength. Strict ($\sigma = 0$) and randomized ($\sigma > 0$) schemes are compared.
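A minimal sketch of the perturbation step, compatible with the `prune_models` helper above; the `sigma` values shown are illustrative rather than the paper's settings:

```python
import numpy as np

def perturb_scores(scores, sigma=0.01, seed=None):
    """s~_i = s_i + eps_i with eps_i ~ N(0, sigma^2); sigma = 0
    recovers strict pruning on the raw OOF scores."""
    rng = np.random.default_rng(seed)
    return np.asarray(scores, dtype=float) + rng.normal(0.0, sigma, len(scores))

raw = [0.81, 0.84, 0.79, 0.88]
strict = perturb_scores(raw, sigma=0.0)    # identical to the raw scores
noisy = perturb_scores(raw, sigma=0.01)    # mildly reshuffled ranking
```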
Feature Compression
Feature dimensionality is controlled either at every level or periodically (e.g., levels 3, 6, 9), using one of three compressors:
- Simple, Fast, Efficient (SFE) Filter: a utility score $u_j$ is computed for each feature $j$, and the features with the highest utility are selected (see the filter sketch after this list).
- Autoencoder (AE) Compression: nonlinear reduction obtained by minimizing the reconstruction error $\lVert X - \hat{X} \rVert^2$ of an encoder–decoder with bottleneck dimension $r$.
- Attention-Based Selection: per-feature attention weights $\alpha_j = \mathrm{softmax}(w)_j$ are computed; only features with $\alpha_j$ above a selection threshold $\tau_\alpha$ are kept.
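As referenced in the SFE bullet, a hedged filter-style sketch follows; mutual information stands in for the SFE utility score, whose exact definition follows the original SFE method:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def compress_top_r(X, y, r):
    """Filter-style compression: score every feature with a utility u_j
    (mutual information here, as a stand-in) and keep the top r."""
    utility = mutual_info_classif(X, y, random_state=0)   # u_j per feature
    top = np.argsort(utility)[::-1][:r]                   # highest-utility indices
    return X[:, top], top
```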
A simplified pseudocode of the framework orchestrates the OOF generation, optional feature compression, model evaluation, noise injection, and dynamic pruning per level, with user-specified settings for stacking depth $L$, cross-validation folds $K$, pruning noise $\sigma$, compression mode, compressor type, periodicity, and minimum model count.
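A condensed Python rendering of that orchestration, reusing the helpers sketched above (`expand_meta_features`, `perturb_scores`, `prune_models`, `compress_top_r`); every default setting below is illustrative rather than the paper's:

```python
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def rocketstack(X, y, models, depth=10, n_folds=5, sigma=0.01,
                q=0.5, period=3, r=32, min_models=2):
    for level in range(1, depth + 1):
        # 1. OOF generation and meta-feature expansion: X <- X (+) P_l
        X = expand_meta_features(X, y, models, n_folds)
        # 2. Periodic compression (e.g., levels 3, 6, 9 when period=3)
        if level % period == 0 and X.shape[1] > r:
            X, _ = compress_top_r(X, y, r)
        # 3. Score each surviving model on the augmented features
        scores = [cross_val_score(clone(m), X, y, cv=n_folds).mean()
                  for m in models]
        # 4. Noise-perturbed quantile pruning (sigma = 0 -> strict)
        if len(models) > min_models:
            models = prune_models(models, perturb_scores(scores, sigma), q)
        # 5. Re-fit the retained learners on the expanded matrix X^(l+1)
        models = [clone(m).fit(X, y) for m in models]
    return models, X
```

The guard on `min_models` mirrors the user-specified minimum model count; in a fuller implementation the pruning and compression steps would also be applied consistently to the hold-out matrix $X^{(\ell)}_{\text{test}}$.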
3. Model Training, Meta-Learner Pooling, and Computational Complexity
At each level, retained base learners are re-trained on the augmented feature matrix $X^{(\ell+1)}$, recursively constructing deeper meta-representations. Rather than a single fixed meta-learner, the ensemble at each level comprises all surviving models in $\mathcal{M}_\ell$, with optional selection of the top-$k$ subset or the single top performer for inference.
The computational cost of each level is dominated by cross-validated training of the $k_\ell$ retained models; filter-based compression adds time roughly linear in the current feature count, autoencoder compression additionally requires epochs of gradient-based training, and pruning reduces to sorting the model scores. Sublinear runtime growth with increasing depth is achieved through aggressive pruning and feature reduction, supporting practical exploration to depths of $L = 10$.
4. Empirical Evaluation across Binary and Multi-Class Datasets
Experiments across 33 OpenML datasets (23 binary, 10 multi-class) demonstrate the efficacy and scalability of RocketStack:
Binary Classification (Periodic SFE at Levels 3/6/9)
- Strict pruning ($\sigma = 0$): 88.08% accuracy at level 10
- Light randomization (small $\sigma > 0$): 88.40% (+0.32%)
- Runtime reduction: 10.5% compared to no compression
- Feature count at L10: 6 (vs. 177 with no compression)
Multi-Class Classification (Periodic Attention)
- Strict pruning: 93.29%
- Light randomization: 93.67% (+0.38%)
- Ultimate accuracy at L10: 98.60% (vs. 92.49% best baseline; +6.11%)
- Runtime reduction: 56.1% relative to no compression
- Feature reduction at L10: From 145 to 38 (74%)
Linear mixed model analysis indicates significant accuracy increases with stacking depth in most configurations. Periodic compression schemes yield the strongest depth trends, while per-level compression often lacks a significant trend.
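A hedged sketch of how such a depth-trend analysis can be run with statsmodels, regressing accuracy on stacking level with a random intercept per dataset; the column names and values below are synthetic placeholders, not the paper's data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per (dataset, level) with the accuracy achieved at that depth
results = pd.DataFrame({
    "dataset":  ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
    "level":    [1, 2, 3, 4] * 3,
    "accuracy": [0.84, 0.86, 0.87, 0.88,
                 0.90, 0.91, 0.93, 0.93,
                 0.78, 0.80, 0.81, 0.83],
})

# Random intercept per dataset; the fixed `level` slope is the depth trend
fit = smf.mixedlm("accuracy ~ level", results, groups=results["dataset"]).fit()
print(fit.summary())
```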
5. Staged Ensemble Dynamics: The Rocket Analogy
RocketStack is conceptualized around the metaphor of multistage rocket engineering, encapsulated as “Prune – Compress – Propel”:
- Prune: Analogous to jettisoning empty fuel tanks, underperforming learners are removed to prevent superfluous complexity.
- Compress: Periodic feature compression parallels stage separation, allowing informative meta-features to accumulate before redundancy is discarded.
- Propel: Mild Gaussian randomization in pruning induces a controlled instability, analogous to guidance feedback in rocket dynamics, promoting diversity and mitigating the risk of premature convergence.
These coordinated mechanisms facilitate deep recursive ensembling with sustainable complexity, enabling superior predictive performance relative to shallower and horizontally-diverse stacking architectures.
6. Significance and Implementation Considerations
RocketStack establishes a scalable paradigm for deep ensemble integration, demonstrating that controlled regularization and staged dimensionality reduction can overcome saturation and complexity barriers that previously limited the depth of stack-based learning. Its modular design accommodates advances in feature compression, meta-learner architectures, and adaptive pruning for continued empirical and theoretical exploration (Demirel, 20 Jun 2025). The detailed pseudocode and equation definitions provided in the original manuscript enable rigorous reimplementation and comparative benchmarking.