Autostacker: Compositional AutoML System
- Autostacker is a compositional AutoML system that automatically constructs machine learning pipelines using hierarchical stacking and evolutionary optimization.
- It employs a cascading architecture where raw features and intermediate predictions combine to refine model outputs layer-wise.
- Benchmark evaluations on 15 datasets show superior accuracy and up to sixfold speedups compared to traditional AutoML methods.
Autostacker is a compositional automatic machine learning (AutoML) system that uses a hierarchical stacking architecture and evolutionary algorithms to automatically discover, optimize, and compose machine learning pipelines. It is designed to require no prior domain knowledge or feature preprocessing, and it efficiently explores a large space of model compositions and hyperparameterizations, producing high-accuracy solutions for supervised classification tasks (Chen et al., 2018).
1. Hierarchical Stacking Architecture
Autostacker constructs learning pipelines in multiple discrete stacking layers. The input to the system is the raw feature matrix $X$. Each layer $i$ contains $J_i$ primitive learners (nodes) $f_{i,1}, \dots, f_{i,J_i}$, each producing a prediction vector $\hat{y}_{i,j}$. The outputs of all nodes in layer $i$ are concatenated with the layer's inputs, forming the feature set for the next layer: $X_{i+1} = [X_i, \hat{y}_{i,1}, \dots, \hat{y}_{i,J_i}]$. The system enforces "cascading," i.e., the raw features $X$ are always included at every layer, ensuring ancestral information retention. The final output is produced by a single node in the top layer, consolidating previous learnings through feature augmentation at each stage.
The stacking structure allows each upper layer to learn corrections for errors made by earlier layers. Intermediate representations (model outputs) become part of the feature set for subsequent models, enabling complex, composite prediction functions that are not limited to ensembles of a single-type learner.
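The cascading feature augmentation can be illustrated in a few lines of NumPy. This is only a sketch of the mechanism described above; the shapes and the node outputs are illustrative stand-ins, not values from the paper.

```python
import numpy as np

# Cascading feature augmentation: the raw features are carried into every
# layer, and each node's prediction vector is appended as one extra column.
X = np.random.default_rng(0).normal(size=(100, 4))         # 100 samples, 4 raw features
node_preds = [np.zeros(100), np.ones(100), np.zeros(100)]  # stand-ins for 3 nodes' outputs
X_next = np.column_stack([X] + node_preds)                 # input to the next layer
```

With 4 raw features and 3 nodes, the next layer sees 7 columns: the raw features survive intact, and each node contributes one prediction column.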
2. Pipeline Encoding, Search Space, and Genome
Each candidate Autostacker pipeline is represented as a hyperparameter vector (genome)
$g = (I, \{J_i\}_{i=1}^{I}, \{m_{i,j}\}, \{\theta_{i,j}\})$, where $I$ is the number of layers, $J_i$ is the number of nodes in layer $i$, $m_{i,j}$ is the model primitive type at node $j$ of layer $i$, and $\theta_{i,j}$ is its hyperparameter vector. Supported primitive types include a range of commonly used classifiers, for example: Perceptron, LogisticRegression, SVC, DecisionTree, KNeighbors, RandomForest, Bagging, AdaBoost, ExtraTrees, GradientBoosting, XGBClassifier, MLPClassifier, BernoulliNB, and MultinomialNB. Hyperparameters are sampled from their standard ranges as defined in scikit-learn or XGBoost.
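One way to encode such a genome is as a nested structure of layers, each holding nodes with a primitive type and sampled hyperparameters. The sketch below is illustrative: the primitive names match the list above, but the hyperparameter ranges and the `sample_genome`/`sample_node` helpers are assumptions, not the paper's exact configuration.

```python
import random

# Hypothetical primitive set with small illustrative hyperparameter grids;
# the real system samples from the full scikit-learn / XGBoost ranges.
PRIMITIVES = {
    "LogisticRegression": {"C": [0.01, 0.1, 1.0, 10.0]},
    "DecisionTree":       {"max_depth": [2, 4, 8, 16]},
    "KNeighbors":         {"n_neighbors": [1, 3, 5, 7]},
    "RandomForest":       {"n_estimators": [10, 50, 100]},
}

def sample_node(rng):
    """Sample one node: a primitive type plus one value per hyperparameter."""
    model_type = rng.choice(sorted(PRIMITIVES))
    params = {k: rng.choice(v) for k, v in PRIMITIVES[model_type].items()}
    return {"type": model_type, "params": params}

def sample_genome(rng, max_layers=3, max_nodes=4):
    """Sample a genome: a layer count, a node count per layer, and nodes."""
    n_layers = rng.randint(1, max_layers)
    layers = [[sample_node(rng) for _ in range(rng.randint(1, max_nodes))]
              for _ in range(n_layers)]
    layers.append([sample_node(rng)])  # single consolidating node on top
    return layers

rng = random.Random(0)
genome = sample_genome(rng)
```

Population initialization then amounts to calling `sample_genome` repeatedly, and mutation amounts to resampling one entry of this structure.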
3. Evolutionary Algorithm and Pipeline Search
Autostacker uses a population-based evolutionary algorithm (EA) to optimize model type, hyperparameters, and overall pipeline architecture.
- Population Initialization: The initial population consists of $N$ randomly sampled pipelines.
- Fitness Evaluation: Each candidate pipeline is evaluated by $k$-fold cross-validation; the fitness function is the mean validation accuracy.
- Selection: At each generation, a candidate pool of size $2N$ is formed from parents and offspring; the top $N$ performers (by fitness) are retained.
- Variation Operators:
- Mutation: Applied to half the population; randomly mutates one gene in the hyperparameter vector (e.g., changes model type, a hyperparameter, layer/node count).
- Crossover: Applied to the other half; for each pair of pipelines, uses a random cut-point to exchange layer substructures and form two offspring.
- Termination: The algorithm runs for a fixed number of generations or until a preset computational budget is exhausted. The ten best pipelines by validation accuracy are returned.
This EA allows the system to rapidly search a large combinatorial space of compositions and parameterizations without explicit complexity regularization, and without discarding atypical or unconventional model stackings.
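The generational loop above can be sketched as follows. This is a toy sketch under simplifying assumptions: each "gene" is an integer primitive id standing in for a model type plus hyperparameters, and `fitness` is a stand-in for cross-validated accuracy; the function names (`evolve`, `mutate`, `crossover`) are illustrative, not the paper's API.

```python
import random

def random_genome(rng, n_layers=3, n_nodes=2):
    """Toy genome: each 'node' is an integer primitive id (a stand-in
    for a real model type plus its hyperparameters)."""
    return [[rng.randrange(10) for _ in range(n_nodes)] for _ in range(n_layers)]

def fitness(genome):
    """Stand-in for the k-fold cross-validated accuracy of the built pipeline."""
    return sum(sum(layer) for layer in genome)

def mutate(genome, rng):
    """Randomly change one gene, mirroring the one-gene mutation operator."""
    g = [layer[:] for layer in genome]
    i = rng.randrange(len(g))
    j = rng.randrange(len(g[i]))
    g[i][j] = rng.randrange(10)
    return g

def crossover(a, b, rng):
    """Exchange layer substructures of two parents at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def evolve(rng, pop_size=8, generations=20):
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        half = pop_size // 2
        children = [mutate(g, rng) for g in pop[:half]]       # mutate one half
        for a, b in zip(pop[half::2], pop[half + 1::2]):      # cross the other half
            children.extend(crossover(a, b, rng))
        pool = pop + children                                 # candidate pool of size 2N
        pool.sort(key=fitness, reverse=True)
        pop = pool[:pop_size]                                 # keep top N performers
    return pop

rng = random.Random(0)
best = evolve(rng)[0]
```

Because selection always ranks the combined pool of parents and children, the best fitness in the population is non-decreasing across generations, even without any explicit complexity regularization.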
4. Training and Model Composition
Within a single pipeline, each layer is trained in a stage-wise fashion. At layer $i$, all primitive learners are trained independently on the current feature matrix, formed from the raw features and all prior predictions: $X_i = [X, \hat{y}_{1,1}, \dots, \hat{y}_{i-1,J_{i-1}}]$, where $X_1 = X$. The final prediction is the output of the single top-layer node: $\hat{y} = f_{I,1}(X_I)$. This structure enables the correction of prior errors layer by layer, and the topology can represent both deep (many-layer) and broad (many-node) ensembles incorporating heterogeneous learners.
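The stage-wise fit/predict cycle can be sketched as below. This is a minimal sketch, not the paper's implementation: `Centroid` is a toy nearest-class-centroid classifier standing in for the scikit-learn primitives, and the `fit_pipeline`/`predict_pipeline` helpers are assumed names.

```python
import numpy as np

class Centroid:
    """Toy stand-in primitive: nearest-class-centroid classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distance from every sample to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

def fit_pipeline(layers, X, y):
    """Stage-wise training: each layer sees the raw features plus all
    prior per-node predictions (the cascading feature augmentation)."""
    Xi = X
    for layer in layers:
        preds = [m.fit(Xi, y).predict(Xi) for m in layer]
        Xi = np.column_stack([Xi] + preds)
    return layers

def predict_pipeline(layers, X):
    """Rebuild the augmented features layer by layer, then apply the
    single top-layer node."""
    Xi = X
    for layer in layers[:-1]:
        preds = [m.predict(Xi) for m in layer]
        Xi = np.column_stack([Xi] + preds)
    return layers[-1][0].predict(Xi)

# Tiny synthetic usage example: two nodes in layer 1, one top node.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
layers = [[Centroid(), Centroid()], [Centroid()]]
fit_pipeline(layers, X, y)
acc = (predict_pipeline(layers, X) == y).mean()
```

Note that `predict_pipeline` must rebuild the feature matrix in exactly the same order as training did, since each learner was fit against a specific column layout.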
5. Benchmark Evaluation and Empirical Results
Autostacker was evaluated using 15 classification datasets from the PMLB repository, featuring task diversity (binary and multi-class) and wide variation in instance size. No feature engineering or preprocessing was used; the system operated directly on raw feature matrices. Standard baselines included RandomForestClassifier (500 trees), TPOT (EA-based AutoML), and AutoSklearn (Bayesian optimization plus ensemble selection).
Outcomes were measured as balanced test accuracy and wall-clock time to final pipeline production. Autostacker outperformed RandomForest on all datasets, exceeded TPOT accuracy in 12 of 15 tasks, and bested AutoSklearn in 9 of 15. Speed analysis showed Autostacker up to six times faster than TPOT on large datasets; it was also faster than AutoSklearn in most cases. For each dataset, Autostacker provided ten high-quality pipeline candidates suitable for subsequent expert selection or refinement (Chen et al., 2018).
6. Strengths, Limitations, and Future Directions
Autostacker’s primary strengths are its fully compositional stacking mechanism—allowing the discovery of novel model combinations without domain knowledge or preprocessing—and its ease of parallelization (each pipeline is independent). The system achieves competitive or state-of-the-art accuracy across diverse tasks with manageable computational resources.
Key limitations include the primitive set’s restriction to classical learners, excluding deep neural networks and thereby reducing effectiveness for high-dimensional unstructured data (e.g., images, text). The search strategy is a “vanilla” evolutionary algorithm; more sophisticated variants (e.g., adaptive mutation rates, multi-objective optimization) could potentially yield faster convergence or higher-quality pipelines. The lack of an explicit penalty for pipeline complexity may also yield unnecessarily large architectures.
Future work suggested includes integrating advanced model primitives (e.g., convolutional networks, transformer architectures), enriching the search space to encompass feature engineering and data preprocessing, and hybridizing evolutionary optimization with Bayesian and other metaheuristics. There is also an open question regarding the theoretical underpinnings of emergent stacking patterns in evolved pipelines (Chen et al., 2018).