MCompiler Meta-Compilation Framework

Updated 17 February 2026
  • MCompiler is a meta-compilation framework that segments loop nests in C applications to evaluate and select the most efficient compiler for each segment.
  • It employs both exhaustive profiling and ML-based prediction to determine the optimal optimizer, balancing runtime performance and energy efficiency.
  • The framework uses precise speedup and energy metrics to validate its approach, achieving near-optimal performance with significantly reduced overhead.

MCompiler is a meta-compilation framework designed to synergistically combine the strengths of multiple compilers or optimizers by selecting, at the granularity of loop nests, the most profitable code generator for each segment and assembling them into a single executable. Its workflow, formal profitability definitions, use of machine learning for optimizer selection, and experimental results establish it as a versatile tool for both performance maximization and energy-aware compilation, with efficacy demonstrated across a suite of benchmarks (Shivam et al., 2019).

1. Architectural Overview and Workflow

MCompiler orchestrates the compilation process at the level of static for-loop nests ("loop nests") within C applications. The workflow follows these steps:

  • Segmentation: The input program is parsed to identify all static loop nests. Each is extracted as a standalone function in its own "loop file," replacing the original call site in the "base file."
  • Multi-Compiler Optimization: For a user-supplied set of compilers/optimizers O = \{o_1, \ldots, o_k\} (which can include Intel icc, GNU gcc, LLVM clang, Polly, Pluto, and others), each loop file (and the base file) is compiled by every o_j \in O, producing a set of candidate executables P_j.
  • Profiling Phase: Each candidate executable P_j is run on representative input data, and the execution time t_{s,j} of each loop nest s under optimizer o_j is recorded.
  • Synthesis (Linking): For each loop nest, the system selects the optimizer o^*_s = \arg\min_{o_j} t_{s,j} that gives the lowest measured execution time and links the corresponding object files into the final executable. Optionally, the linking can instead target energy E_{s,j} or other metrics.
  • Machine Learning-Based Selection (optional): Instead of profiling every optimizer, loop features are extracted and a trained random-forest model predicts the most suitable optimizer for each segment, drastically reducing search time.

This architecture enables both runtime-based and ML-based selection modes, allowing for flexible trade-offs between compilation cost and code quality.
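The synthesis step above amounts to a per-segment argmin over recorded timings. A minimal sketch, assuming per-(segment, optimizer) times have already been collected during profiling (segment names, optimizer names, and timings below are illustrative, not measurements from the paper):

```python
# Minimal sketch of MCompiler's synthesis step: given measured execution
# times per (segment, optimizer) pair, pick the fastest optimizer for each
# loop nest. All values here are illustrative.
def select_optimizers(timings):
    """timings: {segment: {optimizer: seconds}} -> {segment: best optimizer}"""
    return {seg: min(per_opt, key=per_opt.get) for seg, per_opt in timings.items()}

timings = {
    "loop_1": {"icc": 0.90, "gcc": 1.10, "clang": 0.95, "polly": 1.30},
    "loop_2": {"icc": 2.40, "gcc": 2.10, "clang": 2.60, "polly": 1.80},
}
best = select_optimizers(timings)
print(best)  # {'loop_1': 'icc', 'loop_2': 'polly'}
```

The same routine works unchanged for any per-segment cost metric, which is why swapping the objective (Section 6) requires no change to the synthesis logic.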

2. Formal Speedup and Profitability Metrics

Quantitative evaluation and optimizer selection in MCompiler are grounded in precise definitions:

  • Segment-Level Speedup:

S_{s,j} = \frac{t^\text{base}_s}{t_{s,j}}

where t^\text{base}_s is the baseline time for segment s (e.g., compiled by icc).

  • Whole-Program Speedup:

S_\text{MC} = \frac{T_\text{base}}{T_\text{MC}}

with T_\text{MC} = \sum_{s=1}^N t_{s,o^*_s} + T_\text{rest}, where T_\text{rest} covers code not in tagged loop segments.

  • Geometric Mean Speedup across M benchmarks:

S_\text{GM} = \left(\prod_{i=1}^M S_i\right)^{1/M}

  • Performance Loss from ML vs. Profiling:

\Delta_\text{ML} = \frac{T_\text{ml}}{T_\text{prof}} - 1

where T_\text{ml} and T_\text{prof} are the program times under ML-driven and profiling-driven optimizer selection, respectively.

These metrics consistently quantify gains for both per-segment and whole-program aggregation, and allow the ML-based path to be judged relative to exhaustive search.
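As a concrete illustration, the four metrics above can be computed directly from timings; the numbers used here are made up for the example:

```python
import math

# Sketch of the speedup metrics from Section 2, with invented timings.
def segment_speedup(t_base, t_opt):
    return t_base / t_opt                   # S_{s,j} = t^base_s / t_{s,j}

def program_speedup(t_base_total, t_mc_total):
    return t_base_total / t_mc_total        # S_MC = T_base / T_MC

def geomean(speedups):
    return math.prod(speedups) ** (1.0 / len(speedups))  # S_GM

def ml_loss(t_ml, t_prof):
    return t_ml / t_prof - 1.0              # Delta_ML

print(segment_speedup(2.0, 1.0))   # 2.0
print(geomean([1.0, 4.0]))         # 2.0
print(ml_loss(1.04, 1.00))         # ~0.04, i.e. a 4% loss vs. the oracle
```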

3. Machine Learning Component for Optimizer Selection

The ML infrastructure in MCompiler centers on using hardware performance counters and feature-based classification to replace exhaustive profiling:

  • Feature Extraction: Loop files compiled at baseline (-O1) are profiled to obtain a vector \mathbf{f}_s of hardware performance counters (e.g., instruction-type mix, branch mispredictions, memory stalls, cache and TLB events), normalized to "per 1K instructions".
  • Model Training: With profiling-derived "oracle" optimizer labels for training data (TSVC, Polybench, NAS-serial, OpenMP, etc.), a Random Forest (RF) classifier is trained. Hyperparameters include tree depth ≤ 25, ≥ 5 samples per leaf, up to 15 candidate splits per node, and feature subset size ≈ 20.
  • Prediction: For new code segments, the trained model assigns the most suitable optimizer o_s = h(\mathbf{f}_s), incurring O(\log T) runtime for T trees.

Cross-validation yields 90–95% per-class accuracy across serial and parallel models, and in deployment achieves empirical loss within 4% (serial) and 8% (auto-parallelized) relative to the profiling-based oracle.
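The shape of this prediction step can be shown with a deliberately simplified stand-in for the paper's random forest: a nearest-centroid classifier over counter vectors. The counter names, centroids, and feature values below are invented for illustration; the real system trains an RF over counters normalized per 1K instructions:

```python
# Simplified stand-in for MCompiler's RF chooser: classify a loop's
# hardware-counter vector by its nearest class centroid. All values
# here are invented; this only illustrates feature-based selection.
def nearest_centroid(centroids, features):
    """centroids: {optimizer: feature vector} -> predicted optimizer."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda o: dist2(centroids[o], features))

centroids = {
    "icc":   [5.0, 1.0],   # e.g. [branch misses/1K, L1 misses/1K]
    "polly": [1.0, 8.0],
}
print(nearest_centroid(centroids, [4.5, 2.0]))  # icc
```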

4. Profiling vs. ML-Driven Selection: Algorithms and Overhead

The optimizer-selection workflow can be summarized in pseudocode:

Profiling-Based Search (O(N \times |O|)):

for o in O:
    compile all loops with o
    run executable P_o, record t_{s, o}
for s in segments:
    o*_s = argmin_o t_{s, o}
link final binary with best o*_s for each segment

Overhead is proportional to the number of optimizers |O|.

ML-Based Selection (O(N)):

for s in segments:
    compile s with baseline -O1
    run once to get features f_s
    o_s = RF.predict(f_s)
compile with predicted o_s and link

Overhead is a single feature-collection pass plus cheap RF inference per segment, reducing end-to-end search time to roughly 1/|O| of exhaustive profiling.

Selection Mode     Compile/Run Cost   Perf. Loss (serial)   Perf. Loss (parallel)
Profiling Oracle   |O| × (full)       0%                    0%
ML Prediction      ~1 × (RF)          3.6%                  7.8%

The ML-based system thus approximates the profiling oracle with minimal overhead.
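The asymptotic difference can be made concrete with a back-of-the-envelope count of profiled builds; the segment and optimizer counts below are hypothetical:

```python
# Back-of-the-envelope search-overhead comparison: exhaustive profiling
# measures every (segment, optimizer) pair, while ML-based selection needs
# one baseline (-O1) feature pass per segment. Counts are hypothetical.
def profiling_runs(n_segments, n_optimizers):
    return n_segments * n_optimizers   # O(N x |O|)

def ml_runs(n_segments):
    return n_segments                  # O(N)

N, K = 40, 6
print(profiling_runs(N, K))  # 240 profiled segment measurements
print(ml_runs(N))            # 40
```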

5. Empirical Results: Speedup and Compiler Diversity

Evaluated on a dual-socket Intel Xeon Gold 6142 (Skylake, 16 cores per socket, AVX-512) across diverse benchmarks:

  • Serial (vectorized): geometric mean speedup = 1.96× (up to 2.5×).
  • Auto-parallelized (32 threads): 2.62× geometric mean (Polybench, NPB-SER).
  • OpenMP (multithreaded, NAS/OpenMP): 1.04× (mean), up to 1.74×.

Typical optimizer choices (serial): icc (51%), clang (25%), Polly (18%), gcc (6%); parallel: Polly (64%), icc (36%).

The ML-based chooser achieves these speedups within 4–8% of the profiling-based optimum, confirming the viability of counter-based optimizer prediction.

6. Extension to Energy-Aware Optimization

MCompiler is extensible beyond runtime to energy and energy-delay-product (EDP) objectives:

  • LIKWID Instrumentation: Each loop segment is wrapped in markers (LIKWID_MARKER_START/STOP). Package and DRAM energy is sampled via Intel RAPL during execution.
  • Energy-Driven Synthesis: Instead of minimizing t_{s,j}, the objective shifts to energy E_{s,j}, to the energy-delay product EDP = \sum_s E_s t_s, or to average power P_s = E_s / t_s.
  • Flexible Objective: Synthesis can select for energy, latency, or any analytic combination, and may trade off among multiple metrics if desired.
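Because synthesis is just a per-segment argmin, swapping the objective is a one-line change. A minimal sketch with invented (time, energy) measurements:

```python
# Sketch of objective-flexible synthesis: choose, per segment, the
# optimizer minimizing time t, energy E, or energy-delay product E*t.
# Measurements below are illustrative, not from the paper.
def select(measurements, objective="edp"):
    """measurements: {segment: {optimizer: (t_seconds, e_joules)}}"""
    score = {
        "time":   lambda t, e: t,
        "energy": lambda t, e: e,
        "edp":    lambda t, e: e * t,
    }[objective]
    return {
        seg: min(per_opt, key=lambda o: score(*per_opt[o]))
        for seg, per_opt in measurements.items()
    }

m = {"loop_1": {"icc": (1.0, 30.0), "polly": (1.5, 18.0)}}
print(select(m, "time"))  # {'loop_1': 'icc'}
print(select(m, "edp"))   # {'loop_1': 'polly'}  (EDP 27.0 vs. 30.0)
```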

This extension demonstrates the framework’s suitability for energy-critical domains such as embedded and high-performance computing.

7. Research Significance, Limitations, and Modular Design

MCompiler systematically enables:

  • Segment-level optimizer diversity and exploitation of non-uniform profitability across program structure.
  • Substantial speedup/energy gains relative to monolithic compilation choices for both serial and parallel code.
  • Rapid, low-overhead ML-based synthesis, enabling practical use in research and tool evaluation pipelines.
  • Ease of extension—additional metrics, new optimizers, and profiling sources can be integrated by modular design.

However, MCompiler currently focuses on loop-nest granularity; coarser or finer segmentation may require further research. It leverages the ROSE source-to-source infrastructure and standard toolchains, so portability is bounded by ROSE and LIKWID/RAPL support.

In summary, MCompiler provides a meta-compilation infrastructure combining empirical search, ML-driven prediction, and multi-objective optimization to produce binaries outperforming any single compiler strategy for loop-intensive workloads, validated across multiple metrics and benchmarks (Shivam et al., 2019).
