MCompiler Meta-Compilation Framework

Updated 17 February 2026
  • MCompiler is a meta-compilation framework that segments loop nests in C applications to evaluate and select the most efficient compiler for each segment.
  • It employs both exhaustive profiling and ML-based prediction to determine the optimal optimizer, balancing runtime performance and energy efficiency.
  • The framework uses precise speedup and energy metrics to validate its approach, achieving near-optimal performance with significantly reduced overhead.

MCompiler is a meta-compilation framework designed to synergistically combine the strengths of multiple compilers or optimizers by selecting, at the granularity of loop nests, the most profitable code generator for each segment and assembling them into a single executable. Its workflow, formal profitability definitions, use of machine learning for optimizer selection, and experimental results establish it as a versatile tool for both performance maximization and energy-aware compilation, with efficacy demonstrated across a suite of benchmarks (Shivam et al., 2019).

1. Architectural Overview and Workflow

MCompiler orchestrates the compilation process at the level of static for-loop nests ("loop nests") within C applications. The workflow follows these steps:

  • Segmentation: The input program is parsed to identify all static loop nests. Each is extracted as a standalone function in its own "loop file," replacing the original call site in the "base file."
  • Multi-Compiler Optimization: For a user-supplied set of compilers/optimizers O = \{o_1, \ldots, o_k\} (which can include Intel icc, GNU gcc, LLVM clang, Polly, Pluto, and others), each loop file (and the base file) is compiled by every o_j \in O, producing a set of candidate executables P_j.
  • Profiling Phase: Each candidate executable P_j is run on representative input data, and the execution time t_{s,j} of each loop nest s under optimizer o_j is recorded.
  • Synthesis (Linking): For each loop nest, the system selects the optimizer o^*_s = \arg\min_{o_j} t_{s,j} that gives the lowest measured execution time and links the corresponding object files into the final executable. Optionally, the linking can instead target energy E_{s,j} or other metrics.
  • Machine Learning-Based Selection (optional): Instead of profiling every optimizer, loop features are extracted and a trained random-forest model predicts the most suitable optimizer for each segment, drastically reducing search time.

This architecture enables both runtime-based and ML-based selection modes, allowing for flexible trade-offs between compilation cost and code quality.
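The synthesis step above amounts to a per-segment argmin over recorded timings. A minimal sketch, assuming per-(segment, optimizer) times have already been collected during profiling (segment names, optimizer names, and timings below are illustrative, not measurements from the paper):

```python
# Minimal sketch of MCompiler's synthesis step: given measured execution
# times per (segment, optimizer) pair, pick the fastest optimizer for each
# loop nest. All values here are illustrative.
def select_optimizers(timings):
    """timings: {segment: {optimizer: seconds}} -> {segment: best optimizer}"""
    return {seg: min(per_opt, key=per_opt.get) for seg, per_opt in timings.items()}

timings = {
    "loop_1": {"icc": 0.90, "gcc": 1.10, "clang": 0.95, "polly": 1.30},
    "loop_2": {"icc": 2.40, "gcc": 2.10, "clang": 2.60, "polly": 1.80},
}
best = select_optimizers(timings)
print(best)  # {'loop_1': 'icc', 'loop_2': 'polly'}
```

The same routine works unchanged for any per-segment cost metric, which is why swapping the objective (Section 6) requires no change to the synthesis logic.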

2. Formal Speedup and Profitability Metrics

Quantitative evaluation and optimizer selection in MCompiler are grounded in precise definitions:

  • Segment-Level Speedup:

S_{s,j} = \frac{t^\text{base}_s}{t_{s,j}}

where t^\text{base}_s is the baseline time for segment s (e.g., compiled by icc).

  • Whole-Program Speedup:

S_\text{MC} = \frac{T_\text{base}}{T_\text{MC}}

with T_\text{MC} = \sum_{s=1}^N t_{s,o^*_s} + T_\text{rest}, where T_\text{rest} covers code not in tagged loop segments.

  • Geometric Mean Speedup across M benchmarks:

S_\text{GM} = \left(\prod_{i=1}^M S_i\right)^{1/M}

  • Performance Loss from ML vs. Profiling:

\Delta_\text{ML} = \frac{T_\text{ml}}{T_\text{prof}} - 1

where T_\text{ml} and T_\text{prof} are the program times under ML-driven and profiling-driven optimizer selection, respectively.

These metrics consistently quantify gains for both per-segment and whole-program aggregation, and allow the ML-based path to be judged relative to exhaustive search.
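As a concrete illustration, the four metrics above can be computed directly from timings; the numbers used here are made up for the example:

```python
import math

# Sketch of the speedup metrics from Section 2, with invented timings.
def segment_speedup(t_base, t_opt):
    return t_base / t_opt                   # S_{s,j} = t^base_s / t_{s,j}

def program_speedup(t_base_total, t_mc_total):
    return t_base_total / t_mc_total        # S_MC = T_base / T_MC

def geomean(speedups):
    return math.prod(speedups) ** (1.0 / len(speedups))  # S_GM

def ml_loss(t_ml, t_prof):
    return t_ml / t_prof - 1.0              # Delta_ML

print(segment_speedup(2.0, 1.0))   # 2.0
print(geomean([1.0, 4.0]))         # 2.0
print(ml_loss(1.04, 1.00))         # ~0.04, i.e. a 4% loss vs. the oracle
```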

3. Machine Learning Component for Optimizer Selection

The ML infrastructure in MCompiler centers on using hardware performance counters and feature-based classification to replace exhaustive profiling:

  • Feature Extraction: Loop files compiled at baseline (-O1) are profiled to obtain a vector \mathbf{f}_s of hardware performance counters (e.g., instruction-type mix, branch mispredictions, memory stalls, cache and TLB events), normalized to "per 1K instructions".
  • Model Training: With profiling-derived "oracle" optimizer labels for training data (TSVC, Polybench, NAS-serial, OpenMP, etc.), a Random Forest (RF) classifier is trained. Hyperparameters include tree depth ≤ 25, ≥ 5 samples per leaf, up to 15 candidate splits per node, and feature subset size ≈ 20.
  • Prediction: For new code segments, the trained model assigns the most suitable optimizer o_s = h(\mathbf{f}_s), incurring O(\log T) runtime for T trees.

Cross-validation yields 90–95% per-class accuracy across serial and parallel models, and in deployment achieves empirical loss within 4% (serial) and 8% (auto-parallelized) relative to the profiling-based oracle.
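The shape of this prediction step can be shown with a deliberately simplified stand-in for the paper's random forest: a nearest-centroid classifier over counter vectors. The counter names, centroids, and feature values below are invented for illustration; the real system trains an RF over counters normalized per 1K instructions:

```python
# Simplified stand-in for MCompiler's RF chooser: classify a loop's
# hardware-counter vector by its nearest class centroid. All values
# here are invented; this only illustrates feature-based selection.
def nearest_centroid(centroids, features):
    """centroids: {optimizer: feature vector} -> predicted optimizer."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda o: dist2(centroids[o], features))

centroids = {
    "icc":   [5.0, 1.0],   # e.g. [branch misses/1K, L1 misses/1K]
    "polly": [1.0, 8.0],
}
print(nearest_centroid(centroids, [4.5, 2.0]))  # icc
```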

4. Profiling vs. ML-Driven Selection: Algorithms and Overhead

The optimizer-selection workflow can be summarized in pseudocode:

Profiling-Based Search (O(N \times |O|)):

for o in O:
    compile all loops with o
    run executable P_o, record t_{s, o}
for s in segments:
    o*_s = argmin_o t_{s, o}
link final binary with best o*_s for each segment

Overhead is proportional to the number of optimizers |O|.

ML-Based Selection (O(N)):

for s in segments:
    compile s with baseline -O1
    run once to get features f_s
    o_s = RF.predict(f_s)
compile with predicted o_s and link

Overhead is a single feature-collection pass plus cheap RF inference per segment, reducing end-to-end search time to roughly 1/|O| of exhaustive profiling.

Selection Mode     Compile/Run Cost   Perf. Loss (serial)   Perf. Loss (parallel)
Profiling Oracle   |O| × (full)       0%                    0%
ML Prediction      ~1 × (RF)          3.6%                  7.8%

The ML-based system thus approximates the profiling oracle with minimal overhead.
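The asymptotic difference can be made concrete with a back-of-the-envelope count of profiled builds; the segment and optimizer counts below are hypothetical:

```python
# Back-of-the-envelope search-overhead comparison: exhaustive profiling
# measures every (segment, optimizer) pair, while ML-based selection needs
# one baseline (-O1) feature pass per segment. Counts are hypothetical.
def profiling_runs(n_segments, n_optimizers):
    return n_segments * n_optimizers   # O(N x |O|)

def ml_runs(n_segments):
    return n_segments                  # O(N)

N, K = 40, 6
print(profiling_runs(N, K))  # 240 profiled segment measurements
print(ml_runs(N))            # 40
```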

5. Empirical Results: Speedup and Compiler Diversity

Evaluated on a dual-socket Intel Xeon Gold 6142 (Skylake, 16 cores per socket, AVX-512) across diverse benchmarks:

  • Serial (vectorized): geometric mean speedup = 1.96× (up to 2.5×).
  • Auto-parallelized (32 threads): 2.62× geometric mean (Polybench, NPB-SER).
  • OpenMP (multithreaded, NAS/OpenMP): 1.04× (mean), up to 1.74×.

Typical optimizer choices (serial): icc (51%), clang (25%), Polly (18%), gcc (6%); parallel: Polly (64%), icc (36%).

The ML-based chooser achieves these speedups within 4–8% of the profiling-based optimum, confirming the viability of counter-based optimizer prediction.

6. Extension to Energy-Aware Optimization

MCompiler is extensible beyond runtime to energy and energy-delay-product (EDP) objectives:

  • LIKWID Instrumentation: Each loop segment is wrapped in markers (LIKWID_MARKER_START/STOP). Package and DRAM energy is sampled via Intel RAPL during execution.
  • Energy-Driven Synthesis: Instead of minimizing t_{s,j}, the objective shifts to energy E_{s,j}, to the energy-delay product EDP = \sum_s E_s t_s, or to average power P_s = E_s / t_s.
  • Flexible Objective: Synthesis can select for energy, latency, or any analytic combination, and may trade off among multiple metrics if desired.
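Because synthesis is just a per-segment argmin, swapping the objective is a one-line change. A minimal sketch with invented (time, energy) measurements:

```python
# Sketch of objective-flexible synthesis: choose, per segment, the
# optimizer minimizing time t, energy E, or energy-delay product E*t.
# Measurements below are illustrative, not from the paper.
def select(measurements, objective="edp"):
    """measurements: {segment: {optimizer: (t_seconds, e_joules)}}"""
    score = {
        "time":   lambda t, e: t,
        "energy": lambda t, e: e,
        "edp":    lambda t, e: e * t,
    }[objective]
    return {
        seg: min(per_opt, key=lambda o: score(*per_opt[o]))
        for seg, per_opt in measurements.items()
    }

m = {"loop_1": {"icc": (1.0, 30.0), "polly": (1.5, 18.0)}}
print(select(m, "time"))  # {'loop_1': 'icc'}
print(select(m, "edp"))   # {'loop_1': 'polly'}  (EDP 27.0 vs. 30.0)
```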

This extension demonstrates the framework’s suitability for energy-critical domains such as embedded and high-performance computing.

7. Research Significance, Limitations, and Modular Design

MCompiler systematically enables:

  • Segment-level optimizer diversity and exploitation of non-uniform profitability across program structure.
  • Substantial speedup/energy gains relative to monolithic compilation choices for both serial and parallel code.
  • Rapid, low-overhead ML-based synthesis, enabling practical use in research and tool evaluation pipelines.
  • Ease of extension—additional metrics, new optimizers, and profiling sources can be integrated by modular design.

However, MCompiler currently focuses on loop-nest granularity; coarser or finer segmentation may require further research. It leverages the ROSE source-to-source infrastructure and standard toolchains, so portability is bounded by ROSE and LIKWID/RAPL support.

In summary, MCompiler provides a meta-compilation infrastructure combining empirical search, ML-driven prediction, and multi-objective optimization to produce binaries outperforming any single compiler strategy for loop-intensive workloads, validated across multiple metrics and benchmarks (Shivam et al., 2019).
