
MLIR-Smith: Random Program Generator

Updated 12 January 2026
  • MLIR-Smith is a grammar-guided random program generator for MLIR that creates valid, diverse modules to test MLIR-based compiler optimizations.
  • It features a modular architecture with a trait-based GeneratableOpInterface to easily support user-defined, extensible dialects.
  • MLIR-Smith supports differential testing across multiple compiler pipelines and has detected critical functional bugs and optimization gaps.

MLIR-Smith is a grammar-guided random program generator tailored for the Multi-Level Intermediate Representation (MLIR) ecosystem, designed to enable rigorous testing and evaluation of MLIR-based compiler optimizations. Unlike prior random program generation tools such as Csmith, MLIR-Smith explicitly addresses the challenges posed by MLIR's user-extensible dialects and the lack of a fixed grammar, providing a dialect-agnostic mechanism to generate valid, diverse MLIR modules. Its introduction fills a critical gap in the testing infrastructure for compiler pipelines that leverage MLIR, LLVM, and related frameworks (Ates et al., 5 Jan 2026).

1. Motivation and Context

The utility of random-program generation in compiler validation has been prominently demonstrated by tools such as Csmith, which discovered numerous bugs and missed optimizations in C compilers through the automatic synthesis of safe, terminating C programs. However, MLIR presents unique difficulties: dialects are user-defined and extensible, their operations are governed by heterogeneous semantic constraints, and no pre-existing Csmith-style tools are applicable. MLIR-Smith was developed to address these needs, providing a platform-independent generator capable of targeting arbitrary dialect sets and supporting deep configurability over module structure, control flow, and operation density (Ates et al., 5 Jan 2026).

2. Core Architecture and Components

MLIR-Smith comprises three principal components:

  • Configuration Core: Parses user-supplied configuration, initializes the MLIR module, and sets up the func @main region.
  • GeneratorOpBuilder: An extension of MLIR's native OpBuilder, this helper class manages the sampling and emission of operations under user-defined constraints on block length and nesting depth.
  • GeneratableOpInterface: A trait and interface that dialect authors attach to each operation. This enables MLIR-Smith to discover, categorize, and invoke per-operation generation routines at runtime via the MLIR dialect registry.

MLIR-Smith does not attempt to parse the full MLIR grammar, which is distributed between C++ and TableGen specifications. Instead, each GeneratableOpInterface instance exposes two functions: getGeneratableTypes(), returning result types valid in context, and generate(), responsible for producing the operation by synthesizing operands (recursively sampling operations or using pre-existing values) and invoking the builder. This design ensures MLIR-Smith remains fully dialect-agnostic; support for new dialects requires only trait attachment and concise C++ implementations describing operand selection and region semantics (Ates et al., 5 Jan 2026).
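The two-function contract described above can be illustrated with a small, language-neutral model. This is only a sketch in Python, not the tool's C++ API: the `Value`, `Builder`, `GeneratableOp`, and `AddI` classes are hypothetical stand-ins, and only the method names mirror getGeneratableTypes() and generate() from the text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import random


@dataclass
class Value:
    """An SSA-like value with a name and a type string (e.g. "i32")."""
    name: str
    type: str


@dataclass
class Builder:
    """Hypothetical stand-in for GeneratorOpBuilder: tracks values and emitted ops."""
    rng: random.Random
    values: list = field(default_factory=list)
    ops: list = field(default_factory=list)

    def fresh(self, type_: str) -> Value:
        value = Value(f"%{len(self.values)}", type_)
        self.values.append(value)
        return value

    def operand(self, type_: str) -> Value:
        """Reuse an existing value of the requested type, else emit a constant."""
        candidates = [v for v in self.values if v.type == type_]
        if candidates:
            return self.rng.choice(candidates)
        result = self.fresh(type_)
        self.ops.append(f"{result.name} = arith.constant 1 : {type_}")
        return result


class GeneratableOp(ABC):
    """Hypothetical model of the per-operation generation interface."""

    @abstractmethod
    def get_generatable_types(self) -> list:
        """Result types this operation can produce."""

    @abstractmethod
    def generate(self, builder: Builder, result_type: str) -> Value:
        """Synthesize operands and emit the operation through the builder."""


class AddI(GeneratableOp):
    def get_generatable_types(self) -> list:
        return ["i32", "i64"]

    def generate(self, builder: Builder, result_type: str) -> Value:
        lhs = builder.operand(result_type)
        rhs = builder.operand(result_type)
        result = builder.fresh(result_type)
        builder.ops.append(
            f"{result.name} = arith.addi {lhs.name}, {rhs.name} : {result_type}"
        )
        return result
```

Calling `AddI().generate(Builder(random.Random(0)), "i32")` on an empty builder first emits a constant to serve as an operand and then the addition itself, which mirrors the "use a pre-existing value or synthesize one" behaviour described above.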

3. Random Program Generation Algorithm

MLIR-Smith constructs valid MLIR modules using a top-down "block-filling" strategy, analogous in philosophy to Csmith. The algorithm operates as follows:

  1. Block Termination Selection: The intended terminator type (e.g., return, yield, fall-through) is sampled for each block.
  2. Operation Sampling: Enabled generatable operations are assigned weights $w_i$, forming a discrete distribution:

$$P(\mathrm{op}_i) = \frac{w_i}{\sum_j w_j}$$

     The chosen operation's generate() function is invoked.

  3. Operand and Type Constraints: If the operation can be instantiated given existing values or recursively sampled operands, it is appended; otherwise, the failed operation is removed from contention, and another is sampled.
  4. Termination Criteria: The process continues until reaching the maximum block length $L$ or until only terminators remain feasible.
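The four steps above can be sketched as a short loop. This is a simplified model, not the tool's implementation: `fill_block`, `gen_constant`, and `gen_add` are hypothetical names, the terminator is fixed to `return`, and a failed operation is assumed to be excluded only while the current slot is being filled.

```python
import random


def fill_block(generators, weights, max_length, rng):
    """Fill one block with up to max_length ops, then append the terminator.

    generators: dict name -> callable(block); returns True if the op could be
                instantiated (and was appended), False otherwise.
    weights:    dict name -> positive weight w_i.
    """
    block = []
    while len(block) < max_length:
        enabled = dict(weights)  # assumption: failures are forgotten per slot
        appended = False
        while enabled and not appended:
            names = list(enabled)
            # Draw op_i with probability w_i / sum_j w_j over the enabled ops.
            chosen = rng.choices(names, weights=[enabled[n] for n in names], k=1)[0]
            appended = generators[chosen](block)
            if not appended:
                del enabled[chosen]  # remove the failed op from contention
        if not appended:
            break  # nothing but the terminator is feasible
    block.append("return")
    return block


def gen_constant(block):
    block.append("constant")
    return True


def gen_add(block):
    # Toy constraint: an add needs two previously defined values.
    if len(block) < 2:
        return False
    block.append("add")
    return True
```

With `fill_block({"constant": gen_constant, "add": gen_add}, {"constant": 1.0, "add": 3.0}, 8, random.Random(42))`, the first two slots are always constants because `add` cannot yet be instantiated, after which the heavier weight on `add` takes effect.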

Sampling is governed by user-defined or default distributions—uniform over enabled operations, and geometric over nesting levels:

$$P(\ell) = (1-p)^{\ell-1} \, p$$

for the loop-nest level $\ell$, where the expected depth $E[\ell] = \frac{1}{p}$ is typically kept small (default depth limit is 4) (Ates et al., 5 Jan 2026).
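A geometric draw with a hard depth limit can be sketched as follows. The function name and the choice to clamp at the limit (so that all remaining probability mass lands on the deepest allowed level) are assumptions made for illustration; the source does not say how the limit interacts with the distribution.

```python
import random


def sample_nest_depth(p, depth_limit, rng):
    """Draw a depth l >= 1 with P(l) = (1 - p)**(l - 1) * p, clamped to depth_limit."""
    depth = 1
    # Each additional level is entered with probability 1 - p.
    while depth < depth_limit and rng.random() >= p:
        depth += 1
    return depth
```

Because of the clamp, the mean of the sampled depths is slightly below the unbounded expectation $1/p$. For $p = 0.5$ and a limit of 4, the clamped mean is $0.5 + 0.5 + 0.375 + 0.5 = 1.875$ rather than 2.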

4. Soundness, Constraints, and Reproducibility

To ensure soundness of generated programs—excluding out-of-bounds accesses or missing terminators—MLIR-Smith:

  • Enforces static dimension bounds (up to 100,000) on memrefs.
  • Disallows strided affine maps.
  • Aborts any branch attempting to generate dynamic shapes or unsupported operations, retrying alternative branches.
  • Uses the global parameters regionDepthLimit and blockLength to rule out nontermination and excessively deep recursion.
  • Discards any randomly generated module exceeding a fixed execution timeout.

All randomization draws are performed by a single, user-seedable std::mt19937_64 instance, ensuring reproducibility. Users can adjust per-operation weights in JSON or YAML configuration, for example raising the weight of scf.for to promote frequent loop generation (Ates et al., 5 Jan 2026).
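The effect of a single seeded generator plus configurable weights can be sketched as below. The JSON layout and the `seed` and `weights` keys are hypothetical, since the source does not give the configuration schema; only regionDepthLimit and blockLength are names taken from the text. Python's `random.Random` is also a Mersenne Twister, but it is not bit-compatible with std::mt19937_64, so this serves only as an analogue.

```python
import json
import random

# Hypothetical configuration; the real schema may differ.
CONFIG_TEXT = """
{
  "seed": 1234,
  "blockLength": 6,
  "regionDepthLimit": 4,
  "weights": {"arith.addi": 1.0, "arith.constant": 1.0, "scf.for": 5.0}
}
"""


def sample_ops(config_text):
    """Sample blockLength operation names using one seeded generator."""
    config = json.loads(config_text)
    rng = random.Random(config["seed"])  # every random draw goes through this
    names = sorted(config["weights"])    # fixed order keeps draws reproducible
    weights = [config["weights"][n] for n in names]
    return rng.choices(names, weights=weights, k=config["blockLength"])
```

Two calls with the same configuration return the same sequence. Raising the weight of `scf.for`, as in this example, makes loops proportionally more likely in each draw.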

5. Differential Testing Workflows

Upon generation of a random MLIR module, MLIR-Smith orchestrates differential testing across four major compilation pipelines:

| Pipeline name | Stages | Distinguishing feature |
|---|---|---|
| MLIR pipeline | mlir-opt passes → LLVM dialect → LLVM IR → clang -O0 | Full MLIR opt passes + LLVM backend |
| LLVM pipeline | MLIR-to-LLVM dialect → LLVM IR → opt -O3 → compile | Skips MLIR opt passes; relies on LLVM optimization |
| DaCe pipeline | MLIR → SDFG dialect (sdfg-opt) → SDFG IR → DaCe optimizer → compile | Uses SDFG as an intermediate, DaCe's auto-optimizer |
| DCIR pipeline | mlir-opt passes → SDFG dialect → DaCe optimizer → compile | Combines MLIR and DaCe pipelines |

A shell harness (diff_test.sh) automates large-scale test campaigns. It monitors for compiler errors, output mismatches, segmentation faults, timeouts, and missed optimizations (e.g., failure to elide external markers), and it records the details of each program (each test typically occupies less than 200 KB) (Ates et al., 5 Jan 2026).
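The comparison step of such a harness might look like the following sketch. It is not taken from diff_test.sh: the `classify` function, the status strings, and the majority-vote rule for deciding which output counts as the mismatch are all assumptions made for illustration.

```python
from collections import Counter


def classify(results):
    """Flag pipelines that failed or disagree with the majority output.

    results: dict pipeline -> (status, output), where status is "ok",
             "compile_error", "crash", or "timeout" (hypothetical labels).
    """
    findings = {}
    for name, (status, _) in results.items():
        if status != "ok":
            findings[name] = status

    outputs = {
        name: output for name, (status, output) in results.items() if status == "ok"
    }
    if outputs:
        # Assumption: with no ground truth, the most common output is treated
        # as the reference. Ties resolve to the first output encountered.
        majority, _ = Counter(outputs.values()).most_common(1)[0]
        for name, output in outputs.items():
            if output != majority:
                findings[name] = "mismatch"
    return findings
```

For example, if the MLIR pipeline crashes, the LLVM and DaCe pipelines both print `42`, and the DCIR pipeline prints `0`, the function reports a crash for MLIR and a mismatch for DCIR.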

6. Empirical Results: Bug Discovery and Analysis

Empirical campaigns using several hundred generated programs led to the identification and confirmation of significant defects in multiple pipelines:

  • DaCe DCE bug: Live-analysis failed to remove unused memref.alloc in SDFG, inhibiting dead-code elimination.
  • DCIR translation bug: Incorrect lowering of arith.extsi on a boolean resulted in $\mathtt{true}$ mapping to $0$ instead of $-1$ due to an unsigned move.
  • MLIR missed optimization: Store-load pairs on large statically allocated memrefs were not eliminated, provoking a segmentation fault, whereas all other pipelines successfully removed the redundant accesses.

By comparing program behaviors and generated code artifacts, MLIR-Smith demonstrated the ability to detect both functional bugs and optimization coverage gaps across diverse compiler infrastructures, even in the absence of a formal ground truth (Ates et al., 5 Jan 2026).
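The expected value in the arith.extsi case follows from two's-complement sign extension: a one-bit `true` has its only bit set, so that bit is also the sign bit and the extended value is all ones, that is, $-1$. The helper functions below are illustrative only and do not reproduce the DCIR lowering itself.

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits bits of value as a two's-complement integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def zero_extend(value, from_bits):
    """Interpret the low from_bits bits of value as an unsigned integer."""
    return value & ((1 << from_bits) - 1)
```

Here `sign_extend(1, 1)` yields `-1`, whereas `zero_extend(1, 1)` yields `1`. Any lowering that treats the boolean as unsigned therefore cannot produce the value that sign extension requires.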

7. Extensibility and Future Directions

MLIR-Smith adopts a trait-based, plug-and-play model: a new dialect is supported as soon as its operations implement the GeneratableOpInterface methods. Anticipated future enhancements include:

  • Expansion to additional dialects (affine, vector, GPU).
  • Support for composite types, array-of-struct types, and unbounded type families.
  • Integration of liveness-driven sampling (e.g., in the style of Barány 2017) and optimization markers to stress-test elimination capabilities.
  • Statistical analysis of corpus properties (fail-rate curves $1-(1-p)^n$, confidence intervals for bug detection) as scale increases.
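The fail-rate curve mentioned in the last item can be evaluated directly. The sketch below assumes that $p$ is the probability that a single generated program exposes a given bug and that programs are independent; the source does not define $p$, so both the interpretation and the function names are assumptions.

```python
import math


def detection_probability(p, n):
    """Probability that at least one of n independent programs triggers the bug."""
    return 1.0 - (1.0 - p) ** n


def programs_needed(p, confidence):
    """Smallest n such that 1 - (1 - p)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

Under these assumptions, a bug triggered by 1% of programs is found with a probability of roughly 63% after 100 programs, and about 299 programs are needed to reach 95%.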

These extensions aim to further harden the MLIR ecosystem and inspire analogous approaches in other multi-level IR infrastructures (Ates et al., 5 Jan 2026).

References (1)
