MLIR-Smith: Random Program Generator

Updated 12 January 2026

MLIR-Smith is a grammar-guided random program generator for MLIR that creates valid, diverse modules to test MLIR-based compiler optimizations.
It features a modular architecture with a trait-based GeneratableOpInterface to easily support user-defined, extensible dialects.
MLIR-Smith supports differential testing across multiple compiler pipelines and has detected critical functional bugs and optimization gaps.

MLIR-Smith is a grammar-guided random program generator tailored for the Multi-Level Intermediate Representation (MLIR) ecosystem, designed to enable rigorous testing and evaluation of MLIR-based compiler optimizations. Unlike prior random program generation tools such as Csmith, MLIR-Smith explicitly addresses the challenges posed by MLIR's user-extensible dialects and the lack of a fixed grammar, providing a dialect-agnostic mechanism to generate valid, diverse MLIR modules. Its introduction fills a critical gap in the testing infrastructure for compiler pipelines that leverage MLIR, LLVM, and related frameworks (Ates et al., 5 Jan 2026).

1. Motivation and Context

The utility of random-program generation in compiler validation has been prominently demonstrated by tools such as Csmith, which discovered numerous bugs and missed optimizations in C compilers through the automatic synthesis of safe, terminating C programs. However, MLIR presents unique difficulties: dialects are user-defined and extensible, their operations are governed by heterogeneous semantic constraints, and no pre-existing Csmith-style tools are applicable. MLIR-Smith was developed to address these needs, providing a platform-independent generator capable of targeting arbitrary dialect sets and supporting deep configurability over module structure, control flow, and operation density (Ates et al., 5 Jan 2026).

2. Core Architecture and Components

MLIR-Smith comprises three principal components:

Configuration Core: Parses user-supplied configuration, initializes the MLIR module, and sets up the func @main region.
GeneratorOpBuilder: An extension of MLIR's native OpBuilder, this helper class manages the sampling and emission of operations under user-defined constraints on block length and nesting depth.
GeneratableOpInterface: A trait and interface that dialect authors attach to each operation. This enables MLIR-Smith to discover, categorize, and invoke per-operation generation routines at runtime via the MLIR dialect registry.

MLIR-Smith does not attempt to parse the full MLIR grammar, which is distributed between C++ and TableGen specifications. Instead, each GeneratableOpInterface instance exposes two functions: getGeneratableTypes(), returning result types valid in context, and generate(), responsible for producing the operation by synthesizing operands (recursively sampling operations or using pre-existing values) and invoking the builder. This design ensures MLIR-Smith remains fully dialect-agnostic; support for new dialects requires only trait attachment and concise C++ implementations describing operand selection and region semantics (Ates et al., 5 Jan 2026).
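The two-function contract described above can be illustrated with a small Python model. This is a sketch only: the real interface is a C++ MLIR op interface, and the `Value`, `Builder`, `ConstantOp`, and `AddIOp` classes below are hypothetical stand-ins invented for illustration. For brevity the sketch only reuses existing values as operands and does not recursively sample new operations.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Value:
    """An SSA-like value: a name plus a type string such as 'i32'."""
    name: str
    type: str


@dataclass
class Builder:
    """Toy stand-in for GeneratorOpBuilder: tracks emitted ops and live values."""
    rng: random.Random
    ops: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def fresh(self, type_):
        value = Value(f"%{len(self.values)}", type_)
        self.values.append(value)
        return value

    def operand(self, type_):
        """Reuse an existing value of the requested type, or return None."""
        candidates = [v for v in self.values if v.type == type_]
        return self.rng.choice(candidates) if candidates else None


class GeneratableOp(ABC):
    """Hypothetical Python analogue of the two-method interface."""

    @abstractmethod
    def get_generatable_types(self, builder):
        """Result types this op could produce in the current context."""

    @abstractmethod
    def generate(self, builder, result_type):
        """Emit the op and return its result, or None if it cannot be built."""


class ConstantOp(GeneratableOp):
    def get_generatable_types(self, builder):
        return ["i32", "i64"]

    def generate(self, builder, result_type):
        result = builder.fresh(result_type)
        literal = builder.rng.randint(-8, 8)
        builder.ops.append(f"{result.name} = arith.constant {literal} : {result_type}")
        return result


class AddIOp(GeneratableOp):
    def get_generatable_types(self, builder):
        # Only offer types for which an operand already exists.
        return sorted({v.type for v in builder.values})

    def generate(self, builder, result_type):
        lhs, rhs = builder.operand(result_type), builder.operand(result_type)
        if lhs is None or rhs is None:
            return None  # signal failure so the caller can sample another op
        result = builder.fresh(result_type)
        builder.ops.append(
            f"{result.name} = arith.addi {lhs.name}, {rhs.name} : {result_type}")
        return result


builder = Builder(random.Random(0))
assert AddIOp().generate(builder, "i32") is None  # no operands exist yet
ConstantOp().generate(builder, "i32")
AddIOp().generate(builder, "i32")
print("\n".join(builder.ops))
```

Returning `None` from `generate` is one plausible way to model an operation that cannot be instantiated in the current context; how the actual implementation signals failure is not stated in the source.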

3. Random Program Generation Algorithm

MLIR-Smith constructs valid MLIR modules using a top-down "block-filling" strategy, analogous in philosophy to Csmith. The algorithm operates as follows:

Block Termination Selection: The intended terminator type (e.g., return, yield, fall-through) is sampled for each block.
Operation Sampling: Enabled generatable operations are assigned weights $w_i$, forming a discrete distribution:

$P(\mathrm{op}_i) = \frac{w_i}{\sum_j w_j}$

The chosen operation's generate() function is invoked.

Operand and Type Constraints: If the operation can be instantiated given existing values or recursively sampled operands, it is appended; otherwise, the failed operation is removed from contention, and another is sampled.
Termination Criteria: The process continues until reaching the maximum block length $L$ or until only terminators remain feasible.
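The block-filling steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the tool's implementation: operations are represented by hypothetical callables (`make_constant`, `make_add`) that return a new value name or `None` on failure, a failed operation is excluded only for the current slot, and the terminator is fixed to `func.return` rather than sampled.

```python
import random


def sample_op(candidates, weights, live, rng):
    """Pick one feasible op with P(op_i) = w_i / sum_j w_j, or None if none fit."""
    enabled = dict(weights)  # local copy: failures only matter for this slot
    while enabled:
        names = list(enabled)
        # random.choices normalises the weights into a discrete distribution.
        name = rng.choices(names, weights=[enabled[n] for n in names], k=1)[0]
        result = candidates[name](live)
        if result is not None:
            return name, result
        del enabled[name]  # failed op leaves contention; sample another
    return None


def fill_block(candidates, weights, max_length, rng):
    """Append ops until max_length is reached or only the terminator is feasible."""
    block, live = [], []
    while len(block) < max_length:
        picked = sample_op(candidates, weights, live, rng)
        if picked is None:
            break
        name, result = picked
        block.append(name)
        live.append(result)
    block.append("func.return")  # fixed terminator (simplification)
    return block


def make_constant(live):
    return f"%c{len(live)}"


def make_add(live):
    # Needs two existing values; otherwise report failure.
    return f"%a{len(live)}" if len(live) >= 2 else None


candidates = {"arith.constant": make_constant, "arith.addi": make_add}
weights = {"arith.constant": 1.0, "arith.addi": 3.0}
print(fill_block(candidates, weights, max_length=6, rng=random.Random(42)))
```

Because `arith.addi` cannot be built until two values exist, the first two slots always fall back to `arith.constant`, which shows the remove-and-resample step in action.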

Sampling is governed by user-defined or default distributions—uniform over enabled operations, and geometric over nesting levels:

$P(\ell) = (1-p)^{\ell-1} \, p$

for the loop-nest level $\ell$, where the expected depth $E[\ell] = \frac{1}{p}$ is typically kept small (the default depth limit is 4) (Ates et al., 5 Jan 2026).
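As a rough illustration, the geometric draw can be combined with a hard depth limit as follows. How MLIR-Smith handles draws that exceed the limit is not stated in the source; this sketch assumes they are clamped to the limit, and the function names are hypothetical.

```python
import random


def sample_depth(p, depth_limit, rng):
    """Draw a level from P(l) = (1 - p)^(l - 1) * p, clamped at depth_limit."""
    level = 1
    # Continue to the next level with probability 1 - p.
    while level < depth_limit and rng.random() >= p:
        level += 1
    return level


def truncated_mean(p, depth_limit):
    """Expected level when all mass beyond depth_limit is folded onto the cap.

    E[min(l, L)] = sum_{k=0}^{L-1} (1 - p)^k = (1 - (1 - p)^L) / p.
    """
    q = 1.0 - p
    return (1.0 - q ** depth_limit) / p


rng = random.Random(7)
samples = [sample_depth(0.5, 4, rng) for _ in range(20_000)]
print(sum(samples) / len(samples), truncated_mean(0.5, 4))
```

With $p = 0.5$ the unclamped mean is $1/p = 2$, while clamping at a limit of 4 lowers it to 1.875, so the limit only slightly changes the expected depth when $p$ is large.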

4. Soundness, Constraints, and Reproducibility

To ensure soundness of generated programs—excluding out-of-bounds accesses or missing terminators—MLIR-Smith:

Enforces static dimension bounds (up to 100,000) on memrefs.
Disallows strided affine maps.
Aborts any branch attempting to generate dynamic shapes or unsupported operations, retrying alternative branches.
Uses the global parameters regionDepthLimit and blockLength to bound region nesting and block size, ruling out nontermination and excessively deep recursion.
Discards any randomly generated module exceeding a fixed execution timeout.
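The shape-related constraints can be pictured as a simple acceptance check. The sketch below assumes the 100,000 bound applies to each dimension individually and excludes zero-sized dimensions; both choices, along with the function names, are assumptions made for illustration.

```python
MAX_DIM = 100_000  # bound quoted in the text; applied per dimension (assumption)


def is_sound_memref_shape(shape):
    """Accept only fully static shapes whose dimensions are within the bound.

    shape is a sequence of ints, with None standing for a dynamic dimension
    (written '?' in MLIR's textual form).
    """
    return all(d is not None and 0 < d <= MAX_DIM for d in shape)


def first_sound(candidates):
    """Return the first acceptable shape, mimicking abort-and-retry."""
    for shape in candidates:
        if is_sound_memref_shape(shape):
            return shape
    return None


print(first_sound([(None, 4), (200_000,), (16, 16)]))  # (16, 16)
```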

All randomization draws are performed by a single, user-seedable std::mt19937_64 instance, ensuring reproducibility. Users can adjust per-operation weights in JSON or YAML configuration, for example raising the weight of scf.for to promote frequent loop generation (Ates et al., 5 Jan 2026).
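The effect of a single seeded generator and of per-operation weights can be demonstrated with a short sketch. The JSON layout below (`seed`, `weights`) is a hypothetical schema, not MLIR-Smith's actual configuration format, and Python's `random.Random` uses the 32-bit Mersenne Twister, so its sequences would not match those of `std::mt19937_64`.

```python
import json
import random

# Hypothetical configuration; the key names are illustrative only.
CONFIG_TEXT = """
{
  "seed": 1234,
  "weights": {"arith.addi": 1.0, "arith.constant": 1.0, "scf.for": 5.0}
}
"""


def sample_ops(config_text, count):
    """Draw `count` op names using one seeded generator for every draw."""
    config = json.loads(config_text)
    rng = random.Random(config["seed"])
    names = sorted(config["weights"])  # fixed order keeps runs comparable
    weights = [config["weights"][n] for n in names]
    return rng.choices(names, weights=weights, k=count)


first = sample_ops(CONFIG_TEXT, 1000)
second = sample_ops(CONFIG_TEXT, 1000)
assert first == second  # same seed, same sequence
print(first.count("scf.for") / len(first))  # should be close to 5/7
```

Routing every draw through one generator is what makes a failing module reproducible from its seed alone; a second, independently seeded generator anywhere in the pipeline would break that property.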

5. Differential Testing Workflows

Upon generation of a random MLIR module, MLIR-Smith orchestrates differential testing across four major compilation pipelines:

| Pipeline | Stages | Distinguishing feature |
|---|---|---|
| MLIR pipeline | `mlir-opt` passes → LLVM dialect → LLVM IR → `clang -O0` | Full MLIR optimization passes plus the LLVM backend |
| LLVM pipeline | MLIR-to-LLVM dialect → LLVM IR → `opt -O3` → compile | Skips MLIR optimization passes; relies on LLVM optimization |
| DaCe pipeline | MLIR → SDFG dialect (`sdfg-opt`) → SDFG IR → DaCe optimizer → compile | Uses SDFG as an intermediate representation with DaCe's auto-optimizer |
| DCIR pipeline | `mlir-opt` passes → SDFG dialect → DaCe optimizer → compile | Combines the MLIR and DaCe pipelines |

A shell harness (diff_test.sh) automates large-scale test campaigns. It watches for compiler errors, output mismatches, segmentation faults, timeouts, and missed optimizations (e.g., failure to elide external markers), and records the details of each program (each test is typically <200 KB) (Ates et al., 5 Jan 2026).
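The comparison logic behind such a harness can be sketched as follows. This is not the contents of diff_test.sh: the pipelines are modeled as plain Python callables (`reference`, `buggy`, `crashing` are hypothetical stand-ins for real compile-and-run steps), and a majority vote over successful outputs is assumed as the way to flag suspects.

```python
from collections import Counter


def run_differential(pipelines, program):
    """Run one program through every pipeline and flag the outliers."""
    outcomes = {}
    for name, run in pipelines.items():
        try:
            outcomes[name] = ("ok", run(program))
        except TimeoutError:
            outcomes[name] = ("timeout", None)
        except Exception as error:  # stands in for compiler errors and crashes
            outcomes[name] = ("error", type(error).__name__)

    # With no formal ground truth, treat the most common output as the reference.
    outputs = [result for status, result in outcomes.values() if status == "ok"]
    majority = Counter(outputs).most_common(1)[0][0] if outputs else None
    suspects = sorted(
        name for name, (status, result) in outcomes.items()
        if status != "ok" or result != majority
    )
    return outcomes, suspects


def reference(program):
    return sum(program)


def buggy(program):
    return sum(abs(x) for x in program)  # mishandles negative values


def crashing(program):
    raise RuntimeError("lowering failed")


pipelines = {"mlir": reference, "llvm": reference, "dace": crashing, "dcir": buggy}
outcomes, suspects = run_differential(pipelines, [1, -1, 3])
print(suspects)  # ['dace', 'dcir']
```

A majority vote is only a heuristic: when the outputs split evenly, the first value encountered wins, so a real campaign would still need manual triage of each flagged case.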

6. Empirical Results: Bug Discovery and Analysis

Empirical campaigns using several hundred generated programs led to the identification and confirmation of significant defects in multiple pipelines:

DaCe DCE bug: Live-analysis failed to remove unused memref.alloc in SDFG, inhibiting dead-code elimination.
DCIR translation bug: Incorrect lowering of arith.extsi on a boolean resulted in $\mathtt{true}$ mapping to 0 instead of $-1$ due to an unsigned move.
MLIR missed optimization: Store-load pairs on large statically allocated memrefs were not eliminated, provoking a segmentation fault, whereas all other pipelines successfully removed the redundant accesses.

By comparing program behaviors and generated code artifacts, MLIR-Smith demonstrated the capability to detect both functional bugs and optimization coverage gaps across diverse compiler infrastructures, even in the absence of a formal ground truth (Ates et al., 5 Jan 2026).

7. Extensibility and Future Directions

MLIR-Smith adopts a trait-based, plug-and-play model: a new dialect is supported as soon as its operations implement the GeneratableOpInterface methods. Anticipated future enhancements include:

Expansion to additional dialects (affine, vector, GPU).
Support for composite types, array-of-struct types, and unbounded type families.
Integration of liveness-driven sampling (e.g., in the style of Barány 2017) and optimization markers to stress-test elimination capabilities.
Statistical analysis of corpus properties (fail-rate curves $1-(1-p)^n$, confidence intervals for bug detection) as scale increases.

These extensions aim to further harden the MLIR ecosystem and inspire analogous approaches in other multi-level IR infrastructures (Ates et al., 5 Jan 2026).
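The fail-rate curve $1-(1-p)^n$ mentioned above can be evaluated directly. In this sketch $p$ is assumed to be the probability that a single generated program triggers a given bug (distinct from the nesting parameter used earlier), and programs are assumed to be independent; the function names are illustrative.

```python
import math


def detection_probability(p, n):
    """Chance that at least one of n independent programs triggers the bug."""
    return 1.0 - (1.0 - p) ** n


def programs_needed(p, target):
    """Smallest n with 1 - (1 - p)^n >= target, for 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


n = programs_needed(0.01, 0.95)
print(n, detection_probability(0.01, n))
```

Under these assumptions, a bug triggered by 1% of programs needs 299 generated programs to be observed with at least 95% probability, which is consistent in scale with campaigns of several hundred programs.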


References

Ates et al. (2026). MLIR-Smith: A Novel Random Program Generator for Evaluating Compiler Pipelines.
