LLMCompiler: Compiler with Large Language Models

Updated 10 March 2026
  • LLMCompiler is a compiler architecture that integrates large language models with billions of parameters to optimize code transformation tasks.
  • It employs transformer-based models trained on compiler-specific corpora to perform IR-level optimization, code generation, and neural network inference translation.
  • LLMCompilers use iterative feedback and hybrid neural-symbolic workflows to enhance optimization metrics and minimize compilation errors.

An LLM Compiler (LLMCompiler) is a compiler architecture in which an LLM, typically a transformer with billions of parameters, directly participates in, or fully orchestrates, code transformation tasks traditionally performed by handcrafted compiler components. LLMCompiler architectures have been proposed and evaluated for a spectrum of roles including IR-level optimization, code generation, neural network inference translation, tool invocation orchestration, and even end-to-end source-to-target compilation. Contemporary LLMCompilers operate over representations such as LLVM IR, assembly code, or computational graphs, using model-driven reasoning to select transformations, optimize performance, or generate compilable outputs. This paradigm is instantiated in diverse operational settings, including foundation models for compiler optimization, SQL-based inference serving, tensor accelerator code mapping, and parallel tool execution within agent systems.

1. Model Architectures and Data Representations

LLMCompilers are typically transformer-based, autoregressive models, sized from 7B to 70B+ parameters and trained on heavily compiler-centric corpora. For instance, the LLMCompiler described in "Large Language Models for Compiler Optimization" employs a 7B-parameter decoder-only transformer (Llama 2 architecture) with 32 layers and rotary positional embeddings, ingesting LLVM-IR sequences normalized to remove extraneous noise. The model processes up to 2,048 tokens per context and emits, in a structured format, a list of optimization passes, instruction counts, and optimized IR (Cummins et al., 2023). Meta’s LLM Compiler extends this foundation to 7B and 13B parameter scales with context windows up to 16,384 tokens, and is trained across four stages: IR and assembly pretraining, compiler emulation, optimization flag tuning, and disassembly (Cummins et al., 2024).

Several systems integrate LLMs at key phases of compilation: as IR-level optimizers, neural translation planners, or context-aware transformation proposers. In LLM-aided tensor processing compilation, the LLM operates at the granularity of high-level tensor operators and ISA-level primitives, with code translation decomposed into subtasks and optimization prompts parameterized by architectural specifications (Hong et al., 2024). In parallel function orchestration, the LLM parses a user query and compiles it into a task dependency DAG, orchestrated by auxiliary units for execution (Kim et al., 2023). Additionally, LLMCompilers have been designed for portable inference, wherein neural operators from computational graphs (e.g., ONNX) are mapped directly to relational algebra and executed as SQL queries in relational databases (Sun et al., 5 Feb 2025).
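
The task-DAG idea can be illustrated with a small scheduler. This is a minimal sketch, not the implementation from Kim et al.: the `run_dag` function, the plan format (name mapped to a callable plus a dependency list), and the example plan are all assumptions made for illustration. It also runs tasks in waves, whereas a production system would likely dispatch each task as soon as its own dependencies finish.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dag(tasks, max_workers=4):
    """Execute a task graph, running independent tasks in parallel.

    tasks: dict mapping a task name to (callable, [dependency names]).
    Each callable receives the results of its dependencies as positional
    arguments, in the order the dependencies are listed.
    """
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A task is ready once every dependency has a result.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in task graph")
            futures = {
                name: pool.submit(pending[name][0],
                                  *(results[d] for d in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# Hypothetical plan an LLM planner might emit for a two-lookup comparison.
plan = {
    "pop_a": (lambda: 3_000_000, []),
    "pop_b": (lambda: 1_500_000, []),
    "ratio": (lambda a, b: a / b, ["pop_a", "pop_b"]),
}
out = run_dag(plan)
```

Here `pop_a` and `pop_b` have no dependencies and run concurrently, while `ratio` waits for both.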

2. Training Objectives and Optimization Tasks

LLMCompiler models are trained with token-level autoregressive (next-token cross-entropy) objectives. For LLVM IR optimization, the training target is a single text sequence containing the pass list, the instruction counts, and the fully optimized IR; the counts and the optimized IR serve as auxiliary tasks alongside pass prediction, and all three are learned through the same token-level loss (Cummins et al., 2023).

Meta’s LLM Compiler applies standard cross-entropy objectives but stages the data to progressively specialize in IR/assembly handling, emulation, flag tuning (for optimization pass prediction), and disassembly (assembly-to-IR code translation) (Cummins et al., 2024). Fine-tuning on flag-tuning and disassembly rounds augments the model’s ability to predict pass sequences and reconstruct IR from target binaries.

LLMCompilers oriented toward tensor accelerators adopt a two-phase approach: (1) functional translation to the accelerator’s DSL/ISA, and (2) subsequent performance optimization, guided by in-context prompt engineering and optional cycle-accurate cost models (Hong et al., 2024). Systems for tool orchestration or agent-style programming leverage task decomposition and iterative self-correction, relying on prompt engineering, chain-of-thought augmentation, or explicit feedback loops from external error or test signals (Kim et al., 2023, Kjellberg et al., 17 Jan 2026).

3. Inference Mechanisms and System Integration

At inference, LLMCompilers typically operate in one of two modes: pass-sequence suggestion (e.g., generating a pass list for the LLVM opt tool), or emission of fully transformed code for downstream execution and validation. The LLMCompiler for LLVM IR uses greedy decoding for deterministic pass list generation, feeding the model-suggested passes back to the LLVM opt tool and optionally using a hybrid strategy that compiles with both -Oz and the model’s suggestions and keeps the better result (Cummins et al., 2023).
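
The hybrid strategy amounts to a guarded comparison against the baseline. The sketch below assumes a caller-supplied `count_instructions` function that runs the compiler and returns an instruction count; that function, `pick_smaller`, and the `fake_count` stand-in are hypothetical names for illustration, not part of LLVM or of the cited work.

```python
from typing import Callable, List, Sequence, Tuple


def pick_smaller(
    ir: str,
    model_passes: Sequence[str],
    count_instructions: Callable[[str, Sequence[str]], int],
    baseline_passes: Sequence[str] = ("-Oz",),
) -> Tuple[List[str], int]:
    """Return whichever pass list yields fewer instructions.

    Falls back to the baseline if the model's pass list fails to compile
    or does not improve on the baseline.
    """
    baseline = count_instructions(ir, baseline_passes)
    try:
        candidate = count_instructions(ir, model_passes)
    except Exception:
        # An invalid model suggestion must never make the result worse.
        return list(baseline_passes), baseline
    if candidate < baseline:
        return list(model_passes), candidate
    return list(baseline_passes), baseline


def fake_count(ir: str, passes: Sequence[str]) -> int:
    # Stand-in for a real compiler run: pretends each pass removes one
    # instruction from a 10-instruction function.
    if "bad-pass" in passes:
        raise RuntimeError("compiler rejected pass list")
    return 10 - len(passes)


good = pick_smaller("<ir>", ["p1", "p2", "p3"], fake_count)
bad = pick_smaller("<ir>", ["bad-pass"], fake_count)
```

Because the baseline is always compiled, the result can never be worse than -Oz, at the cost of one extra compilation per input.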

In agent-based and feedback-driven settings, the LLM forms the core of an iterative Compile–Analyze–Revise loop. Code generated from an LLM prompt is compiled, errors parsed, and the error messages (structured by priority or type) are injected back into subsequent prompts, enabling self-repair, error mitigation, and increased compilation success rates (Kjellberg et al., 17 Jan 2026, Zhang et al., 6 Nov 2025). In function orchestration, an LLM-generated DAG of tool calls is executed in parallel by a task-fetching unit and executor, reducing latency and cost by avoiding the classical ReAct sequential model (Kim et al., 2023).
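
A Compile–Analyze–Revise loop can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than any cited system: `generate` stands in for the LLM call, `check` stands in for the compiler, and Python's built-in `compile()` is used as the checker only so that the example is self-contained (the cited systems use compilers such as gcc). The prompt layout and `stub_model` are hypothetical.

```python
from typing import Callable, List, Tuple


def compile_analyze_revise(
    task: str,
    generate: Callable[[str], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Generate code, check it, and feed errors back until it passes."""
    prompt = task
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)
        errors = check(code)
        if not errors:
            return code, True, round_no
        # Inject the previous attempt and its diagnostics into the next prompt.
        feedback = "\n".join(f"- {e}" for e in errors)
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"Compiler errors:\n{feedback}")
    return code, False, max_rounds


def python_syntax_check(code: str) -> List[str]:
    # Uses the Python compiler as the error oracle for this demo.
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError as err:
        return [f"line {err.lineno}: {err.msg}"]
    return []


def stub_model(prompt: str) -> str:
    # Hypothetical model: emits broken code first, fixed code after feedback.
    if "Compiler errors:" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b:\n    return a + b\n"


code, ok, rounds = compile_analyze_revise(
    "Write add(a, b).", stub_model, python_syntax_check)
```

Bounding the loop with `max_rounds` keeps cost predictable when the model cannot repair an error.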

For neural network inference, LLMCompiler architectures translate entire computation graphs into SQL by mapping each primitive (e.g., MatMul, Softmax) to a relational operator pattern. Model parameters are chunked and loaded as tables, while operator fusion and stateful inference (e.g., key-value caching in transformer attention) are implemented as batched queries and updates, leveraging relational database capabilities for scalability in memory-constrained environments (Sun et al., 5 Feb 2025).
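
To make the operator-to-relational mapping concrete, the sketch below expresses MatMul as a join plus a grouped sum, using Python's standard `sqlite3` module. The table layout (one row per matrix entry) and the `matmul_sql` helper are assumptions chosen for brevity; the source describes parameters as chunked into tables, which this toy version does not attempt.

```python
import sqlite3


def matmul_sql(a, b):
    """Multiply two dense matrices (lists of rows) inside SQLite."""
    conn = sqlite3.connect(":memory:")
    # Each matrix is stored as (row index, column index, value) triples.
    conn.execute("CREATE TABLE A (i INTEGER, k INTEGER, v REAL)")
    conn.execute("CREATE TABLE B (k INTEGER, j INTEGER, v REAL)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(i, k, v) for i, row in enumerate(a) for k, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO B VALUES (?, ?, ?)",
        [(k, j, v) for k, row in enumerate(b) for j, v in enumerate(row)],
    )
    # Join on the shared dimension k, then sum products per output cell.
    rows = conn.execute(
        "SELECT A.i, B.j, SUM(A.v * B.v) "
        "FROM A JOIN B ON A.k = B.k "
        "GROUP BY A.i, B.j ORDER BY A.i, B.j"
    ).fetchall()
    conn.close()
    out = [[0.0] * len(b[0]) for _ in a]
    for i, j, v in rows:
        out[i][j] = v
    return out


product = matmul_sql([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# product == [[19.0, 22.0], [43.0, 50.0]]
```

The same join-and-aggregate pattern is what lets a database engine spill large operands to disk instead of holding them in memory.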

4. Evaluation Benchmarks and Quantitative Results

LLMCompilers are evaluated on standard compiler and code translation benchmarks, including AI-SOCO, ExeBench, POJ-104, CSmith, YARPGen, MiBench, AnsiBench, and bespoke datasets such as CompilerEval and CompilerEval-Hard. Key metrics include instruction-count reduction (relative to a baseline such as -Oz), code size shrinkage, improvement/regression counts per function, BLEU and exact-match rates for IR or assembly generation, and functional pass rates for translated or optimized kernels (Cummins et al., 2023, Cummins et al., 2024, Zhang et al., 26 May 2025, Zhang et al., 6 Nov 2025).

Selected results:

| System / Model | Main Task | Key Results | Reference |
|---|---|---|---|
| LLMCompiler (7B) | LLVM pass selection | 3.01% instruction count reduction over -Oz, 90.3% compilable IR, 68.4% exact-match | (Cummins et al., 2023) |
| Meta LLM Compiler FTD (13B) | Code size opt / disasm | 74% of autotuner potential in flag tuning (4.88% vs 6.63% gain), disassembly exact-match 13.8% | (Cummins et al., 2024) |
| LEGO-Compiler (Claude-3.7) | C→asm translation | 99.7% ExeBench pass@1, 97.9% AnsiBench, scales to functions of up to ~10,000 tokens | (Zhang et al., 26 May 2025) |
| LLM SQL Compiler (Llama3-13B) | NN inference serving | 3× token throughput over CPU baseline for 13B model, up to 30× improvement in memory-constrained setups | (Sun et al., 5 Feb 2025) |
| LLM+MCTS (GPT-4o mini) | NN code tuning | 7.08× speedup in 36 samples (Llama3-Attn); 3.3× faster than MetaSchedule at equal sample budget | (Tang et al., 2 Jun 2025) |
| LLMCompiler (Qwen-3-4B + gcc) | Agent compilation | Compilation success up from 18.0% (baseline) to 97.4% (with feedback agent), syntax error rate –75% | (Kjellberg et al., 17 Jan 2026) |
| LLM-Parallel Function | Agent function orchestration | Up to 3.7× speedup, 6.7× cost reduction, accuracy up ∼9% over ReAct | (Kim et al., 2023) |
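
As a quick check on the Meta LLM Compiler row, the 74% figure is consistent with the ratio of the two gains quoted in parentheses, assuming 4.88% is the model's gain and 6.63% is the autotuner's gain over the same baseline:

```latex
\frac{4.88\%}{6.63\%} \approx 0.736 \approx 74\%
```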

Significance: LLMCompilers can recover a large share of the code size reduction achieved by conventional compiler autotuners (with minimal extra compilations at inference), achieve high behavioral and functional correctness when coupled with iterative self-correction, and enable deployment in hardware and software ecosystems previously inaccessible to standard compilers.

5. Analysis of Methods, Failure Modes, and Limitations

Strengths of the LLMCompiler paradigm include deep, token-level or instruction-level representation learning, capacity for zero-shot or context-sensitive optimization, support for hybrid symbolic–neural compilation workflows, and adaptability to novel languages, IRs, or architectures with minimal hand-engineering (Cummins et al., 2023, Zhang et al., 26 May 2025, Sun et al., 5 Feb 2025).

Reported limitations:

Failure analysis for direct LLM-to-assembly pipelines indicates that "success@1" remains modest (10–35%) for current general-purpose LLMs, increasing with model scale, targeted prompt engineering (+2–7 p.p.), and chain-of-thought reasoning (+5–30 p.p.) (Zhang et al., 6 Nov 2025).

6. Comparative Paradigms and Directions for Advancement

LLMCompiler research distinguishes four main paradigms:

  • Foundation Model Compilers: Models pre-trained and fine-tuned on massive code/IR/assembly corpora, supporting downstream fine-tuning, disassembly, and flag prediction. Meta LLM Compiler exemplifies this with an openly released model licensed for commercial use, recovering roughly three quarters of the code size gain found by an autotuning search (Cummins et al., 2024).
  • Closed-Loop Agent Compilers: Architectures in which the LLM is tightly integrated with error-oracles (e.g., gcc/clang), prompt refinement modules, and memory buffers, transforming single-shot code generators into iterative, tool-augmented agents (Kjellberg et al., 17 Jan 2026).
  • Hybrid Model-Search Compilers: Systems incorporating LLMs as proposal engines within search heuristics (e.g., MCTS), balancing learned transformation suggestion with structured exploration and empirical cost modeling (Tang et al., 2 Jun 2025).
  • Parallel Function Orchestration: LLMCompilers planning and executing tool-calls as parallel task DAGs, supporting up to 3.7× speedup and substantial cost reduction while maintaining or increasing functional accuracy (Kim et al., 2023).
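
The hybrid model-search idea can be sketched as a tree search in which a proposer suggests the next transformation and a cost function scores each candidate sequence. This is a simplified UCT-style loop written for illustration, not the algorithm of Tang et al.: `propose` is a stub where an LLM call would go, `toy_cost` stands in for an empirical measurement, and the transformation names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


class Node:
    def __init__(self, seq: Tuple[str, ...], parent: Optional["Node"] = None):
        self.seq = seq              # transformation sequence so far
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.total = 0.0            # sum of rewards backed up through this node


def ucb(parent: Node, child: Node, c: float) -> float:
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(
    propose: Callable[[Sequence[str]], List[str]],
    cost: Callable[[Sequence[str]], float],
    budget: int = 36,
    max_depth: int = 3,
    c: float = 1.4,
) -> Tuple[Tuple[str, ...], float]:
    root = Node(())
    best_seq: Tuple[str, ...] = ()
    best_cost = math.inf
    for _ in range(budget):
        node = root
        # Selection: take an unvisited child if any, else the best UCB child.
        while node.children:
            fresh = [ch for ch in node.children if ch.visits == 0]
            if fresh:
                node = fresh[0]
                break
            parent = node
            node = max(parent.children, key=lambda ch: ucb(parent, ch, c))
        # Expansion: ask the proposer for next steps from an evaluated leaf.
        if node.visits > 0 and not node.children and len(node.seq) < max_depth:
            node.children = [Node(node.seq + (t,), node)
                             for t in propose(node.seq)]
            if node.children:
                node = node.children[0]
        # Evaluation: one cost measurement per iteration.
        value = cost(node.seq)
        if value < best_cost:
            best_seq, best_cost = node.seq, value
        # Backpropagation: lower cost means higher reward.
        walker: Optional[Node] = node
        while walker is not None:
            walker.visits += 1
            walker.total += -value
            walker = walker.parent
    return best_seq, best_cost


def toy_propose(seq: Sequence[str]) -> List[str]:
    # Stub proposer: offers every transformation not yet applied.
    return [t for t in ("tile", "unroll", "vectorize") if t not in seq]


def toy_cost(seq: Sequence[str]) -> float:
    # Toy cost model: vectorizing only pays off after tiling.
    seq = tuple(seq)
    cost = 100.0
    if "tile" in seq:
        cost -= 30.0
        if "vectorize" in seq and seq.index("vectorize") > seq.index("tile"):
            cost -= 40.0
    if "unroll" in seq:
        cost -= 5.0
    return cost


best_seq, best_cost = search(toy_propose, toy_cost)
```

The `budget` parameter counts cost evaluations, which is the expensive step when each evaluation means compiling and timing a kernel.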

Major research trajectories for LLMCompilers encompass:

  • Scalable model architectures with extended context windows
  • Domain- and IR-specialized pretraining, including RL from compiler feedback
  • Efficient knowledge compression for target ISAs, grammars, and calling conventions
  • Integration with formal verification and symbolic debuggers
  • Unified compilers capable of handling very large, system-scale codebases (>10K lines)
  • Deployment of lightweight, distilled LLMCompilers for energy-efficient developer tools
  • Distributed, hardware-agnostic inference and compilation (including in-database or edge deployments)

Advances in prompt engineering, context compression, and hybrid neural-symbolic workflows are critical for overcoming current performance and scaling barriers. Empirical evidence suggests that LLMCompilers, appropriately orchestrated and configured, can significantly narrow the gap with traditional heuristics- and search-based compilers, with an emerging potential to surpass them in maintainability, adaptability, and semantic reasoning (Cummins et al., 2023, Cummins et al., 2024, Zhang et al., 6 Nov 2025).