
Canonical Intermediate Representation (CIR)

Updated 9 February 2026
  • Canonical Intermediate Representation (CIR) is a structured semantic layer that standardizes the conversion of natural language descriptions into rigorously defined, solver-ready optimization models.
  • It enforces strict grammatical, syntactic, and ordering rules to eliminate ambiguity in problem formulations, ensuring a unique, deterministic mapping to mathematical representations.
  • Advances in CIR integrate multi-agent pipelines and paradigm-specific templates for handling complex models like MILP, enhancing mapping accuracy and operational robustness.

The Canonical Intermediate Representation (CIR) is a structured semantic layer that bridges high-level natural language descriptions of optimization problems and their formal mathematical instantiations, most notably in the automated formulation of linear, integer, and mixed-integer programs. CIR acts as a domain-specific language or schema: by enforcing strict grammatical, syntactic, and ordering constraints, it gives each problem a unique, unambiguous textual or symbolic representation. Recent research has also extended CIR to a knowledge-driven, multi-agent pipeline that handles complex operational semantics across multiple modeling paradigms, thereby decoupling problem intent from its specific solver-friendly realization (Jang, 2022; Lyu et al., 2 Feb 2026).

1. Formal Definition and Canonicality Guarantees

CIR was initially introduced as a minimal domain-specific language for linear programs, defined strictly as a sequence of declarations—one objective, followed by a (possibly empty) list of constraint declarations. The CIR grammar and ordering rules enforce a deterministic mapping from a problem’s semantic structure to a single canonical string, eliminating ambiguity due to commutativity, reordering, or variable renaming. The grammar is:

  • Top-level:

CIR ::= ObjectiveDecl ConstraintDecl*

  • Objective Declaration:

ObjectiveDecl ::= "maximize" LinearExpr | "minimize" LinearExpr

  • Constraint Declaration:

ConstraintDecl ::= LinearExpr CompOp Constant, with CompOp ∈ {“≤”, “≥”} and Constant ∈ ℤ

  • Linear Expression Decomposition:

LinearExpr ::= Term ('+' Term)*; Term ::= Coeff Var; Coeff ∈ ℤ, Var ∈ {x, y, z, w, x₁, …}

  • Ordering Rules:
  1. Objective precedes constraints
  2. Constraints follow a fixed archetype ordering (lower-bound, upper-bound, xy-type, xby-type, sum-type, arbitrary linear, ratio-type)
  3. Terms within each constraint are lexicographically ordered by variable
  4. Within identically-shaped constraints, “≤” precedes “≥”
  5. Variable names are drawn from a fixed vocabulary

This canonicalization ensures that every abstract LP maps to exactly one CIR representation, regardless of how its constraints and terms are ordered or how its variables are named (Jang, 2022).
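The ordering rules can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from Jang (2022): the names `canonicalize`, `archetype_rank`, and `sort_key` are hypothetical, the archetype ordering is reduced to three classes (lower-bound, upper-bound, everything else), and variables are assumed to be drawn from the vocabulary already, so no renaming step is shown.

```python
from typing import Dict, List, Tuple

# A constraint is (coefficients, operator, right-hand side),
# e.g. ({"y": 2, "x": 1}, "<=", 50).
Constraint = Tuple[Dict[str, int], str, int]

VOCAB = ("x", "y", "z", "w")           # fixed variable vocabulary (rule 5), truncated here
OP_RANK = {"<=": 0, ">=": 1}           # "≤" precedes "≥" within a shape (rule 4)
OP_SYMBOL = {"<=": "≤", ">=": "≥"}


def render_expr(coeffs: Dict[str, int]) -> str:
    """Render terms in lexicographic variable order (rule 3)."""
    return " + ".join(f"{coeffs[v]} {v}" for v in sorted(coeffs))


def archetype_rank(constraint: Constraint) -> int:
    """Simplified archetype ordering (assumption): lower-bound, upper-bound, other."""
    coeffs, op, _ = constraint
    if len(coeffs) == 1:
        return 0 if op == ">=" else 1
    return 2


def sort_key(constraint: Constraint):
    coeffs, op, rhs = constraint
    shape = tuple(sorted(coeffs))                 # which variables appear
    values = tuple(coeffs[v] for v in shape)      # tie-breaker so the order is total
    return (archetype_rank(constraint), shape, OP_RANK[op], values, rhs)


def canonicalize(sense: str, objective: Dict[str, int],
                 constraints: List[Constraint]) -> str:
    """Emit the objective first (rule 1), then constraints in a fixed order (rule 2)."""
    for coeffs in [objective] + [c[0] for c in constraints]:
        unknown = set(coeffs) - set(VOCAB)
        if unknown:
            raise ValueError(f"variables outside the vocabulary: {sorted(unknown)}")
    lines = [f"{sense} {render_expr(objective)}"]
    for coeffs, op, rhs in sorted(constraints, key=sort_key):
        lines.append(f"{render_expr(coeffs)} {OP_SYMBOL[op]} {rhs}")
    return "\n".join(lines)


# Two differently ordered descriptions of the same LP give the same string.
first = canonicalize("maximize", {"y": 4, "x": 3},
                     [({"y": 2, "x": 1}, "<=", 50), ({"y": 1}, ">=", 0), ({"x": 1}, ">=", 0)])
second = canonicalize("maximize", {"x": 3, "y": 4},
                      [({"x": 1}, ">=", 0), ({"x": 1, "y": 2}, "<=", 50), ({"y": 1}, ">=", 0)])
print(first == second)
```

Sorting on a total key is what makes the output independent of the input order; a fuller version would need a classifier for all seven archetypes and a deterministic renaming pass.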

2. CIR Representation in Mathematical and LaTeX Notation

While CIR is a textual format for NLP-to-mathematical program pipelines, its semantics are often displayed with conventional mathematical notation:

  • Variables: $x \in \mathbb{R}^n$, written as “x₁, x₂, …, xₙ” or “x, y, z, w”
  • Objective:

CIR: maximize 3 x + 4 y
LaTeX: $\max_{x} \; c^\top x$

  • Constraints:

CIR: 1 x + 2 y ≤ 50
LaTeX: $A x \leq b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$

Coefficients of 1 are omitted (“1 x” → “x”), but others are not elided. This translation guarantees lossless mapping from CIR to matrix-based solver inputs such as AMPL or JSON (Jang, 2022).
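As a rough illustration of the CIR-to-matrix step, the sketch below parses a CIR string into the data $(c, A, b)$ of the form $A x \leq b$. The function names `parse_expr` and `cir_to_matrices` are hypothetical, the parser accepts both “x” and “1 x” for a unit coefficient, and rewriting “≥” rows by negation is a choice made here rather than something the source specifies.

```python
from typing import Dict, List, Tuple


def parse_expr(expr: str) -> Dict[str, int]:
    """Parse 'a x + b y' into {'x': a, 'y': b}; a bare variable means coefficient 1."""
    coeffs: Dict[str, int] = {}
    for term in expr.split("+"):
        parts = term.split()
        if len(parts) == 1:
            coeff, var = 1, parts[0]
        else:
            coeff, var = int(parts[0]), parts[1]
        coeffs[var] = coeff
    return coeffs


def cir_to_matrices(cir: str) -> Tuple[str, List[str], List[int], List[List[int]], List[int]]:
    """Return (sense, variables, c, A, b) with every constraint written as a row of A x <= b."""
    lines = [ln.strip() for ln in cir.strip().splitlines() if ln.strip()]
    sense, objective_text = lines[0].split(maxsplit=1)
    objective = parse_expr(objective_text)

    rows = []
    for line in lines[1:]:
        if "≤" in line:
            op, sign = "≤", 1
        elif "≥" in line:
            op, sign = "≥", -1      # multiply by -1 to turn ">=" into "<="
        else:
            raise ValueError(f"no comparison operator in: {line!r}")
        lhs, rhs = line.split(op)
        coeffs = {v: sign * a for v, a in parse_expr(lhs).items()}
        rows.append((coeffs, sign * int(rhs)))

    variables = sorted(set(objective) | {v for coeffs, _ in rows for v in coeffs})
    c = [objective.get(v, 0) for v in variables]
    A = [[coeffs.get(v, 0) for v in variables] for coeffs, _ in rows]
    b = [rhs for _, rhs in rows]
    return sense, variables, c, A, b


sense, variables, c, A, b = cir_to_matrices(
    "maximize 3 x + 4 y\nx ≥ 0\ny ≥ 0\n1 x + 2 y ≤ 50"
)
print(sense, variables, c, A, b)
```

The resulting lists can be serialized to JSON directly or handed to any solver interface that takes dense matrix input.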

3. Rule-to-Constraint Schema and Semantic Layering

Later work generalizes CIR from linear programs to a flexible structured schema suitable for logic-heavy, mixed-integer, and quadratic programs. In this schema, a problem description $d$ is transformed into:

  • A set of operational rules $\mathcal{R}(d) = \{r_1, \ldots, r_K\}$ and entities/parameters $\mathcal{E}(d)$
  • The CIR itself, $C(d) = (\mathcal{E}(d), \mathcal{A}(d))$, where $\mathcal{A}(d)$ is a multiset of instantiated templates and each $a_\ell = (A_\ell, k_\ell, p_\ell)$ represents:
    • $A_\ell$: core intent (e.g., NonOverlap, Precedence, Capacity)
    • $k_\ell$: source rule from the input
    • $p_\ell$: modeling paradigm (time-indexed, continuous-time, event-based, arc-flow, etc.)

The CIR-to-model mapping takes the union across all paradigm-specific constraint templates:

$$M(d) = T(C(d)) = \bigcup_{\ell=1}^{L} \mathcal{C}_{A_\ell, p_\ell}$$

where $\mathcal{C}_{A, p}$ is the full set of mathematical constraints for intent $A$ under paradigm $p$. This explicit intermediate abstraction ensures that all feasible solutions to $M(d)$ satisfy the operational rules $\mathcal{R}(d)$ by construction (Lyu et al., 2 Feb 2026).
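The union over instantiated templates can be sketched as follows. This is an illustration under stated assumptions, not the R2C implementation: `TemplateInstance`, `LIBRARY`, and `build_model` are hypothetical names, the `params` field is added here to carry bound entities, constraints are plain strings, and only two continuous-time templates are included.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class TemplateInstance:
    intent: str        # A_l, e.g. "Precedence"
    rule_id: int       # k_l, index of the source rule r_k
    paradigm: str      # p_l, e.g. "CT" for continuous-time
    params: Tuple      # bound entities (hypothetical field, not part of the triple)


def precedence_ct(i: int, j: int) -> Set[str]:
    """Job j starts no earlier than job i's start plus its processing time."""
    return {f"S_{j} >= S_{i} + p_{i}"}


def nonoverlap_ct(j: int, k: int) -> Set[str]:
    """Standard big-M disjunctive pair: either j precedes k or k precedes j."""
    return {
        f"S_{k} >= S_{j} + p_{j} - M*(1 - y_{j}_{k})",
        f"S_{j} >= S_{k} + p_{k} - M*y_{j}_{k}",
    }


# Maps (intent, paradigm) to the generator of the constraint set C_{A,p}.
LIBRARY: Dict[Tuple[str, str], Callable[..., Set[str]]] = {
    ("Precedence", "CT"): precedence_ct,
    ("NonOverlap", "CT"): nonoverlap_ct,
}


def build_model(instances: List[TemplateInstance]) -> Set[str]:
    """M(d): the union of the constraint sets of all instantiated templates."""
    model: Set[str] = set()
    for a in instances:
        model |= LIBRARY[(a.intent, a.paradigm)](*a.params)
    return model


instances = [
    TemplateInstance("Precedence", 1, "CT", (1, 2)),
    TemplateInstance("NonOverlap", 2, "CT", (1, 2)),
]
for constraint in sorted(build_model(instances)):
    print(constraint)
```

Keeping `rule_id` on every instance preserves the link from each emitted constraint back to the rule that produced it, which is what a downstream check of rule coverage would rely on.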

4. Constraint Archetypes and Modeling Paradigms

CIR organizes operational rules into a compact set of archetypes, each with reference templates in multiple modeling paradigms. Key archetypes include:

| Archetype | Example CIR Expression | LaTeX/Mathematical Formulation |
|---|---|---|
| Assignment | $\sum_i x_{ij} = 1$ | $\forall j\in J: \sum_{i\in I} x_{ij}=1$ |
| Precedence | $S_j \geq C_i$ | $S_j \geq S_i + p_i$ |
| Capacity | $\sum_j r_{j,k} z_{j,t} \leq R_k$ | $\forall t,k: \sum_{j\in J} r_{j,k} z_{j,t} \leq R_k$ |
| Non-overlap | Disjunction in start/completion or time-index variables | Various forms, e.g., $S_{j'} \geq S_j + p_j - M(1 - y_{jj'})$ |
| Time Windows | $ES_j \leq S_j \leq LS_j$ | $ES_j \leq S_j \leq LS_j$ |
| Integer-Multiple | $x = q u,\ u \in \mathbb{Z}_+$ | $x = q u$ with constraints |
| Logical/Indicator | $0 \leq x \leq M y$ | $x=0 \text{ if } y=0$ |

Each archetype is instantiated for several paradigms: time-indexed (TI) MILP, continuous-time (CT) MILP, event-based (EB), and arc-flow (AF) models. The CIR library maps each abstract intent $A$ to paradigm-specific templates:

$$\{ p \mapsto \mathcal{C}_{A,p} \} \quad \forall p \in \{\mathrm{TI}, \mathrm{CT}, \mathrm{EB}, \mathrm{AF}\}$$

(Lyu et al., 2 Feb 2026).
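To make the paradigm dependence concrete, the Precedence intent ("job $i$ precedes job $j$") can be written in two of these paradigms. The forms below are standard textbook formulations given for illustration; they are not quoted from the CIR library, and the start indicator $\sigma_{j,t}$ and horizon $T$ are symbols introduced here.

```latex
% Continuous-time (CT): S_j is the start time of job j, p_i the processing time of job i.
\mathcal{C}_{\mathrm{Precedence},\,\mathrm{CT}}:\quad
  S_j \;\geq\; S_i + p_i

% Time-indexed (TI): \sigma_{j,t} = 1 iff job j starts in period t (assumed notation).
\mathcal{C}_{\mathrm{Precedence},\,\mathrm{TI}}:\quad
  \sum_{t=0}^{T} t\,\sigma_{j,t} \;\geq\; \sum_{t=0}^{T} t\,\sigma_{i,t} + p_i,
  \qquad
  \sum_{t=0}^{T} \sigma_{j,t} = 1 \;\; \forall j,
  \qquad
  \sigma_{j,t} \in \{0,1\}
```

Both sets encode the same intent; the time-indexed form recovers the start time as $S_j = \sum_t t\,\sigma_{j,t}$, so the two inequalities agree term by term once that substitution is made.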

5. CIR-Driven NLP Pipelines: Tagging, Embedding, and Multi-Agent Architectures

The text-to-CIR mapping employs a two-stage (and, in recent work, multi-agent) approach:

BART with Entity Tags (Jang, 2022):

  • Word-piece tokens and entity tags (identifying coefficients, variables, comparators, etc.) are embedded:

$$e_l = E^{tok}_{w_l} + E^{pos}_l + \lambda E^{tag}_{t_l}$$

with tag embeddings weighted by a scaling factor $\lambda$.

  • BART's bidirectional encoder processes the summed embeddings, and the decoder generates the CIR output, which remains canonical due to the grammar constraints.
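The embedding sum is simple enough to spell out. The sketch below uses plain Python lists and tiny hand-written lookup tables so the arithmetic is visible; the function name `embed` and all table values are made up for illustration, and a real model would use learned tensors.

```python
from typing import List

Vector = List[float]


def embed(token_ids: List[int], tag_ids: List[int],
          E_tok: List[Vector], E_pos: List[Vector], E_tag: List[Vector],
          lam: float) -> List[Vector]:
    """Compute e_l = E_tok[w_l] + E_pos[l] + lam * E_tag[t_l] for each position l."""
    embeddings = []
    for position, (token, tag) in enumerate(zip(token_ids, tag_ids)):
        embeddings.append([
            tok + pos + lam * tg
            for tok, pos, tg in zip(E_tok[token], E_pos[position], E_tag[tag])
        ])
    return embeddings


# Toy 2-dimensional tables (illustrative values only).
E_tok = [[1.0, 0.0], [0.0, 1.0]]     # one row per vocabulary item
E_pos = [[0.5, 0.5], [0.25, 0.25]]   # one row per position
E_tag = [[0.0, 0.0], [1.0, -1.0]]    # row 0: untagged, row 1: e.g. a coefficient tag

vectors = embed(token_ids=[0, 1], tag_ids=[1, 0],
                E_tok=E_tok, E_pos=E_pos, E_tag=E_tag, lam=5.0)
print(vectors)
```

With $\lambda = 5$ the tagged first position is dominated by its tag vector, which is the effect the scaling factor is meant to have: a larger $\lambda$ makes entity tags more salient to the encoder.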

Multi-Agent Rule-to-Constraint (R2C) Pipeline (Lyu et al., 2 Feb 2026):

  1. Extractor tags and extracts rules and entities from natural language.
  2. Mapper retrieves CIR intent templates per rule and binds parameters.
  3. Formalizer composes the full model, emitting both mathematical and solver-executable formulations.
  4. Checker verifies structural and semantic soundness end-to-end.

Retrieval of CIR templates is based on domain tags and semantic similarity, supported by in-memory and FAISS vector indices.
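A minimal in-memory version of this retrieval step might look like the following. It is a sketch under loud assumptions: bag-of-words cosine similarity stands in for learned semantic embeddings, no FAISS index is used, and the template entries, their descriptions, and the names `TEMPLATES`, `bag_of_words`, `cosine`, and `retrieve` are all hypothetical.

```python
import math
from collections import Counter
from typing import List

# Hypothetical template index: (intent, domain tag, short description).
TEMPLATES = [
    ("Precedence", "scheduling", "job must finish before another job begins"),
    ("NonOverlap", "scheduling", "at most one job on a machine at a time"),
    ("Capacity", "scheduling", "resource usage must not exceed available capacity"),
    ("Assignment", "routing", "each task is assigned to exactly one agent"),
]


def bag_of_words(text: str) -> Counter:
    """Crude stand-in for a sentence embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def retrieve(rule: str, domain: str, top_k: int = 1) -> List[str]:
    """Filter templates by domain tag, then rank them by similarity to the rule text."""
    query = bag_of_words(rule)
    scored = [
        (cosine(query, bag_of_words(description)), intent)
        for intent, tag, description in TEMPLATES
        if tag == domain
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [intent for _, intent in scored[:top_k]]


print(retrieve("Each job must finish before any downstream job begins", "scheduling"))
print(retrieve("at most one job can use Machine 1 at a time", "scheduling"))
```

Filtering on the domain tag before scoring keeps the candidate set small; at larger scale the linear scan would be the part replaced by a vector index.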

6. Illustrative Examples

Linear Programming Canonicalization (Jang, 2022):

  • NL statement: "A factory makes A ($3), B ($4); 1/2 machine-hours; ≤50 machine-hours; decide how many to produce; nonnegativity."
  • CIR:

maximize 3 x + 4 y
x ≥ 0
y ≥ 0
1 x + 2 y ≤ 50

Rich Rule Extraction and Paradigm Selection (Lyu et al., 2 Feb 2026):

  • NL statement: "Each job must finish before any downstream job begins, and at most one job can use Machine 1 at a time. Minimize makespan."
  • Extractor maps precedence and non-overlap to intents.
  • Mapper chooses continuous-time paradigm, instantiates correct templates.
  • Formalizer emits constraints in both math and Gurobi-Python code.
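For this statement, a continuous-time formulation could look like the one below. It is a sketch assembled from the Precedence and Non-overlap archetypes, not output reproduced from R2C; the sets $J$ (jobs), $P$ (precedence pairs), and $J_1$ (jobs on Machine 1), the makespan variable $C_{\max}$, and the big-M constant $M$ are notation assumed here.

```latex
\begin{aligned}
\min \quad & C_{\max} \\
\text{s.t.} \quad
  & S_j \geq S_i + p_i
      && \forall (i,j) \in P
      && \text{(precedence)} \\
  & S_{j'} \geq S_j + p_j - M\,(1 - y_{jj'})
      && \forall j, j' \in J_1,\ j < j'
      && \text{(non-overlap: } j \text{ before } j'\text{)} \\
  & S_j \geq S_{j'} + p_{j'} - M\,y_{jj'}
      && \forall j, j' \in J_1,\ j < j'
      && \text{(non-overlap: } j' \text{ before } j\text{)} \\
  & C_{\max} \geq S_j + p_j
      && \forall j \in J
      && \text{(makespan definition)} \\
  & S_j \geq 0,\quad y_{jj'} \in \{0,1\}
\end{aligned}
```

The binary $y_{jj'}$ selects which of the two orderings holds on Machine 1: when $y_{jj'} = 1$ the first non-overlap inequality is active and the second is relaxed by $M$, and the roles swap when $y_{jj'} = 0$.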

7. Empirical Performance and Practical Impact

Results on two benchmarks, LPWP and ORCOpt-Bench, indicate that the CIR layer contributes materially to formulation accuracy:

| System | CIR Layer | Key Accuracy (%) | Benchmark |
|---|---|---|---|
| BART-large, λ=5 | Yes | 88.46 (declaration accuracy) | LPWP validation (Jang, 2022) |
| R2C (7B LLM) | Yes | 47.2 (Accuracy Rate, AR) | ORCOpt-Bench (Lyu et al., 2 Feb 2026) |
| GPT-5 | No | 39.8 (AR) | ORCOpt-Bench (Lyu et al., 2 Feb 2026) |
| R2C ablation (no CIR) | No | 31.6 (AR) | ORCOpt-Bench (Lyu et al., 2 Feb 2026) |
| R2C+reflection | Yes | 54.0 (AR) | ORCOpt-Bench (Lyu et al., 2 Feb 2026) |

Key findings include:

  • Tag- and entity-aware embeddings, especially with high λ, notably increase mapping accuracy in classic LP settings (Jang, 2022).
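  • Introducing CIR in multi-agent frameworks yields a substantial accuracy gain over both proprietary and open baselines: on ORCOpt-Bench, R2C with CIR reaches 47.2% AR, versus 39.8% for GPT-5 and 31.6% for R2C without CIR (Lyu et al., 2 Feb 2026).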
  • CIR enables training-agnostic, retrieval-based formulations that remain competitive with tuned LLMs on industry-scale tasks.
  • The reflection enhancement raises R2C's AR on ORCOpt-Bench from 47.2% to 54.0%, suggesting improved robustness in complex, compositional problem settings (Lyu et al., 2 Feb 2026).

In summary, CIR establishes an extensible, verifiable, and semantically sound layer between natural-language problem specification and executable optimization code. By encoding intents as archetypes with paradigm-specific instantiations, CIR enables both symbolic and data-driven models to translate diverse operational rules into solver-ready programs robustly and transparently (Jang, 2022; Lyu et al., 2 Feb 2026).
