PolyglotPiranha: Scalable Code Transformation DSL
- PolyglotPiranha is a language-agnostic, declarative DSL designed for high-performance, human-readable code transformations across multiple programming languages.
- It features a modular architecture with a parsing front end, AST matcher, graph-structured rule engine, and code rewriter to efficiently integrate into CI/CD pipelines.
- LLM-powered agents like SPELL synthesize reusable migration scripts, achieving high validation rates and outperforming previous systems in automated API migrations.
PolyglotPiranha is a language-agnostic, declarative domain-specific language (DSL) and transformation engine developed to enable large-scale, automated source-to-source program modifications. Originally engineered at Uber, PolyglotPiranha’s primary goal is to facilitate high-performance, human-readable code transformations—particularly for software refactoring and API migration—across multiple programming languages. In the context of automated API migration, it is the transformation target for synthesized migration logic produced by agents such as SPELL, which leverage LLMs to extract behavioral equivalence and generalize code rewrites into reusable scripts. The engine is architected to operate efficiently in CI/CD pipelines, processing thousands of source files within seconds, and supports maintainability, extensibility, and verifiable transformation outcomes (Ramos et al., 1 Feb 2026).
1. Architecture and Design Principles
PolyglotPiranha’s architecture is modular, supporting composability and performance within heterogeneous codebases. Its primary pipeline components are:
- Parsing Front End: Utilizes a language-agnostic, parser-combinator framework (built atop comby) to produce concrete-syntax trees (CSTs) or abstract-syntax trees (ASTs) with token and location annotations. Parsers for different languages (e.g., Python, Java) are interchangeable, granting the system cross-language (“polyglot”) adaptability.
- AST Matcher / Pattern Engine: Transformation rules are expressed as concrete-syntax patterns containing named template variables (“holes”). The matcher scans CSTs for unifications with these patterns, enabling robust, syntax-aware identification of rewrite targets.
- Rule Graph / Control-flow Engine: Rules are structured as nodes within a directed, labeled graph. Edges denote sequential and scoped application, such as instructing a rewrite only after an import substitution within a file, followed by targeted rewrites in function or class scopes.
- Code Rewriter / Emitter: Matches trigger the replacement of code segments using “after” patterns, typically preserving original formatting and comments unless explicitly altered.
- Runtime / Fixpoint Loop: Rule application follows a depth-first traversal over the rule graph, enqueuing follow-up rules via labeled edges and iterating until a steady state is reached (no further changes). This enables high-throughput, in-memory, multi-file rewriting well suited for CI/CD workflows.
Design goals emphasize declarative transformations, modularity via graph-composed rules, polyglot operation, rapid execution at enterprise scale, and scripts that are readable, testable, and maintainable.
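The pipeline above can be pictured with a toy sketch: rules carry comby-style holes (`:[name]`), labeled edges enqueue follow-up rules, and application iterates to a fixpoint. This is a minimal illustration only; the real engine matches against concrete-syntax trees rather than regexes on plain text, and the rule names and patterns in the usage example are invented for demonstration.

```python
import re
from dataclasses import dataclass, field

HOLE = re.compile(r":\[(\w+)\]")

def to_regex(pattern: str) -> re.Pattern:
    """Compile a comby-style pattern with :[name] holes into a regex."""
    parts, last = [], 0
    for m in HOLE.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+?)")  # each hole becomes a named group
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))

@dataclass
class Rule:
    name: str
    match: str                                  # "before" pattern with holes
    replace: str                                # "after" template reusing the holes
    after: list = field(default_factory=list)   # labeled edges: follow-up rule names

def apply_rule(rule: Rule, source: str) -> str:
    def substitute(m: re.Match) -> str:
        # Fill each hole in the replacement template with its captured text.
        return HOLE.sub(lambda h: m.group(h.group(1)), rule.replace)
    return to_regex(rule.match).sub(substitute, source)

def run(rules: dict, source: str, entry: str) -> str:
    """Depth-first walk over the rule graph, iterated until a fixpoint."""
    changed = True
    while changed:                      # fixpoint loop: stop when no rule fires
        changed = False
        stack = [entry]
        while stack:                    # depth-first traversal of labeled edges
            rule = rules[stack.pop()]
            rewritten = apply_rule(rule, source)
            if rewritten != source:
                source, changed = rewritten, True
                stack.extend(rule.after)   # enqueue follow-up rules
    return source

# Hypothetical two-rule graph: swap an import, then rewrite call sites.
rules = {
    "swap_import": Rule("swap_import", "import oldlib", "import newlib",
                        after=["swap_call"]),
    "swap_call": Rule("swap_call", "oldlib.run(:[arg])",
                      "newlib.execute(:[arg])"),
}
print(run(rules, "import oldlib\nresult = oldlib.run(data)", "swap_import"))
# prints:
# import newlib
# result = newlib.execute(data)
```

Note how `swap_call` only becomes reachable through the edge out of `swap_import`, mirroring the scoped, ordered application described above.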
2. Automated Migration Data Distillation Using LLMs
SPELL integrates LLMs in a structured workflow to collect migration data suitable for synthesis into PolyglotPiranha scripts. The workflow proceeds as follows:
- Generation of Raw Migration Triples: the pipeline emits triples consisting of an LLM-generated program that uses the source library, a migrated implementation targeting the new library, and a test suite expected to validate their behavioral equivalence.
- Validation and Filtering: each triple is compiled and executed, retaining only those in which both the source and migrated programs compile, run, and pass the test suite with ≥ 60% line coverage.
Prompts are staged to generate (1) abstract migration scenarios, (2) diverse implementations, (3) test suites, and (4) migrated versions, resulting in validated migration data representing behavioral equivalence between source and target APIs.
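The filtering step amounts to a simple predicate over per-triple validation results. The record fields below are illustrative stand-ins for the actual compile/test/coverage harness, which the paper does not specify in detail; only the ≥ 60% line-coverage threshold comes from the source.

```python
from dataclasses import dataclass

@dataclass
class TripleResult:
    """Hypothetical validation record for one raw migration triple."""
    source_compiles: bool   # source-library program compiles and runs
    target_compiles: bool   # migrated program compiles and runs
    tests_pass: bool        # shared test suite passes on both versions
    line_coverage: float    # fraction of lines the test suite executes

def is_valid(r: TripleResult, min_coverage: float = 0.60) -> bool:
    """Keep a triple only if both versions work and coverage meets the threshold."""
    return (r.source_compiles and r.target_compiles
            and r.tests_pass and r.line_coverage >= min_coverage)

def filter_triples(results):
    return [r for r in results if is_valid(r)]
```

For example, a triple whose tests pass with 75% coverage is retained, while one at 55% coverage is discarded even if everything else succeeds.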
3. Synthesis of Transformation Scripts: Anti-Unification and Agentic Coordination
Validated code pairs are converted into PolyglotPiranha scripts in a two-phase process:
- Atomic Rule Inference (Anti-Unification):
Diff hunks are computed between each validated code pair, and each hunk is processed by an anti-unification algorithm (adapted from the MELT system): the hunk's removed lines supply the match pattern and its added lines the replace pattern, with differing subterms abstracted into template variables, yielding candidate rewrite rules.
- Agent-based Script Synthesis:
A small LLM (e.g., GPT-4.1), receiving the PolyglotPiranha DSL specification and debugging context, iteratively refines and composes rules into a rule graph:
1. Proposes a candidate graph-structured Piranha script.
2. Executes the script and receives diagnostic feedback.
3. Refines rules, adjusts scoping or ordering, and revalidates using the migration's test suite.
4. Iterates up to ten times or until behavioral equivalence is confirmed.
This orchestrated process yields concise, scoped, and reusable PolyglotPiranha scripts capable of generalizing beyond individual migration examples.
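The atomic-rule inference phase can be illustrated with a toy syntactic anti-unifier over two concrete code lines: tokens that agree are kept literally, tokens that differ become shared named holes (repeated mismatches reuse the same hole). This sketch assumes token-aligned inputs and joins output tokens with spaces; the actual algorithm, adapted from MELT, operates over AST nodes within diff hunks.

```python
import re

TOKEN = re.compile(r"\w+|\S")  # words or single punctuation characters

def anti_unify(a: str, b: str) -> str:
    """Least general generalization of two token-aligned code lines."""
    ta, tb = TOKEN.findall(a), TOKEN.findall(b)
    if len(ta) != len(tb):
        raise ValueError("token-aligned inputs assumed in this sketch")
    out, holes = [], {}
    for x, y in zip(ta, tb):
        if x == y:
            out.append(x)                       # agreeing token: keep literally
        else:
            if (x, y) not in holes:             # same mismatch -> same hole
                holes[(x, y)] = f":[v{len(holes)}]"
            out.append(holes[(x, y)])
    return " ".join(out)

print(anti_unify("cipher = Fernet(key)", "c = Fernet(k)"))
# prints: :[v0] = Fernet ( :[v1] )
```

Because identical mismatches map to the same hole, `anti_unify("x + x", "y + y")` yields `:[v0] + :[v0]`, capturing the repeated-variable constraint that makes inferred patterns safe to generalize.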
4. Illustrative Example: API Migration Script
A concrete instance is provided for migrating from cryptography.fernet to pycryptodome:
- Original Code (cryptography.fernet):
```python
from cryptography.fernet import Fernet

def encrypt_document(document: str, key: bytes) -> bytes:
    cipher = Fernet(key)
    encrypted = cipher.encrypt(document.encode())
    return encrypted
```
- Migrated Code (pycryptodome):
```python
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def encrypt_document(document: str, key: bytes) -> bytes:
    cipher = AES.new(key, AES.MODE_CBC)
    iv = cipher.iv  # AES.new generates a random IV in CBC mode
    padded_data = pad(document.encode(), AES.block_size)
    encrypted = iv + cipher.encrypt(padded_data)
    return encrypted
```
- Synthesized PolyglotPiranha Script (excerpt):
```yaml
- name: replace_import
  match: from cryptography.fernet import Fernet
  replace: |
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import pad
  after:
    - scope: File
      rule: replace_decl

- name: replace_decl
  match: *:[var] = Fernet(key)
  replace: cipher = AES.new(key, AES.MODE_CBC)
  after:
    - scope: Function
      rule: replace_encrypt

- name: replace_encrypt
  match: *:[var] = *:[var2].encrypt(*:[data])
  replace: |
    padded_data = pad(*:[data], AES.block_size)
    *:[var] = iv + *:[var2].encrypt(padded_data)
```
5. Empirical Evaluation and Comparative Performance
SPELL’s approach, leveraging PolyglotPiranha scripts, was benchmarked across ten popular Python library migrations against MELT—a prior anti-unification-based system. For each use case, success metrics included the number of validated test-triples, first-try script synthesis rates, and “sibling success” (the proportion of alternative test cases solved by the same script).
| Migration | Valid Triples | SPELL Success | MELT Success | Sibling % |
|---|---|---|---|---|
| argparse → click | 215 | 44.2% | 17.2% | 53.5% |
| json → orjson | 269 | 96.7% | 57.6% | 88.6% |
| logging → loguru | 114 | 85.1% | 72.8% | 68.8% |
| cryptography → pycryptodome | 79 | 48.1% | 6.3% | 6.5% |
| Average (all ten migrations) | — | 61.6% | 22.9% | 63.3% |
SPELL with PolyglotPiranha consistently outperformed MELT in one-shot synthesis and generalization to sibling use cases. In real-world applicability studies across 18 open-source repositories, scripts produced anywhere from a handful to hundreds of rewrites per project, preserving ≥ 90% of existing test coverage in many instances (Ramos et al., 1 Feb 2026).
6. Limitations and Prospective Directions
Identified limitations of PolyglotPiranha and its automated synthesis workflow include:
- Coverage and Bias: LLM-distilled migration examples privilege common idioms, potentially neglecting corner cases and atypical error handling paths.
- DSL Expressivity: PolyglotPiranha cannot rewrite code embedded inside string literals, templates, or arbitrary embedded DSL fragments (e.g., Jinja), resulting in lower success on code using such constructs.
- Validation Granularity: Reliance on test suite pass rates and ≥ 60% line coverage as proxies for behavioral equivalence can allow semantically insufficient or under-specified transformations, with more rigorous oracle mechanisms (e.g., mutation testing) suggested as avenues for future work.
- Partial Migration: Some synthesized scripts address only frequently observed patterns and may not cover edge variants; integration of test-failure feedback or further LLM-based rule generalization is proposed (e.g., via Pycraft techniques).
A plausible implication is that extending PolyglotPiranha with multi-language embedded DSL parsing and integrating richer behavioral validation could further broaden its applicability in practical migration workflows.
7. Significance in Program Transformation Ecosystems
PolyglotPiranha exemplifies the shift toward modular, maintainable, and high-throughput transformation engines in software engineering. By serving as a target language for LLM-driven, agentic code migration frameworks like SPELL, it bridges the gap between data-driven synthesis and scalable, enterprise-grade deployment. Compared to systems that couple anti-unification directly to code rewriting (e.g., MELT), PolyglotPiranha’s graph-structured, declarative approach and runtime efficiency facilitate broader adoption in heterogeneous and rapidly evolving software ecosystems (Ramos et al., 1 Feb 2026).