ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models

Published 30 Aug 2025 in cs.PL | (2509.00360v1)

Abstract: Language models (LMs) can generate code, but cannot guarantee its correctness, producing outputs that often violate type safety, program invariants, or semantic equivalence. Constrained decoding offers a solution by restricting generation to programs that satisfy desired properties. Yet, existing methods are limited to shallow syntactic constraints or rely on brittle, ad hoc encodings of semantics over token sequences. We present ChopChop, the first programmable framework for semantic constrained decoding, enabling LMs to generate code that provably satisfies rich semantic properties. ChopChop connects token-level generation with reasoning over abstract program structures using a coinduction-based formalism and reduces constraint enforcement to a realizability problem over regular codata. We demonstrate ChopChop's generality through generation constrained by type safety and program equivalence, showing how formal methods can be seamlessly integrated into LM-driven code generation. ChopChop transforms semantic constrained decoding from a niche technique into a systematic, principled extension of LMs, improving success rates across models and tasks while maintaining practical decoding latency.


Authors (4)

Summary

The paper introduces a programmable framework that enforces semantic constraints during decoding, ensuring that LLM-generated code satisfies properties such as type safety and semantic equivalence.
It leverages regular coinduction and corecursive semantic pruners to incrementally filter abstract syntax trees during token-level generation.
Empirical results show improved semantic validity and compilability over unconstrained methods with only modest computational overhead.

ChopChop: A Programmable Framework for Semantically Constraining the Output of Language Models

Motivation and Problem Statement

LLMs have demonstrated strong capabilities in code generation, but their outputs often violate critical semantic properties such as type safety, program invariants, or semantic equivalence. Existing constrained decoding methods are limited to enforcing shallow syntactic constraints, typically via context-free grammars (CFGs), or rely on ad hoc manipulations of token sequences that do not scale to deep semantic properties. ChopChop addresses this gap by introducing a programmable framework for semantic constrained decoding, enabling LLMs to generate code that provably satisfies rich semantic properties by reasoning over abstract program structures.

Framework Overview

ChopChop formalizes semantic constrained decoding as a realizability problem over regular codata. At each decoding step, ChopChop constructs a symbolic representation of all possible Abstract Syntax Trees (ASTs) consistent with the current token prefix. User-defined semantic pruners (corecursive functions operating over codata) prune this space to eliminate ASTs that violate the constraints. A token is accepted only if the pruned space remains non-empty, so that, up to the precision of the pruners, every accepted prefix can still be completed into a program satisfying the semantic constraints.
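The decoding loop described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not ChopChop's implementation: `constrained_decode`, `odd_literal_realizable`, and `toy_model` are hypothetical names, the realizability check is done directly on strings rather than by parsing into a program space and running pruners, and decoding is greedy with no backtracking.

```python
from typing import Callable, List, Optional


def constrained_decode(
    propose: Callable[[str], List[str]],  # hypothetical: candidate tokens, best first
    realizable: Callable[[str], bool],    # hypothetical: is the pruned space non-empty?
    is_complete: Callable[[str], bool],   # hypothetical: is the prefix a finished program?
    max_steps: int = 50,
) -> Optional[str]:
    """Greedy constrained decoding: extend the prefix with the first proposed
    token whose extended prefix is still realizable."""
    prefix = ""
    for _ in range(max_steps):
        if is_complete(prefix):
            return prefix
        for token in propose(prefix):
            if realizable(prefix + token):  # stands in for "pruned space is non-empty"
                prefix += token
                break
        else:
            return None  # every candidate was rejected: dead end
    return prefix if is_complete(prefix) else None


def odd_literal_realizable(prefix: str) -> bool:
    """Toy constraint: a program is a decimal literal followed by ';',
    and the literal must be odd. Checked directly on the string here."""
    if prefix.endswith(";"):
        digits = prefix[:-1]
        return digits.isdigit() and int(digits[-1]) % 2 == 1
    return prefix == "" or prefix.isdigit()


def toy_model(prefix: str) -> List[str]:
    """Stand-in for an LM: it 'wants' to write 42; and ranks fallbacks after that."""
    intended = "42;"
    if len(prefix) < len(intended):
        return [intended[len(prefix)], "7", ";"]
    return [";"]


print(constrained_decode(toy_model, odd_literal_realizable, lambda p: p.endswith(";")))
# prints 427;
```

Here the model's preferred output `42;` is rejected at the final step because the literal is even, so the loop falls back to the next candidate and produces `427;`. With an over-approximate check, the same greedy loop could reach a prefix with no valid continuation and return `None`.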

The framework is instantiated by providing:

A parser definition mapping token sequences to ASTs.
A set of semantic pruners, each representing a constraint over sets of ASTs.

This approach bridges the syntax-semantics gap by connecting token-level generation with reasoning over abstract program structures, and it handles partial programs by operating incrementally as each token is produced.

Coinductive Representation and Corecursive Pruning

ChopChop leverages regular coinduction to finitely represent infinite program spaces as cyclic term graphs. Program spaces are defined as regular coterms, allowing efficient symbolic manipulation and fixpoint computations. The framework supports derivative-based parsing, inspired by Brzozowski derivatives, to incrementally advance parser states as tokens are generated.
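To illustrate the derivative idea, the sketch below uses classic Brzozowski derivatives of regular expressions. This is a deliberate simplification: ChopChop takes derivatives of parsers over (cyclic) program spaces, whereas here the "parser" is a regular expression, and all names are hypothetical. Deriving by each generated character advances the parser state, and a prefix is viable exactly when the resulting language is non-empty.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:  # matches no string
    pass


@dataclass(frozen=True)
class Eps:  # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr:  # matches a single character
    c: str


@dataclass(frozen=True)
class Alt:  # left | right
    left: "Re"
    right: "Re"


@dataclass(frozen=True)
class Seq:  # left followed by right
    left: "Re"
    right: "Re"


@dataclass(frozen=True)
class Star:  # inner*
    inner: "Re"


Re = Union[Empty, Eps, Chr, Alt, Seq, Star]


def nullable(r: Re) -> bool:
    """Does r accept the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty, Chr


def is_empty(r: Re) -> bool:
    """Does r accept no string at all?"""
    if isinstance(r, Empty):
        return True
    if isinstance(r, Alt):
        return is_empty(r.left) and is_empty(r.right)
    if isinstance(r, Seq):
        return is_empty(r.left) or is_empty(r.right)
    return False  # Eps, Chr, Star


def deriv(r: Re, ch: str) -> Re:
    """Brzozowski derivative: the strings w such that ch + w is accepted by r."""
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, ch), deriv(r.right, ch))
    if isinstance(r, Seq):
        first = Seq(deriv(r.left, ch), r.right)
        return Alt(first, deriv(r.right, ch)) if nullable(r.left) else first
    if isinstance(r, Star):
        return Seq(deriv(r.inner, ch), r)
    return Empty()  # Empty, Eps


def after(r: Re, prefix: str) -> Re:
    """Advance the 'parser state' one character at a time."""
    for ch in prefix:
        r = deriv(r, ch)
    return r


lang = Seq(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))  # (a|b)*c
print(nullable(after(lang, "abac")))    # True: complete match
print(not is_empty(after(lang, "ab")))  # True: "ab" can still be extended
print(not is_empty(after(lang, "ca")))  # False: no completion exists
```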

Semantic pruners are implemented as corecursive functions over program spaces. For example, a pruner enforcing that all literals are odd numbers would corecursively traverse a program space and discard every literal alternative that is not odd. Nonemptiness of the pruned space is decided via a fixpoint computation, ensuring that only realizable prefixes are extended during decoding.
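A toy version of the odd-literal pruner and the nonemptiness check might look like this. It assumes a hypothetical encoding of a program space as named nodes, each a list of constructor alternatives whose children refer back to nodes (so cycles represent infinite spaces). It works on spaces of complete programs rather than on the partial-program spaces ChopChop derives from a token prefix, and `prune_odd_literals` and `nonempty` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Alt:
    ctor: str                  # constructor name, e.g. "Lit", "Add", "Neg"
    value: Optional[int]       # literal payload for "Lit", otherwise None
    children: Tuple[str, ...]  # names of child nodes; references may form cycles


# A program space: each named node is a union of alternatives (a regular coterm).
Space = Dict[str, List[Alt]]


def prune_odd_literals(space: Space) -> Space:
    """Semantic pruner: keep non-literal alternatives and only odd literals.
    Each input node maps to one output node of the same name, so the result
    is again a finite (possibly cyclic) graph."""
    return {
        node: [a for a in alts
               if a.ctor != "Lit" or (a.value is not None and a.value % 2 == 1)]
        for node, alts in space.items()
    }


def nonempty(space: Space) -> Dict[str, bool]:
    """Least fixpoint: a node denotes at least one finite program iff some
    alternative has all of its children non-empty."""
    result = {node: False for node in space}
    changed = True
    while changed:
        changed = False
        for node, alts in space.items():
            if not result[node] and any(all(result[c] for c in a.children) for a in alts):
                result[node] = True
                changed = True
    return result


space: Space = {
    "Expr": [Alt("Lit", 3, ()), Alt("Lit", 4, ()), Alt("Add", None, ("Expr", "Even"))],
    "Even": [Alt("Lit", 2, ()), Alt("Neg", None, ("Even",))],
}
print(nonempty(space))                      # {'Expr': True, 'Even': True}
print(nonempty(prune_odd_literals(space)))  # {'Expr': True, 'Even': False}
```

After pruning, `Even` keeps only `Neg(Even)`, which describes no finite program. Computing the least (not greatest) fixpoint is what makes the check report it as empty, while `Expr` stays non-empty through the literal `3`.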

Applications

ChopChop demonstrates its generality through two case studies:

1. Program Equivalence-Guided Decoding

ChopChop constrains LLMs to generate programs equivalent to a reference program, modulo term rewriting. Equivalence classes are efficiently represented using e-graphs, and rewrite rules are applied to generate the space of equivalent programs. The semantic pruner intersects the evolving program space with the e-graph automaton, ensuring only equivalent programs are generated.
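As a rough illustration of viewing an e-graph as an automaton over terms, the sketch below checks whether a complete term is represented by an e-class. The e-graph is hand-built for the reference program `x * 1` under two assumed rewrite rules (`a * 1 = a` and `a * b = b * a`); `accepts` and `t` are hypothetical helpers. ChopChop's actual pruner intersects the e-graph with spaces of partial programs, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

ENode = Tuple[str, Tuple[int, ...]]  # (operator, child e-class ids)
EGraph = Dict[int, List[ENode]]      # e-class id -> e-nodes in that class
Term = Tuple[str, tuple]             # (operator, child terms)


def accepts(egraph: EGraph, cls: int, term: Term) -> bool:
    """Is `term` one of the programs represented by e-class `cls`?
    Recursion is on the finite term, so cycles in the e-graph are harmless."""
    op, args = term
    return any(
        node_op == op
        and len(kids) == len(args)
        and all(accepts(egraph, k, a) for k, a in zip(kids, args))
        for node_op, kids in egraph[cls]
    )


def t(op: str, *args: Term) -> Term:
    """Build a term from an operator and its children."""
    return (op, args)


# Class 0 contains x, x*1 and 1*x, so it is cyclic: class 0 -> "*" -> class 0.
egraph: EGraph = {
    0: [("x", ()), ("*", (0, 1)), ("*", (1, 0))],
    1: [("1", ())],
}
x, one = t("x"), t("1")
print(accepts(egraph, 0, t("*", x, one)))               # True
print(accepts(egraph, 0, t("*", t("*", one, x), one)))  # True
print(accepts(egraph, 0, t("*", x, x)))                 # False
```

Because class 0 refers to itself, this small graph represents infinitely many equivalent programs (`x`, `x * 1`, `(x * 1) * 1`, ...), which is the kind of regular, cyclic structure the framework is designed to manipulate.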

Figure 1: Equivalence, CodeLlama-7b, all temperatures.

2. Type-Safe Decoding for TypeScript

ChopChop enforces type safety by lifting type checking to operate over sets of abstract partial terms. The semantic pruner implements a bidirectional type system, conservatively discarding ill-typed programs while preserving all well-typed ones. The pruner operates over types of bounded size to ensure regularity of the coterms.
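The core of a bidirectional type system can be sketched on a small TypeScript-flavored language. This is an assumption-laden illustration, not the paper's TypeScript pruner: it checks a single complete term instead of a set of partial terms, supports only numbers, strings, numeric subtraction, lambdas, and application, and the function names `synth` and `check` are hypothetical. It does show why bidirectionality matters: an unannotated lambda cannot be typed on its own, but it can be checked against an expected function type.

```python
from typing import Dict, Optional, Union

# Types: "number", "string", or ("fn", param_type, return_type).
Type = Union[str, tuple]
Env = Dict[str, Type]

# Terms (tuples):
#   ("num", n) | ("str", s) | ("var", name) | ("sub", a, b)
#   ("lam", name, param_type_or_None, body) | ("app", fn, arg)


def synth(env: Env, e: tuple) -> Optional[Type]:
    """Synthesis mode: infer a type, or return None if ill-typed or not inferable."""
    tag = e[0]
    if tag == "num":
        return "number"
    if tag == "str":
        return "string"
    if tag == "var":
        return env.get(e[1])
    if tag == "sub":  # numeric subtraction only
        ok = check(env, e[1], "number") and check(env, e[2], "number")
        return "number" if ok else None
    if tag == "lam" and e[2] is not None:  # annotated parameter
        body = synth({**env, e[1]: e[2]}, e[3])
        return ("fn", e[2], body) if body is not None else None
    if tag == "app":
        f = synth(env, e[1])
        if isinstance(f, tuple) and f[0] == "fn" and check(env, e[2], f[1]):
            return f[2]
    return None  # includes unannotated lambdas, which can only be checked


def check(env: Env, e: tuple, expected: Type) -> bool:
    """Checking mode: push the expected type inward where synthesis cannot."""
    if e[0] == "lam" and e[2] is None:
        return (isinstance(expected, tuple) and expected[0] == "fn"
                and check({**env, e[1]: expected[1]}, e[3], expected[2]))
    return synth(env, e) == expected


dec = ("sub", ("var", "x"), ("num", 1))  # x - 1
print(synth({}, ("app", ("lam", "x", "number", dec), ("num", 2))))    # number
print(synth({}, ("app", ("lam", "x", "number", dec), ("str", "a"))))  # None
print(check({}, ("lam", "x", None, dec), ("fn", "number", "number")))  # True
```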

Figure 2: TypeScript, DeepSeek-Coder-6.7b, all temperatures.

Evaluation

Effectiveness

ChopChop consistently improves the success rate of generating programs that satisfy semantic constraints compared to unconstrained and grammar-constrained decoding. In equivalence-guided decoding, unconstrained models often fail to produce any semantically valid outputs, while ChopChop achieves high success rates across models and temperatures. For type-safe decoding, ChopChop enables models to generate compilable TypeScript code even when unconstrained decoding fails, particularly for smaller models.

Quantitative results show that semantic constrained decoding delivers strong improvements in both domains, with the best result in each configuration consistently achieved by ChopChop. Notably, grammar-constrained decoding alone can perform worse than unconstrained generation because of limitations in the grammar fragment it supports.

Overhead

The computational overhead of semantic constrained decoding is modest, with per-token latency ranging from tens to a few hundred milliseconds. Most tokens are accepted on the first attempt, indicating that the semantic constraints do not excessively restrict the model's output space. The overhead is justified by the assurance of semantic correctness.

Implementation Considerations

ChopChop is implemented as an embedded DSL in Python, supporting symbolic manipulation of coinductive structures. The backend draws on techniques from CoCaml for regular coinduction and employs decorators for corecursive and fixpoint computations. Domain-specific simplifications and parser compaction optimizations are integrated to improve efficiency.
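The idea of a fixpoint decorator can be sketched in plain Python. `least_fixpoint` below is a hypothetical name, not ChopChop's or CoCaml's API: it lets a naturally recursive definition over a cyclic graph be evaluated by iterating from `bottom` until no value changes, treating back-edges as the current approximation. The example uses it to decide whether each grammar symbol derives at least one finite string.

```python
from typing import Callable, Dict, Hashable, Set


def least_fixpoint(bottom):
    """Decorator: evaluate a recursive definition `fn(rec, node)` over a
    possibly cyclic graph as a least fixpoint (Kleene iteration from `bottom`).
    Assumes `fn` is monotone and the value domain is finite, so it terminates."""
    def decorate(fn: Callable) -> Callable:
        def solve(root: Hashable):
            approx: Dict[Hashable, object] = {}
            while True:
                changed = False
                visiting: Set[Hashable] = set()
                done: Set[Hashable] = set()

                def rec(node):
                    nonlocal changed
                    if node in visiting or node in done:
                        return approx.get(node, bottom)  # back-edge or already computed
                    visiting.add(node)
                    value = fn(rec, node)
                    visiting.discard(node)
                    done.add(node)
                    if value != approx.get(node, bottom):
                        approx[node] = value
                        changed = True
                    return value

                result = rec(root)
                if not changed:  # a full round with no updates: fixpoint reached
                    return result
        return solve
    return decorate


GRAMMAR = {
    "Expr": [["num"], ["Expr", "+", "Expr"]],
    "Loop": [["Loop", "+", "num"]],  # only infinite derivations
}


@least_fixpoint(bottom=False)
def nonempty(rec, symbol):
    if symbol not in GRAMMAR:
        return True  # terminal symbol
    return any(all(rec(s) for s in alt) for alt in GRAMMAR[symbol])


print(nonempty("Expr"))  # True
print(nonempty("Loop"))  # False
```

Returning the current approximation on a back-edge, rather than recursing forever, is what makes the definition well-defined on cyclic structures; starting from `False` yields the least fixpoint, so `Loop`, which has only infinite derivations, is reported empty.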

One implementation trade-off is between the expressivity of semantic pruners and computational tractability. Over-approximate pruners are used when exact realizability is undecidable, as in type-safe decoding for TypeScript. The framework is modular, allowing users to define custom pruners and parsers for new domains.

Theoretical Implications

ChopChop establishes a formal connection between semantic constrained decoding and realizability in program synthesis. Soundness and completeness of the decoding algorithm depend on the properties of the realizability checker (over-approximate, under-approximate, consistent) and the exploration strategy (e.g., fair enumeration, greedy decoding). The framework generalizes constrained decoding beyond syntactic constraints, enabling principled enforcement of deep semantic properties.

Future Directions

Potential extensions include:

Enhancing the corecursive solver with more permissive fixpoint computations to broaden expressivity.
Integrating fast syntactic filtering tools to reduce overhead.
Exploring new backends for reasoning about codata, such as term rewriting solvers or constrained Horn clauses.
Developing combinators for expressing complex semantic pruners and automated verification of pruner properties.

Conclusion

ChopChop provides a systematic, programmable framework for semantic constrained decoding, enabling LLMs to generate code that provably satisfies rich semantic properties. By operating at the level of abstract syntax and leveraging regular coinduction, ChopChop transforms semantic constrained decoding from a niche technique into a principled extension of LLM generation. The framework demonstrates strong empirical performance and opens new avenues for integrating formal methods with neural code generation.
