Constrained Decoding Attack (CDA)

Updated 19 November 2025
  • Constrained Decoding Attack (CDA) is an approach that exploits output constraints to bypass safety measures in both language models and cryptographic systems.
  • In LLMs, CDA manipulates control-plane restrictions such as JSON schemas, enabling malicious outputs despite benign visible prompts.
  • In cryptography, CDA leverages restricted error alphabets to significantly reduce decoding complexity and undermine traditional security estimates.

Constrained Decoding Attack (CDA) is a broad term denoting adversarial exploitation of decoding constraints—grammar, schemas, or restricted alphabets—in generative models or cryptographic systems. In LLM safety, CDA refers to attacks that leverage structured output constraints to bypass safety mechanisms by manipulating the control plane; in code-based cryptography, it refers to algorithmic advances that exploit restricted alphabets to reduce decoding complexity. This entry encompasses both the jailbreak paradigm for LLMs (Zhang et al., 31 Mar 2025) and attacks on restricted syndrome decoding problems in cryptosystems (Baldi et al., 2023).

1. Conceptual Overview and Definitions

CDA exposes a critical vulnerability in any setting where generation is governed by output constraints rather than free-form sampling. In the LLM domain, modern APIs often use constrained decoding via developer-provided grammars or schemas to ensure syntactic compliance. The attacker manipulates these constraints (the control plane) instead of the visible prompt (data plane), embedding malicious content directly into the allowed output structure. Formally, a grammar $G = (N, \Sigma, P, S)$, with nonterminals $N$, terminal vocabulary $\Sigma$, production rules $P$, and start symbol $S$, defines an output language $L(G) \subset \Sigma^*$. During decoding at step $t$, the model is forced to sample the next token $y_t$ only from the allowed set $C_G(y_{1:t-1}) = \{w \in \Sigma : y_{1:t-1}w \in \text{Pref}(L(G))\}$. Constrained decoding thus enforces

$$y_t \sim \text{Softmax}\left(z_t(y_{1:t-1}) + m_t\right)$$

where $m_t[w] = 0$ if $w \in C_G(y_{1:t-1})$ and $m_t[w] = -\infty$ otherwise. The attacker's objective is to find an output $y_{1:T}$ such that $y_{1:T} \in L(G)$ and $\mathbf{1}[\mathrm{unsafe}(y_{1:T})] = 1$ (Zhang et al., 31 Mar 2025).
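
The masking rule can be made concrete in a few lines. The following is a minimal Python sketch, with a toy vocabulary and made-up logits standing in for a real model; only the tokens in $C_G(y_{1:t-1})$ retain nonzero probability.

```python
import math
import random

def constrained_sample(logits, allowed, temperature=1.0):
    """Sample y_t ~ Softmax(z_t + m_t): tokens outside the allowed set
    C_G(y_{1:t-1}) receive a mask of -inf and thus zero probability."""
    masked = [z / temperature if tok in allowed else float("-inf")
              for tok, z in enumerate(logits)]
    peak = max(masked)  # numerically stable softmax (allowed set assumed nonempty)
    exps = [math.exp(z - peak) if z != float("-inf") else 0.0 for z in masked]
    total = sum(exps)
    return random.choices(range(len(logits)), weights=[e / total for e in exps])[0]

# Toy vocabulary {0: "Yes", 1: "No", 2: "Maybe"}; the grammar permits only token 0,
# so the sample is deterministic regardless of what the model "wants" to say.
print(constrained_sample(logits=[0.2, 3.1, 0.9], allowed={0}))  # always 0
```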

In code-based cryptography, the terminology similarly refers to constraining the error alphabet $\mathcal{E} \subset \mathbb{F}_q$. Here, the central challenge is the Restricted Syndrome Decoding Problem (RSDP): for a given parity-check matrix $H \in \mathbb{F}_q^{(n-k) \times n}$, syndrome $s$, restricted error set $\mathcal{E}$, and Hamming weight $w$, determine whether there exists $e \in (\mathcal{E} \cup \{0\})^n$ with $\mathrm{wt}(e) = w$ and $e H^\top = s$ (Baldi et al., 2023).
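
To make the decision problem concrete, here is a brute-force reference check in Python (exponential in $n$, so toy sizes only; the instance below is invented for illustration and is far removed from the ISD-style attacks of Section 2):

```python
from itertools import combinations, product

def rsdp_brute_force(H, s, E, w, q):
    """Search for e in (E ∪ {0})^n with wt(e) = w and e H^T = s (mod q)."""
    n, r = len(H[0]), len(H)  # r = n - k parity checks
    for support in combinations(range(n), w):     # the w nonzero positions
        for values in product(E, repeat=w):       # filled from the restricted alphabet
            e = [0] * n
            for pos, val in zip(support, values):
                e[pos] = val
            if all(sum(H[i][j] * e[j] for j in range(n)) % q == s[i] for i in range(r)):
                return e
    return None

# Tiny toy instance over F_7 with E = {1, 6}, i.e. errors restricted to ±1.
H = [[1, 2, 3, 4], [0, 1, 5, 2]]
e_true = [1, 0, 6, 0]
s = [sum(H[i][j] * e_true[j] for j in range(4)) % 7 for i in range(2)]
print(rsdp_brute_force(H, s, E=(1, 6), w=2, q=7))  # finds a valid error vector
```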

2. Methodologies and Attack Algorithms

2.1 LLM Control-Plane CDA

LLM-based CDA manipulates the grammar or JSON schema rather than the surface prompt. Attack implementations use enum fields in JSON schemas to force the model's output to include a hidden question $Q$ or a "yes_prefix" (e.g. "Yes, here is how…") by structuring $G$ such that only maliciously crafted completions are possible. Prompt audits, which examine only the visible prompt $x$, fail to detect hidden semantics encoded within $G$ (Zhang et al., 31 Mar 2025).
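
As an illustration, a schema of the following shape (written here as a Python dict; the payload strings are placeholders, not the exact payloads from the paper) pins both the hidden question and the affirmative prefix via single-value enums while the visible prompt stays benign:

```python
# Hypothetical enum-based attack schema: single-value enums leave the decoder
# no choice, so the "question" and "answer_prefix" fields are emitted verbatim.
malicious_schema = {
    "type": "object",
    "properties": {
        "question": {
            "type": "string",
            "enum": ["<hidden harmful question Q>"],  # placeholder payload
        },
        "answer_prefix": {
            "type": "string",
            "enum": ["Yes, here is how..."],          # the forced yes_prefix
        },
        "answer": {"type": "string"},                 # the model free-generates only here
    },
    "required": ["question", "answer_prefix", "answer"],
}
```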

A canonical instantiation is the Chain Enum Attack:

  • Step 1: Query a weakly aligned LLM with a malicious enum schema to produce an unsafe prefix $P$.
  • Step 2: Embed $P$ as the value in a new enum field within a second schema and query a more strongly aligned LLM. The output constraint enforces verbatim emission of $P$, neutralizing safety alignment.

To keep enumeration tractable, pruning heuristics are used, e.g. each enum is reduced to the single required malicious element. The chained flow is sketched below.
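
The following Python sketch traces the two-step control flow. `constrained_complete` is a stub standing in for any schema-enforcing structured-output API (not a real client library); it simply honors single-value enums, which is all the attack relies on.

```python
def constrained_complete(model: str, prompt: str, schema: dict) -> dict:
    """Stub for a schema-constrained completion call: a real API would return a
    JSON object conforming to `schema`; single-value enums are emitted verbatim
    by construction of the decoding mask."""
    return {k: v["enum"][0] for k, v in schema["properties"].items() if "enum" in v}

def chain_enum_attack(hidden_question: str) -> dict:
    # Step 1: a weakly aligned model is queried under an enum schema that
    # smuggles in the hidden question and elicits an unsafe prefix P.
    step1 = {"type": "object", "properties": {
        "question": {"type": "string", "enum": [hidden_question]},
        "prefix": {"type": "string", "enum": ["Yes, here is how..."]},
    }}
    p = constrained_complete("weak-model", "benign prompt", step1)["prefix"]

    # Step 2: P becomes the only legal enum value in a second schema, so the
    # strongly aligned model must emit it verbatim before continuing.
    step2 = {"type": "object", "properties": {
        "continuation_prefix": {"type": "string", "enum": [p]},
    }}
    return constrained_complete("strong-model", "benign prompt", step2)

print(chain_enum_attack("<hidden harmful question Q>"))
```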

2.2 Cryptographic Constrained Decoding

In RSDP attacks, CDA denotes an algorithmic class that leverages restricted error alphabets to accelerate decoding. The general solver framework is Information-Set Decoding (ISD) with a Partial Gaussian Elimination (PGE) phase and advanced combinatorial list-merging strategies (e.g., concatenation merge as in Stern/Dumer, or representation merge as in BJMM). For a given $(H, s, w)$, one selects random information sets, applies PGE, and reduces to smaller instances that are solved by enumerating candidates and matching on partial syndromes (Baldi et al., 2023).

Key enumeration algorithms include:

  • Stern/Dumer Concatenation Merge: Splits the problem in two halves, enumerates partial-weight errors in each half, sorts by partial syndrome, and collides the lists.
  • BJMM-style Representation Merge: Writes $e_2 = x_1 + x_2$ with overlaps, builds lists with controlled overlaps, and merges on partial-syndrome matches, improving representation and merging efficiency.
  • Augmented Alphabet: For small $\mathcal{E}$, allows $\mathcal{E}_+ = \mathcal{E} \cup \{\alpha + \beta\}$ (sums of alphabet elements) in intermediate states, further multiplying representations.

Memory footprint is determined by the largest list size at the merging layers. Time complexity per trial scales as $2^{F(R,W)\,n}$, where the exponent $F(R, W)$ incorporates the list and merge exponents. A toy version of the concatenation merge follows.
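
This sketch (tiny parameters, no information-set permutations or PGE, and an even weight split assumed; real ISD handles uneven splits by re-randomizing column permutations) enumerates weight-$w/2$ restricted errors on each half of the support and collides the two lists on the syndrome:

```python
from itertools import combinations, product

def concat_merge_solve(H, s, E, w, q):
    """Toy Stern/Dumer-style concatenation merge over a restricted alphabet."""
    n, r, half, wl = len(H[0]), len(H), len(H[0]) // 2, w // 2

    def enumerate_half(cols):
        # All restricted errors of weight wl supported on `cols`, keyed by syndrome.
        table = {}
        for support in combinations(cols, wl):
            for values in product(E, repeat=wl):
                syn = [0] * r
                for pos, val in zip(support, values):
                    for i in range(r):
                        syn[i] = (syn[i] + H[i][pos] * val) % q
                table.setdefault(tuple(syn), []).append(dict(zip(support, values)))
        return table

    left = enumerate_half(range(half))
    right = enumerate_half(range(half, n))
    for syn_l, left_parts in left.items():
        # The two partial syndromes must sum to s (mod q).
        need = tuple((s[i] - syn_l[i]) % q for i in range(r))
        for rpart in right.get(need, []):
            for lpart in left_parts:
                e = [0] * n
                for pos, val in {**lpart, **rpart}.items():
                    e[pos] = val
                return e
    return None

# Same toy instance as in Section 1: F_7, E = {1, 6}, weight 2.
H = [[1, 2, 3, 4], [0, 1, 5, 2]]
e_true = [1, 0, 6, 0]
s = [sum(H[i][j] * e_true[j] for j in range(4)) % 7 for i in range(2)]
print(concat_merge_solve(H, s, E=(1, 6), w=2, q=7))  # finds a valid error vector
```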

3. Quantitative Evaluation and Metrics

3.1 LLM Empirical Results

CDA instantiations, including EnumAttack and ChainEnumAttack, consistently achieve high attack success rates (ASR) and StrongREJECT scores (SR) across proprietary and open LLMs:

Model               ASR (%)   SR (%)
GPT-4o              99.2      88.9
GPT-4o-mini         98.1      85.0
Gemini-2.0-flash    90.1      82.8
Phi-3.5-MoE         97.4      73.7
Mistral Nemo        98.3      98.3
Qwen-2.5-32B        98.5      97.1
Llama-3.1-8B        97.6      95.1
Gemma-2-9B          94.8      94.8
  • EnumAttack single-shot ASR: 96.2% (5-benchmark average)
  • Avg. StrongREJECT: 82.6%
  • ChainEnumAttack on Phi-3.5-MoE (using GPT-4o prefix): ASR ≈99.6%, SR ≈83.0% (Zhang et al., 31 Mar 2025)

3.2 Cryptographic CDA Complexity

For RSDP, exploiting a small alphabet size $z = |\mathcal{E}| \ll q$ reduces enumeration complexity. Attack complexity improves by a factor of $(z/q)^{\text{weight}}$ at each layer of list merging, and algebraic cancellations further accelerate enumeration. Empirical analysis demonstrates a reduction in claimed security for proposed cryptosystems by 40–60 bits. For example (Table 1, (Baldi et al., 2023)):

Reference         z   q      n    R     W     Claimed (bits)   This work (bits)
[oldrest2]        2   16381  400  0.75  0.16  128              69
[freudenberger]   4   157    312  0.50  0.34  144              85
[freudenberger]   6   157    312  0.20  0.60  101              55

A sample parameter set with $n = 600$, $k = 300$, $q = 133$, $z = 4$, $w = 0.34n$ produces an attack cost of roughly $2^{222}$, versus much higher values in the unrestricted setting (Baldi et al., 2023).
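
As a rough sanity check on why the restriction matters, one can compare raw candidate counts: there are $\binom{n}{w} z^w$ restricted error vectors of weight $w$ versus $\binom{n}{w}(q-1)^w$ unrestricted ones. The snippet below computes this gap for the sample parameters; note this is only search-space counting, not the paper's ISD cost formula $2^{F(R,W)\,n}$.

```python
from math import comb, log2

# Sample parameters from the text; w = 0.34 n rounded to an integer.
n, q, z = 600, 133, 4
w = round(0.34 * n)

restricted = log2(comb(n, w)) + w * log2(z)        # log2 #{e with entries in E}
unrestricted = log2(comb(n, w)) + w * log2(q - 1)  # log2 #{e with entries in F_q \ {0}}
print(f"restricted search space:   2^{restricted:.0f}")
print(f"unrestricted search space: 2^{unrestricted:.0f}")
print(f"gap from the restriction:  {unrestricted - restricted:.0f} bits")
```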

4. Security Implications

4.1 LLM Ecosystem

CDA shifts the attack surface from data-plane (input prompt) manipulation to control-plane (schema/grammar) exploitation. This renders standard prompt-based audits ineffective, since the visible prompt remains benign while the grammar enforces unsafe generation. Many safety alignments are shallow and target the initial free-generation tokens; by constraining these via schema (“Yes, here is…”), the attacker neutralizes alignment mechanisms. Output auditing, if performed at all, is not typically real-time, allowing CDA outputs to transit uninspected (Zhang et al., 31 Mar 2025).

4.2 Cryptographic Protocols

In code-based cryptosystems, use of restricted alphabets with small $z$ results in substantially reduced security margins under CDA algorithms. Unless $(n, k, w)$ are chosen with sufficient rigor and the alphabet size increased, "provable" security estimates are markedly overoptimistic.

5. Defensive Strategies

5.1 LLM Defense Proposals

  • Safety-Preserving Constraints: Reserve a set of refusal tokens (e.g. “I’m sorry”) that cannot be masked out via user-provided grammar or schema, ensuring at least one safe exit path (a minimal sketch follows this list).
  • Token Provenance Tracking: Maintain metadata on which output tokens are user-prefilled versus model-generated, flagging outputs where forced malicious prefixes are present.
  • Integrated Safety Signaling: Mandate models to emit explicit safety flags (e.g. <political>, <violence>) when generating on sensitive topics, even under structure-constrained outputs, to enable downstream auditing (Zhang et al., 31 Mar 2025).
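
A minimal sketch of the first proposal, assuming an integer-token-id interface (the ids and helper name are illustrative, not any real tokenizer's API):

```python
# Hypothetical reserved refusal tokens, e.g. the ids of "I'm" and " sorry" in
# some tokenizer; in a real system these would come from the model vocabulary.
REFUSAL_TOKEN_IDS = {17, 42}

def safe_allowed_set(grammar_allowed: set[int]) -> set[int]:
    """Union the grammar's allowed set with the protected refusal tokens,
    so the decoding mask m_t can never assign them -inf."""
    return grammar_allowed | REFUSAL_TOKEN_IDS

# Plugged into the masking sketch from Section 1 as
#   constrained_sample(logits, safe_allowed_set(allowed))
# the refusal path always retains nonzero probability.
print(safe_allowed_set({3, 5}))  # contains 17 and 42 alongside the grammar's tokens
```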

5.2 Cryptanalytic Recommendations

To sustain resistance against CDA in cryptosystems:

  • Avoid small alphabet sizes ($z = 2, 4, 6$). Increasing $z$ reinstates hardness, with $z \approx 2^{10}$ approaching the unrestricted regime.
  • Rebalance parameters $(n, k, w)$ so $w$ is not disproportionately large relative to $n$.
  • Recalculate security using the updated complexity $2^{F(R,W)\,n}$, replacing obsolete security claims (Baldi et al., 2023).

CDA reveals fundamental limitations of prevailing security practices that narrowly target data-plane threats or treat output constraints as mere functionality enforcement mechanisms. In both machine learning and cryptography, CDA demonstrates that expressive or insufficiently regulated output constraints are an independent, cross-domain attack surface with demonstrated practical impact. This suggests that robust system design must integrate safety, provenance, and output constraint management holistically rather than treating grammar or syntax specification as a purely functional layer (Zhang et al., 31 Mar 2025, Baldi et al., 2023).
