
Edit Flows: CTMC Sequence Editing Framework

Updated 30 June 2025
  • Edit Flows is a generation framework that models variable-length sequence editing as a continuous-time Markov chain using token-level insertions, deletions, and substitutions.
  • It enables position-relative transitions to capture natural sequence variability, overcoming rigid fixed-length and autoregressive constraints.
  • Empirical evaluations show that Edit Flows improve performance in tasks like image captioning, open-ended text, and code generation while offering faster training and reduced overfitting.

Edit Flows is a non-autoregressive sequence generation framework that models discrete sequence synthesis as a continuous-time Markov chain (CTMC) over sequences of variable length, using token-level edit operations: insertions, deletions, and substitutions. Unlike previous models that operate under rigid token-by-token or fixed-length constraints, Edit Flows support position-relative, flexible generation that mimics the natural process of sequence editing. The framework was introduced to address the limitations of both autoregressive and discrete masking models in capturing the structure and variability inherent to sequence data, such as language, code, and captions.

1. Model Architecture

Edit Flows parameterizes the generative process as a CTMC over the space of all possible token sequences up to length $N$. At each time $t$, the state is a sequence $x_t$; outgoing transitions correspond to all possible single-token edits. For every such edit, the model predicts a non-negative rate $u_t^{\theta}(x \mid x_t)$. The network outputs rates for all possible insertions, deletions, and substitutions at all positions of the current sequence.

For a sequence $x$ of length $n(x)$:

$\begin{align*}
u^{\theta}_t(\mathrm{ins}(x, i, a) \mid x) &= \lambda^{\mathrm{ins}}_{t,i}(x)\, Q^{\mathrm{ins}}_{t,i}(a \mid x) && \text{(insertion at position $i$, token $a$)} \\
u^{\theta}_t(\mathrm{del}(x, i) \mid x) &= \lambda^{\mathrm{del}}_{t,i}(x) && \text{(deletion at position $i$)} \\
u^{\theta}_t(\mathrm{sub}(x, i, a) \mid x) &= \lambda^{\mathrm{sub}}_{t,i}(x)\, Q^{\mathrm{sub}}_{t,i}(a \mid x) && \text{(substitution at position $i$, token $a$)}
\end{align*}$

where $\lambda^{\mathrm{ins}}_{t,i}(x), \lambda^{\mathrm{del}}_{t,i}(x), \lambda^{\mathrm{sub}}_{t,i}(x) \geq 0$ are per-position edit rates and $Q^{\mathrm{ins}}_{t,i}(a \mid x), Q^{\mathrm{sub}}_{t,i}(a \mid x)$ are position-conditioned distributions over the vocabulary.

Sequence generation thus proceeds from an initial sequence (possibly empty or random), applying position-relative edit operations over continuous time.
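To make the parameterization concrete, here is a minimal PyTorch-style sketch of such a rate head. The encoder producing per-position hidden states, the softplus/softmax parameterization, and all names (EditRateHead, rate_proj, and so on) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditRateHead(nn.Module):
    """Hypothetical head mapping per-position hidden states to edit rates.

    Assumes an upstream encoder has produced hidden states of shape
    (seq_len, d_model) for the current sequence x_t; softplus for rates and
    softmax for token distributions are illustrative choices.
    """
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.rate_proj = nn.Linear(d_model, 3)            # ins / del / sub rate per position
        self.ins_token = nn.Linear(d_model, vocab_size)   # logits for Q^ins(. | x)
        self.sub_token = nn.Linear(d_model, vocab_size)   # logits for Q^sub(. | x)

    def forward(self, h: torch.Tensor):
        rates = F.softplus(self.rate_proj(h))             # non-negative rates, shape (n, 3)
        q_ins = F.softmax(self.ins_token(h), dim=-1)      # (n, V)
        q_sub = F.softmax(self.sub_token(h), dim=-1)      # (n, V)
        return {
            "ins_rate": rates[:, 0], "del_rate": rates[:, 1], "sub_rate": rates[:, 2],
            "q_ins": q_ins, "q_sub": q_sub,
        }

head = EditRateHead(d_model=512, vocab_size=32000)
out = head(torch.randn(12, 512))                          # a 12-token current sequence
print(out["ins_rate"].shape, out["q_sub"].shape)          # torch.Size([12]) torch.Size([12, 32000])
```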

2. Edit Operations: Insertions, Deletions, and Substitutions

Edit Flows enable token-level modification via three operations:

  • Insertion: addition of a token $a$ at position $i$ of $x$, producing $\mathrm{ins}(x, i, a)$.
  • Deletion: removal of the token at position $i$ of $x$, yielding $\mathrm{del}(x, i)$.
  • Substitution: replacement of the token at position $i$ with $a$, denoted $\mathrm{sub}(x, i, a)$.

Transitions are defined only between sequences differing by a single edit operation, ensuring paths correspond to edit-based evolution rather than mask-only or direct replacement. Insertions and substitutions share a joint rate-parameterization: a rate for each position and a categorical distribution over possible tokens at that position, facilitating efficient sampling even for large vocabularies. Outflow constraints in the CTMC ensure that the total rate out of a state is balanced.
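For concreteness, here is a minimal sketch of the three operations acting on a plain Python list of token ids; the function names (ins, delete, sub) are illustrative, not the paper's API.

```python
from typing import List

def ins(x: List[int], i: int, a: int) -> List[int]:
    """Insert token a so that it occupies position i."""
    return x[:i] + [a] + x[i:]

def delete(x: List[int], i: int) -> List[int]:
    """Remove the token at position i."""
    return x[:i] + x[i + 1:]

def sub(x: List[int], i: int, a: int) -> List[int]:
    """Replace the token at position i with a."""
    return x[:i] + [a] + x[i + 1:]

x = [5, 9, 2]
print(ins(x, 1, 7))     # [5, 7, 9, 2]
print(delete(x, 0))     # [9, 2]
print(sub(x, 2, 4))     # [5, 9, 4]
```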

At simulation time, the current sequence is advanced by randomly sampling possible edit operations according to their rates, and applying the sampled edits—parallelizing over the sequence instead of requiring left-to-right or masking constraints.
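A rough sketch of one such simulation step under assumed per-position rates follows. The fixed step size h, the independent per-edit coin flips, and the right-to-left application order are illustrative simplifications of an Euler-style CTMC update, not the paper's exact sampler.

```python
import random
from typing import Dict, List

def euler_edit_step(x: List[int], rates: Dict[str, list], h: float, vocab: int) -> List[int]:
    """Apply one approximate CTMC step of length h.

    `rates` holds per-position insertion/deletion/substitution rates for the
    current sequence; each edit fires independently with probability ~ h * rate.
    Edits are applied right-to-left so earlier indices remain valid.
    """
    x = list(x)
    for i in reversed(range(len(x))):
        if random.random() < h * rates["sub"][i]:
            x[i] = random.randrange(vocab)            # stand-in for sampling from Q^sub
        if random.random() < h * rates["del"][i]:
            del x[i]
            continue
        if random.random() < h * rates["ins"][i]:
            x.insert(i, random.randrange(vocab))      # stand-in for sampling from Q^ins
    return x

x = [3, 1, 4, 1, 5]
rates = {"ins": [0.2] * 5, "del": [0.1] * 5, "sub": [0.3] * 5}
print(euler_edit_step(x, rates, h=0.5, vocab=100))
```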

3. Continuous-time Markov Chain (CTMC) Formulation

At the core of Edit Flows is a CTMC formulation in which each state is a complete token sequence and allowed transitions are between sequences differing by a single edit operation. For any infinitesimal time interval $h$:

$P(X_{t+h} = x \mid X_t = x_t) = \delta_{x_t}(x) + h\, u_t(x \mid x_t) + o(h)$

where $u_t(x \mid x_t)$ is the modeled edit rate and $\delta_{x_t}$ is the point mass (Kronecker delta) at $x_t$. Sequences are generated by traversing this (very large) edit graph over time, composing insertions, deletions, and substitutions.
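Concretely, the total exit rate, the holding-time distribution, and the jump probabilities follow the standard CTMC construction; the identities below are generic CTMC facts stated in this article's notation rather than paper-specific formulas:

$\lambda_t(x_t) = \sum_{x \neq x_t} u_t(x \mid x_t), \qquad P(\text{no jump in } [t, t+h]) = \exp\!\Big(-\int_t^{t+h} \lambda_s(x_t)\, \mathrm{d}s\Big)$

and, when a jump does occur at time $t$, the next sequence $x$ is selected with probability $u_t(x \mid x_t) / \lambda_t(x_t)$.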

This design allows:

  • Variable-length sequence generation, directly supporting growth and shrinkage during the generative process.
  • Position-relative transitions, so edits are conditioned on the position and current state, not rigid token ordering.
  • Padding-free modeling: No need for explicit padding or mask tokens, and no hard constraints on start/end position.
  • Non-factorized parallel evolution: All positions and operations are conditioned on the full current sequence.

4. Training Methodology

Training the CTMC over sequences is nontrivial due to the combinatorially large number of possible edit paths connecting source and target sequences. Edit Flows address this with an alignment-based expansion:

  • Auxiliary variables (alignments) are introduced to define a specific edit path between source $x_0$ and target $x_1$, encoded as aligned sequences $z_0$, $z_1$, possibly with padding or special tokens.
  • For each minibatch, a random alignment (edit sequence) between input-output pairs is sampled, used for loss computation but never exposed directly to the model at inference.
  • The training loss is a Bregman divergence family objective (a generalization of flow matching losses):

$\mathcal{L}(\theta) = \mathbb{E}_{t,\ \pi(z_0, z_1),\ p_t(x_t, z_t \mid z_0, z_1)} \left[ \sum_{x \neq x_t} u_t^{\theta}(x \mid x_t) \;-\; \sum_{i=1}^{N} \mathbf{1}_{[z_1^i \neq z_t^i]}\, \frac{\dot{\kappa}_t}{1 - \kappa_t}\, \log u_t^{\theta}\big(x(z_t, i, z_1^i) \mid x_t\big) \right]$

The first term penalizes excessive total edit rate (encouraging minimal edits), while the second rewards the operation-specific transitions prescribed by the sampled alignment, weighted by the schedule factor $\dot{\kappa}_t/(1 - \kappa_t)$.

This methodology generalizes the standard flow matching loss to the discrete, edit-based, variable-length sequence setting. The loss remains tractable due to the alignment sampling and state-space expansion.
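Below is a hedged sketch of how the two terms of this objective could be computed, assuming the model's rates for all candidate edits are available as one flattened tensor, the rates of the alignment-prescribed edits have been gathered separately, and the schedule weight $\dot{\kappa}_t/(1-\kappa_t)$ is supplied as a scalar; all names here are illustrative.

```python
import torch

def edit_flow_loss(all_rates: torch.Tensor,
                   target_edit_rates: torch.Tensor,
                   mismatch_mask: torch.Tensor,
                   schedule_weight: float,
                   eps: float = 1e-8) -> torch.Tensor:
    """Sketch of the Edit Flows training objective under the stated assumptions.

    all_rates:         rates u_t^theta(x | x_t) for every candidate edit, flattened.
    target_edit_rates: rates of the edits x(z_t, i, z_1^i) prescribed by the alignment.
    mismatch_mask:     1 where z_1^i != z_t^i, else 0.
    schedule_weight:   the scalar kappa_dot_t / (1 - kappa_t), assumed given.
    """
    outflow = all_rates.sum()                                      # first term: total edit rate
    log_target = torch.log(target_edit_rates + eps)                # log-rates of prescribed edits
    reward = schedule_weight * (mismatch_mask * log_target).sum()  # second term
    return outflow - reward

# toy example with made-up tensors
loss = edit_flow_loss(
    all_rates=torch.rand(1000),
    target_edit_rates=torch.rand(16),
    mismatch_mask=torch.randint(0, 2, (16,)).float(),
    schedule_weight=2.0,
)
print(loss.item())
```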

5. Experimental Evaluation

Edit Flows are evaluated on benchmark tasks spanning image captioning, open-ended language understanding, and code generation at multiple model scales. Results demonstrate robust improvements:

  • Image Captioning (MS-COCO, 280M parameters):
Model                  METEOR   CIDEr   SPICE
Mask DFM               25.3     95.6    19.2
Autoregressive         25.7     95.5    19.6
Edit Flow              27.4     108.1   21.1
Localized Edit Flow    27.4     105.1   22.1

Edit Flows surpass both autoregressive and Mask DFM.

  • Open-ended Text (1.3B):
Model                      HellaSwag   ARC-E   ARC-C   PIQA   OBQA   WinoGrande
Autoregressive             49.5        71.0    36.3    76.0   30.4   62.1
Mask DFM                   38.3        55.4    27.8    65.3   22.6   52.3
Edit Flow                  49.0        63.1    33.0    68.8   28.6   53.6
Edit Flow ($\mathcal{L}$)  54.5        61.0    34.0    65.0   37.2   54.3

Edit Flows compete closely with autoregressive models and substantially outperform mask-based models.

  • Code Generation (1.3B):
Model                  HumanEval Pass@1   MBPP Pass@1
Autoregressive         17.0               25.6
Mask DFM               9.1                6.2
Localized Edit Flow    14.0               14.8
Edit Flow              12.8               10.0

Localized Edit Flow closes much of the performance gap for code with longer targets, demonstrating the model's applicability beyond mere short-form generation.

Other findings include efficient training (up to 3x faster per iteration than Mask DFM) and reduced overfitting to length heuristics due to position-relative generation.

6. Applications and Broader Implications

Edit Flows are broadly applicable to:

  • Text and code generation: Especially where length flexibility and position-specific insertions/deletions are valuable.
  • Image captioning: Surpasses autoregressive and mask-based baselines on MS-COCO, illustrating utility for variable-length, semantically constrained tasks.
  • General sequence-to-sequence applications: Translation, summarization, dialog, and scenarios where edit-based alignment reflects natural task structure.

By modeling sequence evolution as a flexible edit path in CTMC space, Edit Flows provide a unified framework that subsumes prior models and affords new theoretical and empirical advantages. This approach lays a foundation for developing further discrete flows and non-factorized, edit-driven generative processes. The mathematical perspective also clarifies connections among autoregressive, substitution-only, and mask/diffusion models, highlighting the benefits of position-relative, operation-based modeling for natural language and code.