Edit Flows: CTMC Sequence Editing Framework
- Edit Flows is a generation framework that models variable-length sequence editing as a continuous-time Markov chain using token-level insertions, deletions, and substitutions.
- It enables position-relative transitions to capture natural sequence variability, overcoming rigid fixed-length and autoregressive constraints.
- Empirical evaluations show that Edit Flows improve performance on tasks such as image captioning, open-ended text generation, and code generation, while offering faster training and reduced overfitting to length heuristics.
Edit Flows is a non-autoregressive sequence generation framework that models discrete sequence synthesis as a continuous-time Markov chain (CTMC) over sequences of variable length, using token-level edit operations: insertions, deletions, and substitutions. Unlike previous models that operate under rigid, token-wise, or fixed-length constraints, Edit Flows support position-relative, flexible generation mimicking the natural process of sequence editing. The framework was introduced to address the limitations of both autoregressive and discrete masking models in capturing the structure and variability inherent to sequence data, such as language, code, and captions.
1. Model Architecture
Edit Flows parameterizes the generative process as a CTMC over the space of all possible token sequences up to a maximum length $N$. At each time $t \in [0, 1]$, the state is a sequence $x_t$; outgoing transitions correspond to all possible single-token edits. For every such edit $x \to y$, the model predicts a non-negative rate $u^{\theta}_t(y \mid x)$. The network outputs rates for all possible insertions, deletions, and substitutions at all positions of the current sequence.
For a current sequence $x$ of length $n$:

$\begin{align*} u^{\theta}_t(\mathrm{ins}(x, i, a) \mid x) &= \lambda^{\mathrm{ins}}_{t,i}(x)\, Q^{\mathrm{ins}}_{t,i}(a \mid x) && \text{(insertion at position } i \text{, token } a\text{)} \\ u^{\theta}_t(\mathrm{del}(x, i) \mid x) &= \lambda^{\mathrm{del}}_{t,i}(x) && \text{(deletion at position } i\text{)} \\ u^{\theta}_t(\mathrm{sub}(x, i, a) \mid x) &= \lambda^{\mathrm{sub}}_{t,i}(x)\, Q^{\mathrm{sub}}_{t,i}(a \mid x) && \text{(substitution at position } i \text{, token } a\text{)} \end{align*}$

where $\lambda^{\mathrm{ins}}_{t,i}, \lambda^{\mathrm{del}}_{t,i}, \lambda^{\mathrm{sub}}_{t,i}$ are non-negative per-position rates and $Q^{\mathrm{ins}}_{t,i}, Q^{\mathrm{sub}}_{t,i}$ are position-conditioned categorical distributions over the vocabulary.
Sequence generation thus proceeds from an initial sequence (possibly empty or random), applying position-relative edit operations over continuous time.
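As a concrete illustration, the sketch below shows one plausible layout for such a rate head on top of per-position hidden states from a sequence encoder. The module name, output shapes, and the softplus/softmax choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditRateHead(nn.Module):
    """Hypothetical head mapping per-position hidden states to edit rates.

    For a sequence of length n it produces, per position i:
      - lam_ins[i], lam_del[i], lam_sub[i]  (non-negative rates)
      - Q_ins[i], Q_sub[i]                  (distributions over the vocabulary)
    """

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.rates = nn.Linear(d_model, 3)             # ins / del / sub rate logits
        self.ins_logits = nn.Linear(d_model, vocab_size)
        self.sub_logits = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor):
        # h: (batch, n, d_model) hidden states for the current sequence x_t
        lam = F.softplus(self.rates(h))                # (batch, n, 3), non-negative
        lam_ins, lam_del, lam_sub = lam.unbind(-1)
        Q_ins = F.softmax(self.ins_logits(h), dim=-1)  # token distribution for insertions
        Q_sub = F.softmax(self.sub_logits(h), dim=-1)  # token distribution for substitutions
        return lam_ins, lam_del, lam_sub, Q_ins, Q_sub
```

A transformer trunk would produce `h`; the head's outputs correspond to the $\lambda$ and $Q$ quantities above, with insertion rates possibly attached to gaps between tokens rather than token positions in practice.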
2. Edit Operations: Insertions, Deletions, and Substitutions
Edit Flows enables token-level modification via three operations:
- Insertion: Addition of a token $a$ at position $i$ in $x$, producing $\mathrm{ins}(x, i, a)$.
- Deletion: Removal of the token at position $i$ in $x$, yielding $\mathrm{del}(x, i)$.
- Substitution: Replacement of the token at position $i$ with $a$, denoted $\mathrm{sub}(x, i, a)$.
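For concreteness, here is a minimal Python sketch of the three operations acting on a plain list of token ids; the function names are illustrative, not from the paper.

```python
def insert(x, i, a):
    """Insert token a at position i (0 <= i <= len(x)), producing a longer sequence."""
    return x[:i] + [a] + x[i:]

def delete(x, i):
    """Remove the token at position i, producing a shorter sequence."""
    return x[:i] + x[i + 1:]

def substitute(x, i, a):
    """Replace the token at position i with a; the length is unchanged."""
    return x[:i] + [a] + x[i + 1:]

# Example: each result differs from x by exactly one edit.
x = [5, 9, 2]
assert insert(x, 1, 7) == [5, 7, 9, 2]
assert delete(x, 2) == [5, 9]
assert substitute(x, 0, 3) == [3, 9, 2]
```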
Transitions are defined only between sequences differing by a single edit operation, ensuring that generative paths correspond to edit-based evolution rather than mask-only or direct replacement. Insertions and substitutions share a factored rate parameterization: a scalar rate for each position multiplied by a categorical distribution over tokens at that position, which keeps sampling efficient even for large vocabularies. The CTMC outflow constraint (the rate of remaining in a state is the negative sum of all outgoing edit rates) ensures that probability mass is conserved.
At simulation time, the current sequence is advanced by randomly sampling possible edit operations according to their rates, and applying the sampled edits—parallelizing over the sequence instead of requiring left-to-right or masking constraints.
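A simple way to realize this is a first-order (tau-leaping-style) update: over a small interval $h$, each candidate edit fires independently with probability roughly $h$ times its rate, and the fired edits are applied together. The sketch below is a minimal, self-contained version of such a step; treating insertion rates as per-gap quantities and the right-to-left application order are assumptions made for illustration.

```python
import random

def euler_step(x, h, lam_ins, lam_del, lam_sub, Q_ins, Q_sub):
    """One tau-leaping-style step of the edit CTMC over a small interval h.

    x       : list of token ids (current sequence of length n)
    lam_ins : n+1 insertion rates, one per gap (assumption)
    lam_del : n deletion rates
    lam_sub : n substitution rates
    Q_ins   : n+1 token distributions (dict token -> prob), one per gap
    Q_sub   : n token distributions, one per position
    Each candidate edit fires independently with probability ~ h * rate.
    """
    def sample(dist):
        toks, probs = zip(*dist.items())
        return random.choices(toks, weights=probs, k=1)[0]

    edits = []  # (position, kind, token or None)
    for i, r in enumerate(lam_ins):
        if random.random() < min(1.0, h * r):
            edits.append((i, "ins", sample(Q_ins[i])))
    for i, r in enumerate(lam_del):
        if random.random() < min(1.0, h * r):
            edits.append((i, "del", None))
    for i, r in enumerate(lam_sub):
        if random.random() < min(1.0, h * r):
            edits.append((i, "sub", sample(Q_sub[i])))

    # Apply edits from the rightmost position so earlier indices stay valid;
    # simultaneous edits at the same position are applied in list order (a simplification).
    y = list(x)
    for i, kind, a in sorted(edits, key=lambda e: e[0], reverse=True):
        if kind == "ins":
            y = y[:i] + [a] + y[i:]
        elif kind == "del":
            y = y[:i] + y[i + 1:]
        else:  # "sub"
            y = y[:i] + [a] + y[i + 1:]
    return y
```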
3. Continuous-time Markov Chain (CTMC) Formulation
At the core of Edit Flows is a CTMC formulation where each state is a complete token sequence, and allowed transitions are those sequences differing by one edit operation. For any infinitesimal time interval $h$:

$P(X_{t+h} = y \mid X_t = x) = \delta_{x}(y) + h\, u^{\theta}_t(y \mid x) + o(h)$

where $u^{\theta}_t(y \mid x)$ is the modeled edit rate from $x$ to $y$, and $\delta_{x}(y)$ is the Dirac delta on $x$ (equal to $1$ if $y = x$ and $0$ otherwise). Sequences are generated by traversing this (very large) edit graph over time, composing insertions, deletions, and substitutions.
This design allows:
- Variable-length sequence generation, directly supporting growth and shrinkage during the generative process.
- Position-relative transitions, so edits are conditioned on the position and current state, not rigid token ordering.
- Padding-free modeling: No need for explicit padding or mask tokens, and no hard constraints on start/end position.
- Non-factorized parallel evolution: All positions and operations are conditioned on the full current sequence.
4. Training Methodology
Training the CTMC over sequences is nontrivial due to the combinatorially large number of possible edit paths connecting source and target sequences. Edit Flows address this with an alignment-based expansion:
- Auxiliary variables (alignments) are introduced to define a specific edit path between a source $x_0$ and a target $x_1$, encoded as equal-length aligned sequences $z_0, z_1$ that may contain blank or special padding tokens.
- For each minibatch, a random alignment (edit sequence) between input-output pairs is sampled, used for loss computation but never exposed directly to the model at inference.
- The training loss is a Bregman divergence family objective (a generalization of flow matching losses), schematically of the form
$\mathcal{L}(\theta) = \mathbb{E}_{t,\, z,\, x_t}\Big[\sum_{y \neq x_t} u^{\theta}_t(y \mid x_t) - \sum_{y \neq x_t} u_t(y \mid x_t, z)\, \log u^{\theta}_t(y \mid x_t)\Big],$
where $u_t(\cdot \mid x_t, z)$ is the conditional rate induced by the sampled alignment $z$.
The first term penalizes excessive edit rates (encouraging minimal edits); the second promotes the correct operation-specific transitions prescribed by the sampled alignment.
This methodology generalizes the standard flow matching loss to the discrete, edit-based, variable-length sequence setting. The loss remains tractable due to the alignment sampling and state-space expansion.
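To make the objective concrete, the sketch below computes the rate-level generalized-KL Bregman divergence for a single state, assuming the predicted and alignment-conditioned target rates have already been flattened into one tensor over all candidate edits. It follows the generic form above rather than the paper's exact implementation.

```python
import torch

def rate_bregman_loss(pred_rates: torch.Tensor, target_rates: torch.Tensor) -> torch.Tensor:
    """Generalized-KL Bregman divergence between edit-rate vectors.

    pred_rates   : (num_edits,) non-negative rates u_theta(y | x_t) for every candidate edit
    target_rates : (num_edits,) conditional rates u_t(y | x_t, z) induced by the sampled
                   alignment z (zero for edits the alignment does not prescribe)

    loss = sum(pred) - sum(target * log(pred)), up to terms constant in theta.
    The first term discourages unnecessary edits; the second rewards assigning high
    rate to the edits the alignment says should happen.
    """
    eps = 1e-8  # numerical floor so log() stays finite
    return pred_rates.sum() - (target_rates * torch.log(pred_rates + eps)).sum()


# Toy usage: 5 candidate edits, the alignment prescribes edit #2 at unit rate.
pred = torch.tensor([0.1, 0.05, 0.8, 0.02, 0.01])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])
print(rate_bregman_loss(pred, target))
```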
5. Experimental Evaluation
Edit Flows are evaluated on benchmark tasks spanning image captioning, open-ended text generation, and code generation at multiple model scales (280M and 1.3B parameters). Results show consistent gains over mask-based discrete flow models and performance that is competitive with, and in some tasks better than, autoregressive baselines:
- Image Captioning (MS-COCO, 280M parameters):
Model | METEOR | CIDEr | SPICE |
---|---|---|---|
Mask DFM | 25.3 | 95.6 | 19.2 |
Autoregressive | 25.7 | 95.5 | 19.6 |
Edit Flow | 27.4 | 108.1 | 21.1 |
Localized Edit Flow | 27.4 | 105.1 | 22.1 |
Edit Flows surpass both the autoregressive and Mask DFM baselines on all three captioning metrics.
- Open-ended Text (1.3B):
Model | HellaSwag | ARC-E | ARC-C | PIQA | OBQA | WinoGrande |
---|---|---|---|---|---|---|
Autoregressive | 49.5 | 71.0 | 36.3 | 76.0 | 30.4 | 62.1 |
Mask DFM | 38.3 | 55.4 | 27.8 | 65.3 | 22.6 | 52.3 |
Edit Flow | 49.0 | 63.1 | 33.0 | 68.8 | 28.6 | 53.6 |
Edit Flow () | 54.5 | 61.0 | 34.0 | 65.0 | 37.2 | 54.3 |
Edit Flows compete closely with autoregressive models and substantially outperform mask-based models.
- Code Generation (1.3B):
Model | HumanEval Pass@1 | MBPP Pass@1 |
---|---|---|
Autoregressive | 17.0 | 25.6 |
Mask DFM | 9.1 | 6.2 |
Localized Edit Flow | 14.0 | 14.8 |
Edit Flow | 12.8 | 10.0 |
Localized Edit Flow closes much of the gap to the autoregressive baseline on code generation, where target sequences are longer, demonstrating the approach's applicability beyond short-form generation.
Other findings include efficient training (up to 3x faster per iteration than Mask DFM) and reduced overfitting to length heuristics due to position-relative generation.
6. Applications and Broader Implications
Edit Flows are broadly applicable to:
- Text and code generation: Especially where length flexibility and position-specific insertions/deletions are valuable.
- Image captioning: Strong results over autoregressive and mask-based baselines on MS-COCO, illustrating utility for variable-length, semantically constrained tasks.
- General sequence-to-sequence applications: Translation, summarization, dialog, and scenarios where edit-based alignment reflects natural task structure.
By modeling sequence evolution as a flexible edit path in CTMC space, Edit Flows provide a unified framework that subsumes prior models and affords new theoretical and empirical advantages. This approach lays a foundation for developing further discrete flows and non-factorized, edit-driven generative processes. The mathematical perspective also clarifies connections among autoregressive, substitution-only, and mask/diffusion models, highlighting the benefits of position-relative, operation-based modeling for natural language and code.