Edit Flows: CTMC Sequence Editing Framework
- Edit Flows is a generation framework that models variable-length sequence editing as a continuous-time Markov chain using token-level insertions, deletions, and substitutions.
- It enables position-relative transitions to capture natural sequence variability, overcoming rigid fixed-length and autoregressive constraints.
- Empirical evaluations show that Edit Flows improve performance on tasks such as image captioning, open-ended text generation, and code generation, while offering faster training and reduced overfitting to length heuristics.
Edit Flows is a non-autoregressive sequence generation framework that models discrete sequence synthesis as a continuous-time Markov chain (CTMC) over sequences of variable length, using token-level edit operations: insertions, deletions, and substitutions. Unlike previous models that operate under rigid, token-wise, or fixed-length constraints, Edit Flows support position-relative, flexible generation mimicking the natural process of sequence editing. The framework was introduced to address the limitations of both autoregressive and discrete masking models in capturing the structure and variability inherent to sequence data, such as language, code, and captions.
1. Model Architecture
Edit Flows parameterizes the generative process as a CTMC over the space of all possible token sequences up to a maximum length $N$. At each time $t \in [0, 1]$, the state is a sequence $x_t$; outgoing transitions correspond to all possible single-token edits. For every such edit $x \to y$, the model predicts a non-negative rate $u^{\theta}_t(y \mid x)$. The network outputs rates for all possible insertions, deletions, and substitutions at all positions of the current sequence.
For a current sequence $x$ of length $n$:

$\begin{align*} u^{\theta}_t(\mathrm{ins}(x, i, a) \mid x) &= \lambda^{\mathrm{ins}}_{t,i}(x)\, Q^{\mathrm{ins}}_{t,i}(a \mid x) && \text{(insertion at position } i \text{, token } a\text{)} \\ u^{\theta}_t(\mathrm{del}(x, i) \mid x) &= \lambda^{\mathrm{del}}_{t,i}(x) && \text{(deletion at position } i\text{)} \\ u^{\theta}_t(\mathrm{sub}(x, i, a) \mid x) &= \lambda^{\mathrm{sub}}_{t,i}(x)\, Q^{\mathrm{sub}}_{t,i}(a \mid x) && \text{(substitution at position } i \text{, token } a\text{)} \end{align*}$

where $\lambda^{\mathrm{ins}}_{t,i}, \lambda^{\mathrm{del}}_{t,i}, \lambda^{\mathrm{sub}}_{t,i}$ are non-negative per-position rates and $Q^{\mathrm{ins}}_{t,i}, Q^{\mathrm{sub}}_{t,i}$ are position-conditioned categorical distributions over the vocabulary.
Sequence generation thus proceeds from an initial sequence (possibly empty or random), applying position-relative edit operations over continuous time.
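As a concrete illustration, the sketch below shows one plausible layout for such a rate head on top of per-position hidden states from a sequence encoder. The module name, output shapes, and the softplus/softmax choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditRateHead(nn.Module):
    """Hypothetical head mapping per-position hidden states to edit rates.

    For a sequence of length n it produces, per position i:
      - lam_ins[i], lam_del[i], lam_sub[i]  (non-negative rates)
      - Q_ins[i], Q_sub[i]                  (distributions over the vocabulary)
    """

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.rates = nn.Linear(d_model, 3)             # ins / del / sub rate logits
        self.ins_logits = nn.Linear(d_model, vocab_size)
        self.sub_logits = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor):
        # h: (batch, n, d_model) hidden states for the current sequence x_t
        lam = F.softplus(self.rates(h))                # (batch, n, 3), non-negative
        lam_ins, lam_del, lam_sub = lam.unbind(-1)
        Q_ins = F.softmax(self.ins_logits(h), dim=-1)  # token distribution for insertions
        Q_sub = F.softmax(self.sub_logits(h), dim=-1)  # token distribution for substitutions
        return lam_ins, lam_del, lam_sub, Q_ins, Q_sub
```

A transformer trunk would produce `h`; the head's outputs correspond to the $\lambda$ and $Q$ quantities above, with insertion rates possibly attached to gaps between tokens rather than token positions in practice.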
2. Edit Operations: Insertions, Deletions, and Substitutions
Edit Flows enables token-level modification via three operations:
- Insertion: Addition of a token $a$ at position $i$ in $x$, producing $\mathrm{ins}(x, i, a)$.
- Deletion: Removal of the token at position $i$ in $x$, yielding $\mathrm{del}(x, i)$.
- Substitution: Replacement of the token at position $i$ with $a$, denoted $\mathrm{sub}(x, i, a)$.
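For concreteness, here is a minimal Python sketch of the three operations acting on a plain list of token ids; the function names are illustrative, not from the paper.

```python
def insert(x, i, a):
    """Insert token a at position i (0 <= i <= len(x)), producing a longer sequence."""
    return x[:i] + [a] + x[i:]

def delete(x, i):
    """Remove the token at position i, producing a shorter sequence."""
    return x[:i] + x[i + 1:]

def substitute(x, i, a):
    """Replace the token at position i with a; the length is unchanged."""
    return x[:i] + [a] + x[i + 1:]

# Example: each result differs from x by exactly one edit.
x = [5, 9, 2]
assert insert(x, 1, 7) == [5, 7, 9, 2]
assert delete(x, 2) == [5, 9]
assert substitute(x, 0, 3) == [3, 9, 2]
```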
Transitions are defined only between sequences differing by a single edit operation, ensuring that generative paths correspond to edit-based evolution rather than mask-only or direct replacement. Insertions and substitutions share a factored rate parameterization: a scalar rate for each position multiplied by a categorical distribution over tokens at that position, which keeps sampling efficient even for large vocabularies. The CTMC outflow constraint (the rate of remaining in a state is the negative sum of all outgoing edit rates) ensures that probability mass is conserved.
At simulation time, the current sequence is advanced by randomly sampling possible edit operations according to their rates, and applying the sampled edits—parallelizing over the sequence instead of requiring left-to-right or masking constraints.
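A simple way to realize this is a first-order (tau-leaping-style) update: over a small interval $h$, each candidate edit fires independently with probability roughly $h$ times its rate, and the fired edits are applied together. The sketch below is a minimal, self-contained version of such a step; treating insertion rates as per-gap quantities and the right-to-left application order are assumptions made for illustration.

```python
import random

def euler_step(x, h, lam_ins, lam_del, lam_sub, Q_ins, Q_sub):
    """One tau-leaping-style step of the edit CTMC over a small interval h.

    x       : list of token ids (current sequence of length n)
    lam_ins : n+1 insertion rates, one per gap (assumption)
    lam_del : n deletion rates
    lam_sub : n substitution rates
    Q_ins   : n+1 token distributions (dict token -> prob), one per gap
    Q_sub   : n token distributions, one per position
    Each candidate edit fires independently with probability ~ h * rate.
    """
    def sample(dist):
        toks, probs = zip(*dist.items())
        return random.choices(toks, weights=probs, k=1)[0]

    edits = []  # (position, kind, token or None)
    for i, r in enumerate(lam_ins):
        if random.random() < min(1.0, h * r):
            edits.append((i, "ins", sample(Q_ins[i])))
    for i, r in enumerate(lam_del):
        if random.random() < min(1.0, h * r):
            edits.append((i, "del", None))
    for i, r in enumerate(lam_sub):
        if random.random() < min(1.0, h * r):
            edits.append((i, "sub", sample(Q_sub[i])))

    # Apply edits from the rightmost position so earlier indices stay valid;
    # simultaneous edits at the same position are applied in list order (a simplification).
    y = list(x)
    for i, kind, a in sorted(edits, key=lambda e: e[0], reverse=True):
        if kind == "ins":
            y = y[:i] + [a] + y[i:]
        elif kind == "del":
            y = y[:i] + y[i + 1:]
        else:  # "sub"
            y = y[:i] + [a] + y[i + 1:]
    return y
```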
3. Continuous-time Markov Chain (CTMC) Formulation
At the core of Edit Flows is a CTMC formulation where each state is a complete token sequence, and allowed transitions are those sequences differing by one edit operation. For any infinitesimal time interval $h$:

$P(X_{t+h} = y \mid X_t = x) = \delta_{x}(y) + h\, u^{\theta}_t(y \mid x) + o(h)$

where $u^{\theta}_t(y \mid x)$ is the modeled edit rate from $x$ to $y$, and $\delta_{x}(y)$ is the Dirac delta on $x$ (equal to $1$ if $y = x$ and $0$ otherwise). Sequences are generated by traversing this (very large) edit graph over time, composing insertions, deletions, and substitutions.
This design allows:
- Variable-length sequence generation, directly supporting growth and shrinkage during the generative process.
- Position-relative transitions, so edits are conditioned on the position and current state, not rigid token ordering.
- Padding-free modeling: No need for explicit padding or mask tokens, and no hard constraints on start/end position.
- Non-factorized parallel evolution: All positions and operations are conditioned on the full current sequence.
4. Training Methodology
Training the CTMC over sequences is nontrivial due to the combinatorially large number of possible edit paths connecting source and target sequences. Edit Flows address this with an alignment-based expansion:
- Auxiliary variables (alignments) are introduced to define a specific edit path between a source $x_0$ and a target $x_1$, encoded as equal-length aligned sequences $z_0, z_1$ that may contain blank or special padding tokens.
- For each minibatch, a random alignment (edit sequence) between input-output pairs is sampled, used for loss computation but never exposed directly to the model at inference.
- The training loss is a Bregman divergence family objective (a generalization of flow matching losses), schematically of the form
$\mathcal{L}(\theta) = \mathbb{E}_{t,\, z,\, x_t}\Big[\sum_{y \neq x_t} u^{\theta}_t(y \mid x_t) - \sum_{y \neq x_t} u_t(y \mid x_t, z)\, \log u^{\theta}_t(y \mid x_t)\Big],$
where $u_t(\cdot \mid x_t, z)$ is the conditional rate induced by the sampled alignment $z$.
The first term penalizes excessive edit rates (encouraging minimal edits); the second promotes the correct operation-specific transitions prescribed by the sampled alignment.
This methodology generalizes the standard flow matching loss to the discrete, edit-based, variable-length sequence setting. The loss remains tractable due to the alignment sampling and state-space expansion.
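To make the objective concrete, the sketch below computes the rate-level generalized-KL Bregman divergence for a single state, assuming the predicted and alignment-conditioned target rates have already been flattened into one tensor over all candidate edits. It follows the generic form above rather than the paper's exact implementation.

```python
import torch

def rate_bregman_loss(pred_rates: torch.Tensor, target_rates: torch.Tensor) -> torch.Tensor:
    """Generalized-KL Bregman divergence between edit-rate vectors.

    pred_rates   : (num_edits,) non-negative rates u_theta(y | x_t) for every candidate edit
    target_rates : (num_edits,) conditional rates u_t(y | x_t, z) induced by the sampled
                   alignment z (zero for edits the alignment does not prescribe)

    loss = sum(pred) - sum(target * log(pred)), up to terms constant in theta.
    The first term discourages unnecessary edits; the second rewards assigning high
    rate to the edits the alignment says should happen.
    """
    eps = 1e-8  # numerical floor so log() stays finite
    return pred_rates.sum() - (target_rates * torch.log(pred_rates + eps)).sum()


# Toy usage: 5 candidate edits, the alignment prescribes edit #2 at unit rate.
pred = torch.tensor([0.1, 0.05, 0.8, 0.02, 0.01])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])
print(rate_bregman_loss(pred, target))
```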
5. Experimental Evaluation
Edit Flows are evaluated on benchmark tasks spanning image captioning, open-ended text generation, and code generation at multiple model scales (280M and 1.3B parameters). Results show consistent gains over mask-based discrete flow models and performance that is competitive with, and in some tasks better than, autoregressive baselines:
- Image Captioning (MS-COCO, 280M parameters):
Model | METEOR | CIDEr | SPICE |
---|---|---|---|
Mask DFM | 25.3 | 95.6 | 19.2 |
Autoregressive | 25.7 | 95.5 | 19.6 |
Edit Flow | 27.4 | 108.1 | 21.1 |
Localized Edit Flow | 27.4 | 105.1 | 22.1 |
Edit Flows surpass both the autoregressive and Mask DFM baselines on all three captioning metrics.
- Open-ended Text (1.3B):
Model | HellaSwag | ARC-E | ARC-C | PIQA | OBQA | WinoGrande |
---|---|---|---|---|---|---|
Autoregressive | 49.5 | 71.0 | 36.3 | 76.0 | 30.4 | 62.1 |
Mask DFM | 38.3 | 55.4 | 27.8 | 65.3 | 22.6 | 52.3 |
Edit Flow | 49.0 | 63.1 | 33.0 | 68.8 | 28.6 | 53.6 |
Edit Flow () | 54.5 | 61.0 | 34.0 | 65.0 | 37.2 | 54.3 |
Edit Flows compete closely with autoregressive models and substantially outperform mask-based models.
- Code Generation (1.3B):
Model | HumanEval Pass@1 | MBPP Pass@1 |
---|---|---|
Autoregressive | 17.0 | 25.6 |
Mask DFM | 9.1 | 6.2 |
Localized Edit Flow | 14.0 | 14.8 |
Edit Flow | 12.8 | 10.0 |
Localized Edit Flow closes much of the gap to the autoregressive baseline on code generation, where target sequences are longer, demonstrating the approach's applicability beyond short-form generation.
Other findings include efficient training (up to 3x faster per iteration than Mask DFM) and reduced overfitting to length heuristics due to position-relative generation.
6. Applications and Broader Implications
Edit Flows are broadly applicable to:
- Text and code generation: Especially where length flexibility and position-specific insertions/deletions are valuable.
- Image captioning: Strong results over autoregressive and mask-based baselines on MS-COCO, illustrating utility for variable-length, semantically constrained tasks.
- General sequence-to-sequence applications: Translation, summarization, dialog, and scenarios where edit-based alignment reflects natural task structure.
By modeling sequence evolution as a flexible edit path in CTMC space, Edit Flows provide a unified framework that subsumes prior models and affords new theoretical and empirical advantages. This approach lays a foundation for developing further discrete flows and non-factorized, edit-driven generative processes. The mathematical perspective also clarifies connections among autoregressive, substitution-only, and mask/diffusion models, highlighting the benefits of position-relative, operation-based modeling for natural language and code.