Multi-Deletion Correction Codes

Updated 14 December 2025

Multi-deletion correction codes are error-correcting schemes that recover sequences with multiple arbitrary deletions while addressing challenging synchronization errors.
They employ algebraic models, VT codes, multiplicity-free constructions, and permutation-based methods to achieve near-optimal redundancy and efficient decoding.
Recent advances include explicit constructions, refined combinatorial bounds, and quantum analogs with applications in genomics, network synchronization, and file alignment.

A multi-deletion correction code is an error-correcting code designed to uniquely recover codewords from sequences that have undergone multiple arbitrary deletions. Whereas classical codes address substitution errors, multi-deletion codes confront one of the most challenging types of synchronization errors where positions and values of deleted symbols are unknown. This problem is fundamental in information theory, with connections to genomic data, network synchronization, and file alignment protocols. Multi-deletion correction codes span binary, non-binary, and permutation-based constructions, and recent progress involves nearly optimal explicit constructions, tight combinatorial bounds, advanced decoding algorithms, and quantum analogs. This article surveys the main families of codes, their underlying frameworks, decoding methods, theoretical bounds, and practical advances.

1. Foundational Principles and Main Models

Multi-deletion correction considers words $x$ over an alphabet $\Sigma$ from which up to $t$ symbols can be deleted, producing a subsequence $y$ whose deleted positions are unknown. The classical objective is to construct codes $C \subset \Sigma^n$ such that for any two distinct codewords $x, x' \in C$, no sequence $y$ can be obtained from both $x$ and $x'$ by deleting up to $t$ symbols. By Levenshtein’s equivalence, codes that correct $t$ deletions can also correct any mixture of up to $t$ insertions and deletions.
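The defining condition can be checked by brute force for small codes. The following is a minimal sketch, assuming codewords of equal length given as strings or tuples; the helper names `deletion_ball` and `corrects_t_deletions` are illustrative and not taken from any cited work. It is enough to test exactly $t$ deletions, because two codewords that collide after fewer deletions also collide after $t$.

```python
from itertools import combinations

def deletion_ball(x, t):
    """All distinct words obtained from x by deleting exactly t symbols."""
    n = len(x)
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - t)}

def corrects_t_deletions(code, t):
    """True if no two distinct codewords share a subsequence of length n - t."""
    owner = {}  # maps each received word to the codeword that produced it
    for c in code:
        for y in deletion_ball(c, t):
            if owner.setdefault(y, c) != c:
                return False  # y is reachable from two different codewords
    return True

print(corrects_t_deletions(["000", "111"], 1))  # True
print(corrects_t_deletions(["010", "100"], 1))  # False: both can yield "10"
```

The check is exponential in $t$ and only intended for experimentation with short block lengths.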

Algebraic Models

Single Deletion (VT Codes): Varshamov–Tenengolts–Levenshtein codes use weighted sum congruences; redundancy $\log n+O(1)$ is optimal for $t = 1$ .
Multiple Deletion (General Codes): For $t \ge 2$, the problem is far more complex. Existentially, greedy selection shows that redundancy $2t\log n+O(1)$ suffices, while at least $t\log n$ is asymptotically necessary; until recently, explicit codes did not come close to these bounds.
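For the single-deletion case, the weighted-sum congruence and its decoder are short enough to sketch. The code below assumes the standard binary VT code $VT_a(n) = \{x \in \{0,1\}^n : \sum_i i\,x_i \equiv a \pmod{n+1}\}$ and follows Levenshtein's decoding rule; the function names are illustrative.

```python
def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (n + 1), with 1-based positions."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1)

def vt_decode(y, a=0):
    """Recover the codeword of VT_a(n) from y, which has one bit deleted."""
    y = list(y)
    n = len(y) + 1
    w = sum(y)  # number of ones in the received word
    d = (a - sum(i * b for i, b in enumerate(y, start=1))) % (n + 1)
    if d <= w:
        # A 0 was deleted: reinsert it with exactly d ones to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]
    # A 1 was deleted: reinsert it with exactly d - w - 1 zeros to its left.
    pos, zeros = 0, 0
    while zeros < d - w - 1:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]

x = [1, 0, 0, 1, 1, 0]
a = vt_syndrome(x)
assert all(vt_decode(x[:i] + x[i + 1:], a) == x for i in range(len(x)))
```

The decoder may reinsert the bit at a different position within the same run, which yields the same word.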

Non-Binary, Multiplicity-Free Codes

Recent non-binary constructions focus on codes over alphabets $\Sigma_q$ with $q > n$, especially multiplicity-free words, in which each symbol appears at most once. The design splits each codeword into its unordered symbol set and its induced permutation, so that deletions can be corrected separately on sets and on permutations (Schaller et al., 23 Jan 2025).
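The set/permutation split can be illustrated with a small sketch. The convention used here (the permutation records the rank of each symbol within the sorted set) is an assumption made for illustration; the cited construction may use a different but equivalent labelling. The names `decompose` and `reassemble` are hypothetical.

```python
def decompose(x):
    """Split a multiplicity-free word into (sorted symbol set, induced permutation)."""
    if len(set(x)) != len(x):
        raise ValueError("word is not multiplicity-free")
    symbols = sorted(x)
    rank = {s: r for r, s in enumerate(symbols, start=1)}
    perm = [rank[s] for s in x]  # rank of each symbol, in order of appearance
    return symbols, perm

def reassemble(symbols, perm):
    """Inverse of decompose: place the r-th smallest symbol where perm says r."""
    return [symbols[r - 1] for r in perm]

symbols, perm = decompose([7, 2, 9, 4])
print(symbols, perm)               # [2, 4, 7, 9] [3, 1, 4, 2]
print(reassemble(symbols, perm))   # [7, 2, 9, 4]
```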

Permutation Codes

In permutation codes, the codewords are permutations $\pi \in S_n$. The Ulam metric, defined by $d_U(\pi, \sigma) = n-\mathrm{LCS}(\pi, \sigma)$, where $\mathrm{LCS}$ is the length of a longest common subsequence, governs deletion-correctability. Codes correcting $t$ deletions must have minimum Ulam distance at least $t+1$ (Wang et al., 2024).
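For two permutations of the same symbols, the longest common subsequence equals a longest increasing subsequence after relabelling one permutation by positions in the other, which gives an $O(n \log n)$ computation. A minimal sketch, with the function name `ulam_distance` chosen for illustration:

```python
from bisect import bisect_left

def ulam_distance(pi, sigma):
    """n - LCS(pi, sigma) for two permutations of the same symbol set."""
    pos = {v: i for i, v in enumerate(sigma)}
    seq = [pos[v] for v in pi]  # pi rewritten in sigma's coordinates
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for v in seq:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(pi) - len(tails)

print(ulam_distance([1, 2, 3, 4], [2, 3, 4, 1]))  # 1
print(ulam_distance([1, 2, 3], [3, 2, 1]))        # 2
```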

2. Explicit Constructions and Decoding Algorithms

Multiplicity-Free Set-Permutation Construction

For $q > n$ , every multiplicity-free word is mapped bijectively to:

Induced Set $A(x) \in \binom{\Sigma_q}{n}$, the family of $n$-element subsets of $\Sigma_q$;
Induced Permutation $\sigma \in S_n$ .

Codes are built by combining a constant-weight set code $C_S(q, n, t)$ (corrects up to $t$ asymmetric deletions as 1→0 errors) and a permutation-deletion code $C_{SD}(n, t)$ (corrects $t$ stable deletions). The code $C_2(q,n,t)$ is defined as the inverse image of $C_S \times C_{SD}$ under the decomposition bijection (Schaller et al., 23 Jan 2025).

Decoding (Pseudocode)

Recover $A(x)$ via indicator decoding over $\Sigma_q$ .
Sort $A(x)$ lexicographically and trace observed positions to symbol indices to derive the shortened permutation.
Decode the permutation using a stable deletion-correcting decoder for $C_{SD}$ .
Reassemble using inverse bijection.

The complexity is $\mathcal O(q \cdot \mathrm{poly}(t, \log q))$ for set decoding and polynomial in $n$ for permutation decoding via known subroutines.
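The second decoding step, tracing received symbols back to indices in the recovered set, can be sketched as follows. This assumes the set $A(x)$ has already been recovered by the set decoder and that the permutation records symbol ranks; both the function name and that rank convention are illustrative assumptions. The permutation decoder for $C_{SD}$, which is not shown, would then determine where the missing ranks belong.

```python
def shortened_permutation(received, recovered_set):
    """Map received symbols to their ranks in the recovered set A(x).

    Returns (observed, missing): the rank sequence that survived the
    deletions and the ranks whose positions still have to be determined.
    """
    symbols = sorted(recovered_set)
    rank = {s: r for r, s in enumerate(symbols, start=1)}
    observed = [rank[s] for s in received]
    missing = sorted(set(rank.values()) - set(observed))
    return observed, missing

# Transmitted word [7, 2, 9, 4] with the symbol 9 deleted.
print(shortened_permutation([7, 2, 4], {2, 4, 7, 9}))  # ([3, 1, 2], [4])
```

Because the surviving ranks are not relabelled, the deletion acts on the permutation without renumbering the remaining entries.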

Permutation Deletion Codes via Hamming Mapping

An injective mapping $f$ converts permutation errors into Hamming errors. For $\pi \in S_n$, augment to $(\pi_1, \ldots, \pi_n, n+1)$ and apply $f$ such that Hamming errors correspond to up to $3t$ translocations for $t$ deletions. By intersecting this image with Hamming-metric codes of distance $3t+1$, one gets codes of size $\ge n!/(2n)^{3t-1}$ that correct $t$ deletions. Decoding leverages erasure and error-correcting codes in the Hamming metric (Wang et al., 2024).

Burst Deletion and Generalizations

Burst-deletion codes correct runs of consecutive deletions. Constructions interleave array-based codes and shifted VT (SVT) codes, achieving $\log n + (b-1)\log \log n + b - \log b$ redundancy for correcting bursts of length $b$ —provably near optimal (Schoeny et al., 2016). Extensions address non-consecutive bursts and mixed insertions/deletions.
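To get a feel for the magnitude of the burst-deletion redundancy, the expression can be evaluated numerically. The sketch below assumes base-2 logarithms and redundancy measured in bits, which the text does not state explicitly; the function name is illustrative.

```python
import math

def burst_redundancy_bits(n, b):
    """Evaluate log n + (b - 1) log log n + b - log b with base-2 logarithms."""
    return math.log2(n) + (b - 1) * math.log2(math.log2(n)) + b - math.log2(b)

# Block length 2**16 and bursts of length 4: 16 + 3 * 4 + 4 - 2 = 30 bits.
print(burst_redundancy_bits(2**16, 4))  # 30.0
```

For $b = 1$ the expression reduces to $\log n + 1$, in line with the single-deletion VT redundancy of $\log n + O(1)$.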

3. Combinatorial and Asymptotic Bounds

Sphere-Packing and Hypergraph Methods

General Codes: The maximum code size for correcting $t$ deletions is bounded by fractional matching in deletion-derived hypergraphs (Kulkarni et al., 2012).
Levenshtein Bound and Improvements: For general $q$ -ary alphabets, classical Levenshtein bounds are $q^n/[(q-1)^t {n \choose t}]$ . A refinement via mixed packing (allowing insertions and deletions) yields strictly better bounds when $t > q$ (Cullina et al., 2013):

$A_q(n, t) \lesssim \min_{0 \le b \le t} \dfrac{q^{n+b}}{(q-1)^t \binom{n}{t} \binom{t}{b}}$
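Both expressions can be compared numerically. The sketch below evaluates the formulas exactly as written above and makes no claim about lower-order terms hidden by $\lesssim$; the function names are illustrative. The choice $b = 0$ reproduces the classical bound, and a term with $b \ge 1$ is smaller only when $q^b < \binom{t}{b}$, which for $b = 1$ is the condition $t > q$.

```python
from math import comb

def levenshtein_bound(q, n, t):
    """Classical bound q^n / ((q - 1)^t * C(n, t))."""
    return q**n / ((q - 1) ** t * comb(n, t))

def mixed_packing_bound(q, n, t):
    """Minimum over b of q^(n + b) / ((q - 1)^t * C(n, t) * C(t, b))."""
    return min(
        q ** (n + b) / ((q - 1) ** t * comb(n, t) * comb(t, b))
        for b in range(t + 1)
    )

for q, n, t in [(2, 20, 3), (4, 20, 3)]:
    print(q, n, t, levenshtein_bound(q, n, t), mixed_packing_bound(q, n, t))
```

With $q = 2$, $t = 3$ the refined value is smaller by a factor of $2/3$; with $q = 4$, $t = 3$ the two values coincide.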

Singleton Bound and Rate Analysis

A code correcting $t$ deletions must have redundancy $r \ge t\log q$. The multiplicity-free set-permutation construction achieves redundancy $r \le t\log q + (3t-1)\log n + O(t)$, which is asymptotically optimal when $q \gg n$ (Schaller et al., 23 Jan 2025).

Existential and Explicit Constructions in Binary Case

Greedy existential codes achieve $\Omega(2^n/n^{2t})$ code size for binary codes correcting $t$ deletions, corresponding to redundancy $2t\log n + O(1)$.
Explicit constructions, particularly the Guruswami–Håstad augmented VT codes, match this existential redundancy up to lower-order terms for $t = 2$, achieving $4\log n + O(\log \log n)$ (Guruswami et al., 2020, Sun et al., 2024).
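The greedy existential argument can be run directly for very small block lengths. The following is a brute-force sketch, not an efficient or explicit construction in the sense discussed above: it scans binary words in lexicographic order and keeps a word whenever its $t$-deletion ball is disjoint from the balls of all words kept so far. Function names are illustrative.

```python
from itertools import combinations, product

def deletion_ball(x, t):
    """All distinct words obtained from x by deleting exactly t symbols."""
    n = len(x)
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - t)}

def greedy_deletion_code(n, t):
    """Greedily pick binary words of length n with pairwise disjoint t-deletion balls."""
    covered = set()  # union of the deletion balls of all chosen codewords
    code = []
    for x in product((0, 1), repeat=n):
        ball = deletion_ball(x, t)
        if covered.isdisjoint(ball):
            code.append(x)
            covered |= ball
    return code

code = greedy_deletion_code(8, 2)
print(len(code), code[:3])
```

The running time is exponential in $n$, so this only serves to inspect small cases; the size it returns depends on the scan order and is not claimed to be optimal.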

4. Quantum Multi-Deletion Correction

Quantum analogs for multi-deletion codes address deletion of $t$ qubits (tracing out unknown positions). Two systematic methods are established:

Reed–Solomon-Based Quantum Deletion Codes

Alternating Sandwich Mapping: Interleave RS code blocks with marker qubits (blocks of $|0\rangle^t$ and $|1\rangle^t$ ), transforming deletion correction into erasure correction.
Error Locator Algorithm: Precise block alignment and marker measurement allow detection of erased blocks; standard quantum RS erasure decoding completes recovery.
Rate: The construction achieves rates arbitrarily close to the RS code’s rate for any fixed $t$ and does not require prior knowledge of the deletion count (Hagiwara, 2023).

Marker-Periodicity and Stabilizer Conversion

Any $t$-erasure-correcting quantum code can be lifted, via periodic marker-state prefixing, to a $t$-deletion-correcting code over an alphabet of size $t+1$. The achievable rate scales by $1/(t+1)$ compared to the base erasure code (Matsumoto et al., 2021).

5. Array-Based and Structured Multi-Deletion Codes

Criss-Cross Codes for Arrays

For $n \times n$ arrays, correcting up to $t$ deletions spread arbitrarily across rows and columns (criss-cross model) requires intersection of systematic multi-deletion codes and rank-metric Gabidulin codes. Redundancy is lower bounded by $tn + t\log n - \log t!$ , and explicit codes attain $tn+O(t^2\log^2 n)$ (Welter et al., 2021).

Helberg-Type and Non-Binary Constructions

Number-theoretic codes generalize Helberg’s binary construction to non-binary alphabets: using moments based on an exponentially growing weight sequence and a congruence condition modulo a carefully chosen modulus, up to $d$ deletions are correctable. Decoding algorithms work in $O(d n)$ for fixed $d$ ; however, redundancy scales linearly with $n$ (Le et al., 2015, Segrest et al., 26 Aug 2025).
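As a concrete reference point, the original binary Helberg construction can be sketched in a few lines. This is the binary case only, not the non-binary generalization described above, and it assumes the standard definitions: weights $v_i = 1 + \sum_{j=1}^{d} v_{i-j}$ with $v_i = 0$ for $i \le 0$, modulus $m \ge v_{n+1}$, and codewords satisfying $\sum_i v_i x_i \equiv a \pmod m$. Function names are illustrative.

```python
from itertools import product

def helberg_weights(n, d):
    """Weights v_1 .. v_{n+1}, each equal to 1 plus the sum of the previous d weights."""
    v = []
    for i in range(n + 1):
        v.append(1 + sum(v[max(0, i - d):i]))
    return v

def helberg_code(n, d, a=0):
    """Binary words of length n whose weighted sum is congruent to a mod v_{n+1}."""
    v = helberg_weights(n, d)
    m = v[n]  # smallest admissible modulus, v_{n+1}
    return [
        x for x in product((0, 1), repeat=n)
        if sum(w * b for w, b in zip(v, x)) % m == a
    ]

print(helberg_weights(5, 2))  # [1, 2, 4, 7, 12, 20]
print(helberg_code(5, 2))     # [(0, 0, 0, 0, 0), (1, 0, 0, 1, 1)]
```

For $d = 1$ the weights are $1, 2, \ldots, n+1$ and the code coincides with a VT code. The exponential growth of the weights for $d \ge 2$ forces a large modulus, which is why the redundancy grows linearly with $n$.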

6. List-Decoding, Edit Channels, and Limitations

Recent advances include list-decodable multi-deletion codes (outputting short lists containing the true codeword), and codes for general edit channels (insertions, deletions, substitutions). For example, a code correcting two edits achieves redundancy $6\log n + O(\log\log n)$ by intersecting two-deletion, two-substitution, and one-deletion-one-substitution constraints, surpassing previous constructions (Sun et al., 2024).

7. Comparative Summary and Open Problems

Construction/Bound	Redundancy	Alphabet/Structure	Complexity	Reference
VT code ( $t=1$ deletion)	$\log n+O(1)$	Binary/Arbitrary	Linear	(Kulkarni et al., 2012)
Guruswami–Håstad explicit ( $t=2$ )	$4\log n + O(\log\log n)$	Binary	Poly(n)	(Guruswami et al., 2020)
Multiplicity-free perm+set code	$t\log q + (3t-1)\log n$	Non-binary ( $q>n$ )	$O(q\cdot\mathrm{poly}(t,\log q))$	(Schaller et al., 23 Jan 2025)
Permutation-t-deletion codes	$(3t-1)\log n+o(\log n)$	Permutations	Polynomial	(Wang et al., 2024)
Guess & Check (zero-error, high-prob)	$\Theta(t\log n)$	Binary	Polynomial	(Hanna et al., 2017)
Quantum Reed-Solomon construction	Flexible, $k/n \rightarrow \gamma$	Qudits	Efficient, no prior $t$	(Hagiwara, 2023)

Despite major advances, capacity-achieving explicit codes for arbitrary $t$ remain an open challenge in both binary and non-binary settings. For small $t$ , best-known explicit codes match existential bounds up to $O(\log \log n)$ terms; for large weight or array codes, redundancy is still linear in $n$ . Permutation codes currently define the best explicit tradeoffs, and quantum deletion correction is an active area leveraging erasure conversion. Future directions include reducing the constant factors in redundancy, developing faster encoding algorithms, and extending combinatorial bounds to complex constrained sources.