
Multi-Deletion Correction Codes

Updated 14 December 2025
  • Multi-deletion correction codes are error-correcting schemes that recover sequences with multiple arbitrary deletions while addressing challenging synchronization errors.
  • They employ algebraic models, VT codes, multiplicity-free constructions, and permutation-based methods to achieve near-optimal redundancy and efficient decoding.
  • Recent advances include explicit constructions, refined combinatorial bounds, and quantum analogs with applications in genomics, network synchronization, and file alignment.

A multi-deletion correction code is an error-correcting code designed to uniquely recover codewords from sequences that have undergone multiple arbitrary deletions. Whereas classical codes address substitution errors, multi-deletion codes confront one of the most challenging types of synchronization errors where positions and values of deleted symbols are unknown. This problem is fundamental in information theory, with connections to genomic data, network synchronization, and file alignment protocols. Multi-deletion correction codes span binary, non-binary, and permutation-based constructions, and recent progress involves nearly optimal explicit constructions, tight combinatorial bounds, advanced decoding algorithms, and quantum analogs. This article surveys the main families of codes, their underlying frameworks, decoding methods, theoretical bounds, and practical advances.

1. Foundational Principles and Main Models

Multi-deletion correction considers words $x$ of length $n$ over an alphabet $\Sigma$ in which up to $t$ symbols are deleted, producing an (unknown) subsequence $y$. The classical objective is to construct codes $C \subset \Sigma^n$ such that for any two distinct codewords $x \ne x'$ in $C$, no sequence $y$ of length $n - t$ is a subsequence of both. By Levenshtein’s equivalence, codes that correct $t$ deletions can also correct any mixture of up to $t$ insertions and deletions.
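The definition can be checked by brute force on toy codes. The following is a minimal Python sketch, not an efficient test: the function names `deletion_ball` and `corrects_t_deletions` are illustrative, and the enumeration is exponential in the word length. It relies on the observation that two words sharing a subsequence after fewer than $t$ deletions also share one after exactly $t$ deletions, so comparing the exact-$t$ deletion balls is enough.

```python
from itertools import combinations

def deletion_ball(word, t):
    """All strings obtained by deleting exactly t symbols from word."""
    n = len(word)
    return {
        "".join(word[i] for i in keep)
        for keep in combinations(range(n), n - t)
    }

def corrects_t_deletions(code, t):
    """True if no two distinct codewords share a subsequence of length n - t."""
    balls = [deletion_ball(c, t) for c in code]
    return all(
        balls[i].isdisjoint(balls[j])
        for i, j in combinations(range(len(code)), 2)
    )

print(corrects_t_deletions(["0000", "1111"], 2))  # True
print(corrects_t_deletions(["0011", "0101"], 1))  # False: both contain "011"
```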

Algebraic Models

  • Single Deletion (VT Codes): Varshamov–Tenengolts–Levenshtein codes use weighted sum congruences; redundancy $\log n + O(1)$ is optimal for $t = 1$.
  • Multiple Deletion (General Codes): For $t \ge 2$, the problem is far more complex. Existentially, a greedy selection argument shows that redundancy $2t\log n + O(1)$ suffices for fixed $t$, but until recently, explicit codes did not match this bound.
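For the single-deletion case, a minimal Python sketch is given here, assuming the standard binary VT checksum $\sum_i i\,x_i \equiv a \pmod{n+1}$ with 1-indexed positions. The function names are illustrative, and the decoder is a brute-force stand-in (it tries every single-bit insertion and keeps the checksum matches) rather than Levenshtein's linear-time decoder.

```python
from itertools import product

def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), positions 1-indexed."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)

def vt_code(n, a=0):
    """All binary words of length n whose VT syndrome equals a."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]

def vt_decode(received, a=0):
    """Brute force: reinsert one bit everywhere, keep words with syndrome a."""
    candidates = set()
    for pos in range(len(received) + 1):
        for bit in (0, 1):
            word = received[:pos] + (bit,) + received[pos:]
            if vt_syndrome(word) == a:
                candidates.add(word)
    return candidates

code = vt_code(6)
c = code[3]
y = c[:2] + c[3:]             # delete the symbol at index 2
print(vt_decode(y) == {c})    # True: exactly one candidate survives
```

Because the code corrects one deletion, the candidate set always contains exactly the transmitted codeword, whichever position was deleted.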

Non-Binary, Multiplicity-Free Codes

Recent non-binary constructions focus on codes over alphabets $\Sigma_q$ with $q > n$, especially multiplicity-free words, in which each symbol appears at most once. The design involves splitting each codeword into its unordered set of symbols and its induced permutation, enabling modular correction of deletions on sets and permutations (Schaller et al., 23 Jan 2025).
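The set/permutation split can be illustrated with a short Python sketch. The helper names `decompose` and `recompose` are hypothetical, and the convention used here (the permutation records the 0-indexed rank of each symbol within the sorted set) is an assumption that may differ from the indexing in the cited paper.

```python
def decompose(word):
    """Split a multiplicity-free word into (symbol set, induced permutation)."""
    if len(set(word)) != len(word):
        raise ValueError("word is not multiplicity-free")
    rank = {s: i for i, s in enumerate(sorted(word))}
    return frozenset(word), tuple(rank[s] for s in word)

def recompose(symbol_set, perm):
    """Inverse of decompose: place the sorted symbols according to perm."""
    symbols = sorted(symbol_set)
    return tuple(symbols[r] for r in perm)

word = (7, 2, 9, 4)
symbol_set, perm = decompose(word)
print(perm)                                  # (2, 0, 3, 1)
print(recompose(symbol_set, perm) == word)   # True
```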

Permutation Codes

In permutation codes, the codewords are permutations $\pi \in S_n$. The Ulam metric, defined by $n - \mathrm{LCS}(\pi, \sigma)$ where $\mathrm{LCS}$ is the length of the longest common subsequence, governs deletion-correctability. Codes correcting $t$ deletions must have minimum Ulam distance at least $t+1$ (Wang et al., 2024).
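A minimal Python sketch of the Ulam distance follows. It uses the standard reduction from the longest common subsequence of two permutations to a longest increasing subsequence, computed with `bisect`; the function names are illustrative.

```python
from bisect import bisect_left

def lcs_length_permutations(pi, sigma):
    """LCS length of two permutations of the same symbols, via LIS."""
    position_in_pi = {symbol: i for i, symbol in enumerate(pi)}
    relabeled = [position_in_pi[symbol] for symbol in sigma]
    tails = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for value in relabeled:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)

def ulam_distance(pi, sigma):
    return len(pi) - lcs_length_permutations(pi, sigma)

print(ulam_distance((1, 2, 3, 4), (2, 3, 4, 1)))  # 1
print(ulam_distance((1, 2, 3, 4), (4, 3, 2, 1)))  # 3
```

In the first example the two permutations share the subsequence (2, 3, 4) after one deletion each, so they could not both belong to a code correcting a single deletion.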

2. Explicit Constructions and Decoding Algorithms

Multiplicity-Free Set-Permutation Construction

For $q > n$, every multiplicity-free word is mapped bijectively to:

  1. Induced Set $A(x) \in \binom{\Sigma_q}{n}$;
  2. Induced Permutation $\sigma \in S_n$.

Codes are built by combining a constant-weight set code $C_S(q, n, t)$ (corrects up to $t$ asymmetric deletions, modeled as 1→0 errors) and a permutation-deletion code $C_{SD}(n, t)$ (corrects $t$ stable deletions). The code $C_2(q, n, t)$ is defined as the inverse image of $C_S \times C_{SD}$ under the decomposition bijection (Schaller et al., 23 Jan 2025).

Decoding (Pseudocode)

  1. Recover $A(x)$ via indicator decoding over $\Sigma_q$.
  2. Sort $A(x)$ lexicographically and trace observed positions to symbol indices to derive the shortened permutation.
  3. Decode the permutation using a stable deletion-correcting decoder for $C_{SD}$.
  4. Reassemble using inverse bijection.
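Step 2 of the procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the set `A` is taken as already recovered by step 1, the helper names are hypothetical, and ranks are 0-indexed positions in the sorted set, which may not match the paper's convention. The decoders for $C_S$ and $C_{SD}$ themselves are not shown.

```python
def shortened_permutation(recovered_set, received):
    """Map each received symbol to its rank within the sorted recovered set."""
    rank = {s: i for i, s in enumerate(sorted(recovered_set))}
    return tuple(rank[s] for s in received)

def deleted_ranks(recovered_set, received):
    """Ranks of symbols of the recovered set that are absent from received."""
    present = set(received)
    return tuple(
        i for i, s in enumerate(sorted(recovered_set)) if s not in present
    )

A = {2, 4, 7, 9}      # assumed output of step 1
y = (7, 9, 4)         # (7, 2, 9, 4) after the symbol 2 was deleted
print(shortened_permutation(A, y))  # (2, 3, 1)
print(deleted_ranks(A, y))          # (0,)
```

Once the set is known, the values of the deleted symbols are known as well; what remains for the permutation decoder is to determine where they were located.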

The complexity is $O(q \cdot \mathrm{poly}(t, \log q))$ for set decoding, and polynomial in $n$ for permutation decoding via known subroutines.

Permutation Deletion Codes via Hamming Mapping

An injective mapping $f$ converts permutation errors into Hamming errors. For $\pi \in S_n$, augment to $(\pi_1, \ldots, \pi_n, n+1)$ and apply $f$ such that Hamming errors correspond to up to $3t$ translocations for $t$ deletions. By intersecting this image with Hamming-metric codes of distance $3t+1$, one gets codes of size at least $n!/(2n)^{3t-1}$ that correct $t$ deletions. Decoding leverages erasure and error-correcting codes in the Hamming metric (Wang et al., 2024).

Burst Deletion and Generalizations

Burst-deletion codes correct runs of consecutive deletions. Constructions interleave array-based codes and shifted VT (SVT) codes, achieving redundancy $\log n + (b-1)\log\log n + b - \log b$ for correcting bursts of length $b$, which is provably near optimal (Schoeny et al., 2016). Extensions address non-consecutive bursts and mixed insertions/deletions.

3. Combinatorial and Asymptotic Bounds

Sphere-Packing and Hypergraph Methods

  • General Codes: The maximum code size for correcting $t$ deletions is bounded by fractional matching in deletion-derived hypergraphs (Kulkarni et al., 2012).
  • Levenshtein Bound and Improvements: For general $q$-ary alphabets, the classical Levenshtein bound is $q^n/[(q-1)^t \binom{n}{t}]$. A refinement via mixed packing (allowing insertions and deletions) yields strictly better bounds when $t > q$ (Cullina et al., 2013):

$$A_q(n, t) \lesssim \min_{0 \le b \le t} \dfrac{q^{n+b}}{(q-1)^t \binom{n}{t} \binom{t}{b}}$$
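The two expressions can be compared numerically. The Python sketch below simply evaluates the leading terms as written; both bounds are asymptotic statements, so the numbers are indicative only, and the function names are illustrative. Taking $b = 0$ recovers the Levenshtein expression, and $b = 1$ already improves on it when $\binom{t}{1} = t > q$.

```python
from math import comb

def levenshtein_bound(q, n, t):
    """Leading term q^n / ((q-1)^t * C(n, t)) of the classical bound."""
    return q**n / ((q - 1)**t * comb(n, t))

def mixed_packing_bound(q, n, t):
    """Minimum over b of q^(n+b) / ((q-1)^t * C(n, t) * C(t, b))."""
    return min(
        q**(n + b) / ((q - 1)**t * comb(n, t) * comb(t, b))
        for b in range(t + 1)
    )

# Binary alphabet, t = 3 > q = 2: the refinement is strictly smaller.
print(mixed_packing_bound(2, 20, 3) < levenshtein_bound(2, 20, 3))   # True
# q = 4, t = 3 <= q: the minimum is attained at b = 0, so the two agree.
print(mixed_packing_bound(4, 20, 3) == levenshtein_bound(4, 20, 3))  # True
```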

Singleton Bound and Rate Analysis

A code correcting $t$ deletions must have redundancy $r \ge t\log q$. The multiplicity-free set-permutation construction achieves redundancy $r \le t\log q + (3t-1)\log n + O(t)$, which is asymptotically optimal as $q \gg n$ (Schaller et al., 23 Jan 2025).
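The optimality claim can be made explicit by dividing the achieved redundancy by the lower bound. The short derivation below assumes fixed $t$ and reads "$q \gg n$" in the stronger sense $\log q / \log n \to \infty$; that reading is an assumption made here for the limit to hold, not a statement taken from the cited paper.

```latex
\frac{t\log q + (3t-1)\log n + O(t)}{t\log q}
  = 1 + \frac{(3t-1)\log n + O(t)}{t\log q}
  \le 1 + \frac{3\log n}{\log q} + O\!\left(\frac{1}{\log q}\right)
  \longrightarrow 1
  \quad \text{as } \frac{\log q}{\log n} \to \infty .
```

The overhead beyond the Singleton-type term $t\log q$ therefore comes entirely from the permutation part, at roughly $3\log n$ bits per deletion.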

Existential and Explicit Constructions in Binary Case

  • Greedy existential codes achieve code size $\Omega(2^n/n^{2t})$ for binary codes correcting $t$ deletions, that is, redundancy $2t\log n + O(1)$ for fixed $t$.
  • Explicit constructions, particularly the Guruswami–Håstad augmented VT codes, match this existential redundancy for $t = 2$, attaining $4\log n + O(\log\log n)$ (Guruswami et al., 2020, Sun et al., 2024).
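The greedy existential argument can be run literally for very small parameters. The following Python sketch scans all binary words in lexicographic order and keeps a word whenever it cannot be confused with a word already kept, where two words are confusable if their longest common subsequence has length at least $n - t$. The function names are illustrative, and the running time is exponential in $n$, so this is a toy demonstration rather than a construction.

```python
from itertools import product

def lcs_length(a, b):
    """Dynamic program for the longest common subsequence length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def greedy_deletion_code(n, t):
    """Keep each word that has LCS < n - t with every word kept so far."""
    code = []
    for word in product((0, 1), repeat=n):
        if all(lcs_length(word, c) < n - t for c in code):
            code.append(word)
    return code

code = greedy_deletion_code(6, 1)
print(len(code), code[0])
```

Every rejected word is confusable with some kept word, which is exactly the covering property the counting argument uses to lower-bound the code size.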

4. Quantum Multi-Deletion Correction

Quantum analogs for multi-deletion codes address deletion of $t$ qubits (tracing out unknown positions). Two systematic methods are established:

Reed–Solomon-Based Quantum Deletion Codes

  • Alternating Sandwich Mapping: Interleave RS code blocks with marker qubits (blocks of $|0\rangle^t$ and $|1\rangle^t$), transforming deletion correction into erasure correction.
  • Error Locator Algorithm: Precise block alignment and marker measurement allow detection of erased blocks; standard quantum RS erasure decoding completes recovery.
  • Rate: The construction achieves rates arbitrarily close to the RS code’s rate for any fixed $t$ and does not require prior knowledge of the deletion count (Hagiwara, 2023).

Marker-Periodicity and Stabilizer Conversion

Any $t$-erasure-correcting quantum code can be lifted, via periodic marker-state prefixing, to a $t$-deletion-correcting code over an alphabet of size $t+1$. The achievable rate scales by $1/(t+1)$ compared to the base erasure code (Matsumoto et al., 2021).

5. Array-Based and Structured Multi-Deletion Codes

Criss-Cross Codes for Arrays

For $n \times n$ arrays, correcting up to $t$ deletions spread arbitrarily across rows and columns (criss-cross model) requires intersecting systematic multi-deletion codes with rank-metric Gabidulin codes. Redundancy is lower bounded by $tn + t\log n - \log t!$, and explicit codes attain $tn + O(t^2\log^2 n)$ (Welter et al., 2021).

Helberg-Type and Non-Binary Constructions

Number-theoretic codes generalize Helberg’s binary construction to non-binary alphabets: using moments based on an exponentially growing weight sequence and a congruence condition modulo a carefully chosen modulus, up to $d$ deletions are correctable. Decoding algorithms run in $O(dn)$ time for fixed $d$; however, redundancy scales linearly with $n$ (Le et al., 2015, Segrest et al., 26 Aug 2025).
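As an illustration of the weight-sequence idea, here is a minimal Python sketch of the binary Helberg code that these constructions generalize. It assumes the usual statement of that construction: weights $v_i = 1 + \sum_{j=1}^{d} v_{i-j}$ with $v_i = 0$ for $i \le 0$, and codewords satisfying $\sum_i v_i x_i \equiv a \pmod{m}$ with $m \ge v_{n+1}$. The function names and the choice $m = v_{n+1}$ are illustrative, and the non-binary variants in the cited papers are not reproduced.

```python
from itertools import product

def helberg_weights(n, d):
    """Return [v_1, ..., v_{n+1}] with v_i = 1 + v_{i-1} + ... + v_{i-d}."""
    v = []
    for i in range(n + 1):
        v.append(1 + sum(v[max(0, i - d):i]))
    return v

def helberg_code(n, d, a=0):
    """Binary words of length n with sum(v_i * x_i) = a mod v_{n+1}."""
    v = helberg_weights(n, d)
    modulus = v[n]  # v_{n+1}, the smallest modulus allowed by m >= v_{n+1}
    return [
        w for w in product((0, 1), repeat=n)
        if sum(vi * wi for vi, wi in zip(v, w)) % modulus == a
    ]

print(helberg_weights(4, 2))  # [1, 2, 4, 7, 12]
print(helberg_code(4, 2))     # [(0, 0, 0, 0), (1, 0, 1, 1)]
```

For $d = 1$ the weights reduce to $v_i = i$ and the modulus to $n + 1$, which is the VT checksum. For $d \ge 2$ the weights grow exponentially, which is why the modulus, and hence the redundancy, grows linearly in $n$.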

6. List-Decoding, Edit Channels, and Limitations

Recent advances include list-decodable multi-deletion codes (outputting short lists containing the true codeword), and codes for general edit channels (insertions, deletions, substitutions). For example, a code correcting two edits achieves redundancy $6\log n + O(\log\log n)$ by intersecting two-deletion, two-substitution, and one-deletion-one-substitution constraints, surpassing previous constructions (Sun et al., 2024).

7. Comparative Summary and Open Problems

| Construction/Bound | Redundancy | Alphabet/Structure | Complexity | Reference |
|---|---|---|---|---|
| VT code ($t=1$ deletion) | $\log n+O(1)$ | Binary/Arbitrary | Linear | (Kulkarni et al., 2012) |
| Guruswami–Håstad explicit ($t=2$) | $4\log n + O(\log\log n)$ | Binary | Poly($n$) | (Guruswami et al., 2020) |
| Multiplicity-free perm+set code | $t\log q + (3t-1)\log n$ | Non-binary ($q>n$) | $O(q\cdot\mathrm{poly}(t,\log q))$ | (Schaller et al., 23 Jan 2025) |
| Permutation $t$-deletion codes | $(3t-1)\log n+o(\log n)$ | Permutations | Polynomial | (Wang et al., 2024) |
| Guess & Check (zero-error, high-prob) | $\Theta(t\log n)$ | Binary | Polynomial | (Hanna et al., 2017) |
| Quantum Reed-Solomon construction | Flexible, $k/n \rightarrow \gamma$ | Qudits | Efficient, no prior $t$ | (Hagiwara, 2023) |

Despite major advances, capacity-achieving explicit codes for arbitrary $t$ remain an open challenge in both binary and non-binary settings. For small $t$, best-known explicit codes match existential bounds up to $O(\log\log n)$ terms; for large weight or array codes, redundancy is still linear in $n$. Permutation codes currently define the best explicit tradeoffs, and quantum deletion correction is an active area leveraging erasure conversion. Future directions include reducing the constant factors in redundancy, developing faster encoding algorithms, and extending combinatorial bounds to complex constrained sources.
