Multi-Deletion Correction Codes
- Multi-deletion correction codes are error-correcting schemes that recover sequences with multiple arbitrary deletions while addressing challenging synchronization errors.
- They employ algebraic models, VT codes, multiplicity-free constructions, and permutation-based methods to achieve near-optimal redundancy and efficient decoding.
- Recent advances include explicit constructions, refined combinatorial bounds, and quantum analogs with applications in genomics, network synchronization, and file alignment.
A multi-deletion correction code is an error-correcting code designed to uniquely recover codewords from sequences that have undergone multiple arbitrary deletions. Whereas classical codes address substitution errors, multi-deletion codes confront one of the most challenging types of synchronization errors where positions and values of deleted symbols are unknown. This problem is fundamental in information theory, with connections to genomic data, network synchronization, and file alignment protocols. Multi-deletion correction codes span binary, non-binary, and permutation-based constructions, and recent progress involves nearly optimal explicit constructions, tight combinatorial bounds, advanced decoding algorithms, and quantum analogs. This article surveys the main families of codes, their underlying frameworks, decoding methods, theoretical bounds, and practical advances.
1. Foundational Principles and Main Models
Multi-deletion correction considers words $x$ of length $n$ over a $q$-ary alphabet $\Sigma_q$, where up to $t$ symbols can be deleted, producing an (unknown) subsequence $y$. The classical objective is to construct codes $\mathcal{C} \subseteq \Sigma_q^n$ such that, for any two distinct codewords $x, x' \in \mathcal{C}$, no sequence can be obtained from both $x$ and $x'$ by up to $t$ deletions. By Levenshtein’s equivalence, codes that correct $t$ deletions can also correct any mixture of up to $t$ insertions and deletions.
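The defining condition can be checked by brute force for small codes. The sketch below is an illustration only, and the helper names `deletion_ball` and `corrects_deletions` are hypothetical. It enumerates the words reachable by exactly $t$ deletions, which suffices when all codewords have the same length: two codewords that collide after fewer deletions also collide after exactly $t$.

```python
from itertools import combinations

def deletion_ball(word, t):
    """Return every string obtained by deleting exactly t symbols from word."""
    n = len(word)
    return {"".join(word[i] for i in keep)
            for keep in combinations(range(n), n - t)}

def corrects_deletions(code, t):
    """True if no two distinct codewords share a subsequence of length n - t."""
    balls = [deletion_ball(c, t) for c in code]
    return all(a.isdisjoint(b) for a, b in combinations(balls, 2))

# The length-3 repetition code survives two deletions; an alternating pair
# of words cannot even survive one.
print(corrects_deletions(["000", "111"], 2))
print(corrects_deletions(["0101", "1010"], 1))
```

The enumeration is exponential in $t$ and quadratic in the code size, so it is only useful as a sanity check on toy parameters.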
Algebraic Models
- Single Deletion (VT Codes): Varshamov–Tenengolts–Levenshtein codes use a weighted-sum congruence, $\sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1}$; the resulting redundancy of about $\log_2(n+1)$ bits is asymptotically optimal for $t = 1$.
- Multiple Deletion (General Codes): For $t \ge 2$, the problem is far more complex. Existentially, a greedy selection argument shows that redundancy of about $2t \log n$ bits suffices for fixed $t$, but until recently, explicit codes did not match this bound.
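For the single-deletion case, the weighted-sum idea can be made concrete. The following is a minimal sketch of the binary VT code $\{x : \sum_i i\,x_i \equiv a \pmod{n+1}\}$ together with Levenshtein's single-deletion decoding rule; the function names are illustrative choices, not taken from any cited paper.

```python
from itertools import product

def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions starting at 1."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1)

def vt_code(n, a=0):
    """All binary words of length n whose VT checksum equals a."""
    return [x for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]

def vt_decode(y, n, a=0):
    """Recover the VT codeword of length n from y after at most one deletion."""
    y = list(y)
    if len(y) == n:
        return tuple(y)
    w = sum(y)                                   # number of ones received
    s = sum(i * b for i, b in enumerate(y, start=1))
    d = (a - s) % (n + 1)                        # checksum deficiency
    if d <= w:
        # A 0 was deleted: reinsert it with exactly d ones to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        y.insert(pos, 0)
    else:
        # A 1 was deleted: reinsert it with exactly d - w - 1 zeros to its left.
        pos, zeros = 0, 0
        while zeros < d - w - 1:
            zeros += 1 - y[pos]
            pos += 1
        y.insert(pos, 1)
    return tuple(y)

codeword = (1, 0, 0, 0, 0, 1)                    # checksum 1 + 6 = 7 = 0 mod 7
received = codeword[:2] + codeword[3:]           # delete one symbol
print(vt_decode(received, 6) == codeword)
```

The decoder only locates the run in which the deletion occurred, which is all that is needed because reinserting the symbol anywhere in that run gives the same word.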
Non-Binary, Multiplicity-Free Codes
Recent non-binary constructions focus on codes over large alphabets, especially multiplicity-free words, in which no symbol appears more than once (so the alphabet size $q$ must satisfy $q \ge n$). The design splits each codeword into its unordered symbol set and its induced permutation, so that deletions can be corrected separately on sets and on permutations (Schaller et al., 23 Jan 2025).
Permutation Codes
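In permutation codes, the codewords are permutations $\sigma \in S_n$. The Ulam metric, defined by $d_U(\sigma, \pi) = n - \ell(\sigma, \pi)$ with $\ell(\sigma, \pi)$ the length of a longest common subsequence, governs deletion-correctability: two permutations share a subsequence of length $n - t$ exactly when $d_U(\sigma, \pi) \le t$, so codes correcting $t$ deletions must have minimum Ulam distance at least $t + 1$ (Wang et al., 2024).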
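As an illustration, the Ulam distance between two permutations can be computed from a longest common subsequence, assuming the usual definition $d_U(\sigma, \pi) = n - \mathrm{LCS}(\sigma, \pi)$. The sketch below relabels one permutation by positions in the other, which reduces the problem to a longest increasing subsequence; `ulam_distance` is a hypothetical name.

```python
from bisect import bisect_left

def ulam_distance(sigma, pi):
    """n minus the length of a longest common subsequence of two permutations."""
    position = {value: i for i, value in enumerate(sigma)}
    relabeled = [position[value] for value in pi]
    # Patience sorting: tails[k] is the smallest possible last element of an
    # increasing subsequence of length k + 1 seen so far.
    tails = []
    for v in relabeled:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(sigma) - len(tails)

print(ulam_distance((1, 2, 3, 4), (2, 3, 4, 1)))  # one translocation apart
```

This runs in $O(n \log n)$ time, so checking the minimum distance of a small permutation code by pairwise comparison is feasible.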
2. Explicit Constructions and Decoding Algorithms
Multiplicity-Free Set-Permutation Construction
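For $q \ge n$, every multiplicity-free word $x = (x_1, \dots, x_n)$ is mapped bijectively to:
- Induced Set $S(x) = \{x_1, \dots, x_n\}$, the unordered set of its symbols;
- Induced Permutation $\pi(x) \in S_n$, the relative order in which those symbols appear.
Codes are built by combining a constant-weight set code $\mathcal{C}_S$ (which corrects up to $t$ asymmetric 1→0 errors, the form deletions take in the indicator vector of the set) and a permutation-deletion code $\mathcal{C}_P$ (which corrects $t$ stable deletions). The code is defined as the inverse image of $\mathcal{C}_S \times \mathcal{C}_P$ under the decomposition bijection (Schaller et al., 23 Jan 2025).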
Decoding (Pseudocode)
- Recover the induced set via indicator decoding over $\{0,1\}^q$.
- Sort the recovered set lexicographically and trace observed positions to symbol indices to derive the shortened permutation.
- Decode the permutation using a stable deletion-correcting decoder for the permutation code.
- Reassemble the codeword using the inverse bijection.
The overall cost is that of the set decoder plus the permutation decoder; the latter is polynomial in $n$ via known subroutines.
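The decomposition behind these steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: symbols are taken to be the integers $0, \dots, q-1$, the permutation is recorded as zero-based ranks, and the names `decompose`, `recompose`, and `indicator` are hypothetical. The actual set and permutation decoders are not shown.

```python
def decompose(word):
    """Split a multiplicity-free word into (sorted symbols, rank permutation)."""
    if len(set(word)) != len(word):
        raise ValueError("word is not multiplicity-free")
    symbols = tuple(sorted(word))
    rank = {s: r for r, s in enumerate(symbols)}
    return symbols, tuple(rank[s] for s in word)

def recompose(symbols, perm):
    """Inverse of decompose: place the r-th smallest symbol where rank r occurs."""
    return tuple(symbols[r] for r in perm)

def indicator(symbols, q):
    """Length-q 0/1 vector marking which alphabet symbols are present."""
    present = set(symbols)
    return tuple(int(a in present) for a in range(q))

word = (5, 2, 7, 0)
symbols, perm = decompose(word)
print(symbols, perm)
print(indicator(symbols, 8))

# Deleting the symbol 7 turns one 1 into a 0 in the indicator vector and
# removes one entry from the permutation while preserving relative order.
shortened = (5, 2, 0)
print(indicator(decompose(shortened)[0], 8), decompose(shortened)[1])
```

Separating the two components is what lets a constant-weight code handle the missing symbols and a permutation code handle their order.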
Permutation Deletion Codes via Hamming Mapping
An injective mapping converts permutation errors into Hamming errors. Each permutation $\sigma \in S_n$ is first augmented and then mapped so that permutations within $t$ translocations of each other (the pairs that $t$ deletions can confuse) differ in at most $3t$ coordinates of the image. By intersecting this image with Hamming-metric codes of distance $3t+1$, one gets permutation codes that correct $t$ deletions. Decoding leverages erasure and error-correcting codes in the Hamming metric (Wang et al., 2024).
Burst Deletion and Generalizations
Burst-deletion codes correct runs of consecutive deletions. Constructions interleave array-based codes and shifted VT (SVT) codes, achieving redundancy of at most $\log n + (b-1)\log\log n + b - \log b$ bits for correcting a burst of length $b$, which is provably near optimal (Schoeny et al., 2016). Extensions address non-consecutive bursts and mixed insertions/deletions.
3. Combinatorial and Asymptotic Bounds
Sphere-Packing and Hypergraph Methods
- General Codes: The maximum size of a code correcting $t$ deletions is upper-bounded via fractional matchings in deletion-derived hypergraphs (Kulkarni et al., 2012).
- Levenshtein Bound and Improvements: For general $q$-ary alphabets, the classical Levenshtein bounds are asymptotic upper bounds on code size obtained by a packing argument. A refinement via mixed packing (allowing insertions and deletions) yields strictly better bounds when the number of deletions exceeds the alphabet size, $t > q$ (Cullina et al., 2013).
Singleton Bound and Rate Analysis
A $q$-ary code $\mathcal{C}$ of length $n$ correcting $t$ deletions must have $|\mathcal{C}| \le q^{n-t}$, that is, at least $t$ symbols of redundancy. The multiplicity-free set-permutation construction achieves asymptotically optimal redundancy (Schaller et al., 23 Jan 2025).
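The Singleton-type bound, which says that a $q$-ary code of length $n$ correcting $t$ deletions has at most $q^{n-t}$ codewords, follows from a short puncturing argument. The sketch below uses assumed notation: $\mathcal{C}$ for the code, $\Sigma_q$ for the alphabet, and $\phi$ for the map that deletes the first $t$ symbols.

$$
\phi : \mathcal{C} \to \Sigma_q^{\,n-t}, \qquad \phi(x_1, \dots, x_n) = (x_{t+1}, \dots, x_n).
$$

If $\phi(x) = \phi(x')$ for two distinct codewords, then both produce the same word after $t$ deletions, contradicting $t$-deletion correctability. Hence $\phi$ is injective, and

$$
|\mathcal{C}| \le \left|\Sigma_q^{\,n-t}\right| = q^{n-t}
\quad\Longleftrightarrow\quad
n - \log_q |\mathcal{C}| \ge t.
$$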
Existential and Explicit Constructions in Binary Case
- Greedy existential codes achieve code size $\Omega(2^n / n^{2t})$ for binary codes correcting $t$ deletions (for fixed $t$).
- Explicit constructions, particularly the Guruswami–Håstad augmented VT codes, match the existential redundancy $4 \log n + O(\log\log n)$ for $t = 2$ (Guruswami et al., 2020, Sun et al., 2024).
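The greedy existential argument can be imitated directly on toy parameters. The sketch below is only an illustration of the idea, with hypothetical function names: it scans all words in lexicographic order and keeps a word whenever its $t$-deletion ball is disjoint from the balls of the words already kept. It makes no claim to reproduce the sizes in the asymptotic bound.

```python
from itertools import combinations, product

def deletion_ball(word, t):
    """All tuples obtained by deleting exactly t symbols from word."""
    n = len(word)
    return {tuple(word[i] for i in keep)
            for keep in combinations(range(n), n - t)}

def greedy_code(n, t, q=2):
    """Greedily collect words whose t-deletion balls are pairwise disjoint."""
    code, covered = [], set()
    for word in product(range(q), repeat=n):
        ball = deletion_ball(word, t)
        if covered.isdisjoint(ball):
            code.append(word)
            covered |= ball
    return code

code = greedy_code(8, 2)
print(len(code), "codewords of length 8 correcting 2 deletions")
```

The running time is exponential in $n$, which is exactly why explicit constructions with efficient encoding are of interest.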
4. Quantum Multi-Deletion Correction
Quantum analogs for multi-deletion codes address deletion of qubits (tracing out unknown positions). Two systematic methods are established:
Reed–Solomon-Based Quantum Deletion Codes
- Alternating Sandwich Mapping: Interleave RS code blocks with marker qubits (alternating blocks of two fixed, distinguishable marker states), transforming deletion correction into erasure correction.
- Error Locator Algorithm: Precise block alignment and marker measurement allow detection of erased blocks; standard quantum RS erasure decoding completes recovery.
- The construction achieves rates arbitrarily close to the RS code’s rate for any fixed $t$ and does not require prior knowledge of the deletion count (Hagiwara, 2023).
Marker-Periodicity and Stabilizer Conversion
Any $t$-erasure-correcting quantum code can be lifted, via periodic marker-state prefixing, to a $t$-deletion-correcting code over a suitably sized alphabet. The achievable rate scales by $1/(t+1)$ compared to the base erasure code (Matsumoto et al., 2021).
5. Array-Based and Structured Multi-Deletion Codes
Criss-Cross Codes for Arrays
For two-dimensional arrays, correcting up to $t$ deletions spread arbitrarily across rows and columns (criss-cross model) is handled by intersecting systematic multi-deletion codes with rank-metric Gabidulin codes. Both a lower bound on the redundancy and the redundancy attained by explicit codes are derived in (Welter et al., 2021).
Helberg-Type and Non-Binary Constructions
Number-theoretic codes generalize Helberg’s binary construction to non-binary alphabets: using moments based on an exponentially growing weight sequence and a congruence condition modulo a carefully chosen modulus, up to $t$ deletions are correctable. Decoding algorithms run in time polynomial in $n$ for fixed $t$; however, redundancy scales linearly with $n$ (Le et al., 2015, Segrest et al., 26 Aug 2025).
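To make the weight-and-congruence idea concrete, here is a small sketch of the original binary Helberg code, assuming the standard weights $v_i = 1 + \sum_{j=1}^{t} v_{i-j}$ (with $v_i = 0$ for $i \le 0$) and modulus $m = v_{n+1}$. The non-binary generalizations cited above use modified weights that are not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations, product

def helberg_weights(n, t):
    """Weights v_1..v_{n+1}, each equal to 1 plus the sum of the previous t weights."""
    v = []
    for i in range(n + 1):
        v.append(1 + sum(v[max(0, i - t):i]))
    return v

def helberg_code(n, t, a=0):
    """Binary words x with sum(v_i * x_i) congruent to a modulo v_{n+1}."""
    v = helberg_weights(n, t)
    modulus = v[n]                       # v_{n+1} in one-based indexing
    return [x for x in product((0, 1), repeat=n)
            if sum(w * b for w, b in zip(v, x)) % modulus == a]

def deletion_ball(word, t):
    """All tuples obtained by deleting exactly t symbols from word."""
    n = len(word)
    return {tuple(word[i] for i in keep)
            for keep in combinations(range(n), n - t)}

code = helberg_code(8, 2)
balls = [deletion_ball(c, 2) for c in code]
print(helberg_weights(8, 2))
print(len(code), all(a.isdisjoint(b) for a, b in combinations(balls, 2)))
```

For $t = 1$ the weights reduce to $1, 2, \dots, n$ with modulus $n + 1$, which recovers the VT code. For $t \ge 2$ the weights grow exponentially, so the modulus does too, and the small code size at $n = 8$ reflects the linear redundancy noted above.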
6. List-Decoding, Edit Channels, and Limitations
Recent advances include list-decodable multi-deletion codes (outputting short lists containing the true codeword), and codes for general edit channels (insertions, deletions, substitutions). For example, a code correcting two edits achieves improved redundancy by intersecting two-deletion, two-substitution, and one-deletion-one-substitution constraints, surpassing previous constructions (Sun et al., 2024).
7. Comparative Summary and Open Problems
| Construction/Bound | Redundancy | Alphabet/Structure | Complexity | Reference |
|---|---|---|---|---|
| VT code ($t = 1$ deletion) | $\log n + O(1)$ | Binary/Arbitrary | Linear | (Kulkarni et al., 2012) |
| Guruswami–Håstad explicit ($t = 2$) | $4 \log n + O(\log\log n)$ | Binary | Poly(n) | (Guruswami et al., 2020) |
| Multiplicity-free perm+set code | | Non-binary ($q \ge n$) | | (Schaller et al., 23 Jan 2025) |
| Permutation-t-deletion codes | | Permutations | Polynomial | (Wang et al., 2024) |
| Guess & Check (zero-error, high-prob) | | Binary | Polynomial | (Hanna et al., 2017) |
| Quantum Reed-Solomon construction | Flexible | Qudits | Efficient, no prior knowledge of deletion count | (Hagiwara, 2023) |
Despite major advances, capacity-achieving explicit codes for arbitrary $t$ remain an open challenge in both binary and non-binary settings. For small $t$, the best-known explicit codes match existential bounds up to lower-order terms; for Helberg-type (large-weight) or array codes, redundancy is still linear in $n$. Permutation codes currently define the best explicit tradeoffs, and quantum deletion correction is an active area leveraging erasure conversion. Future directions include reducing the constant factors in redundancy, developing faster encoding algorithms, and extending combinatorial bounds to complex constrained sources.