
Single-Deletion Two-Substitution Channel

Updated 19 January 2026
  • Single-deletion two-substitution channel is a discrete error model where one symbol is deleted and up to two symbols are substituted in a transmitted sequence.
  • The analysis uses combinatorial partitioning to derive quantitative bounds on error ball intersections, revealing a quadratic scaling with sequence length.
  • Systematic code constructions leveraging BCH pre-codes and syndrome compression demonstrate practical approaches to balance redundancy and computational complexity.

The single-deletion two-substitution channel, denoted $DS_{1,2}$, is a discrete channel model studied in coding theory and sequence reconstruction. It captures the scenario in which a transmitted sequence of length $n$ over a finite alphabet $\Sigma_q$ ($q \geq 2$) incurs exactly one deletion and up to two symbol substitutions during transmission. Recent work characterizes the channel through asymptotic bounds on code redundancy, efficient encoding and decoding algorithms, and a precise combinatorial analysis of error balls and their intersections (Song et al., 2020, Song et al., 12 Jan 2026).

1. Formal Channel Model

The $DS_{1,2}$ channel acts on an input $x \in \Sigma_q^n$ by deleting a single symbol at some position $j \in [n]$ and then substituting at most two symbols in the resulting string of length $n-1$. The set of all possible outputs forms the $DS_{1,2}$-error ball:

$B^{DS_{1,2}}(x) := \{\, y \in \Sigma_q^{n-1} \mid y \text{ can be obtained from } x \text{ by exactly one deletion and at most two substitutions} \,\}$

Equivalently, the ball is the union, over all deletion positions $j \in [n]$, of the Hamming balls of radius 2 centered at the string obtained from $x$ by deleting its $j$-th symbol.
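For small parameters the ball can be enumerated directly from this definition. The Python sketch below is illustrative only: the function names `single_deletions`, `hamming_ball`, and `ds12_ball` are hypothetical, sequences are represented as Python strings, and the exhaustive enumeration is practical only for short inputs.

```python
from itertools import combinations, product


def single_deletions(x):
    """Distinct strings obtained by deleting exactly one symbol of x."""
    return {x[:j] + x[j + 1:] for j in range(len(x))}


def hamming_ball(center, radius, alphabet):
    """All strings over `alphabet` within Hamming distance `radius` of `center`."""
    ball = set()
    for r in range(radius + 1):
        for positions in combinations(range(len(center)), r):
            # Overwriting the chosen positions with arbitrary symbols (possibly
            # the original ones) yields every string at distance at most r.
            for symbols in product(alphabet, repeat=r):
                z = list(center)
                for p, s in zip(positions, symbols):
                    z[p] = s
                ball.add("".join(z))
    return ball


def ds12_ball(x, alphabet="01"):
    """Union, over all single deletions of x, of radius-2 Hamming balls."""
    ball = set()
    for d in single_deletions(x):
        ball |= hamming_ball(d, 2, alphabet)
    return ball
```

For the all-zero binary word of length 4, the only deletion result is `000`, so `ds12_ball("0000")` consists of the seven length-3 words of Hamming weight at most two.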

2. Intersection Properties and Sequence Reconstruction

A pivotal quantity for both code design and sequence reconstruction is the size of the intersection $|B^{DS_{1,2}}(x) \cap B^{DS_{1,2}}(y)|$ for distinct $x, y \in \Sigma_q^n$. This intersection governs the minimum number of reads required for perfect reconstruction (the reconstruction threshold), known in the literature as "read coverage." Specifically, if two distinct codewords $x, y$ of a code $\mathcal{C}$ have an intersection of size $N$, then any algorithm needs at least $N + 1$ distinct erroneous copies to guarantee recovery of the original (Song et al., 12 Jan 2026).
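The relation between intersection size and read count can be checked by brute force on toy codes. The sketch below is an assumption-laden illustration: `ds12_ball`, `max_intersection`, and `reads_needed` are hypothetical helper names, the code is given as a list of equal-length strings, and the reads are assumed to be pairwise distinct channel outputs.

```python
from itertools import combinations, product


def ds12_ball(x, alphabet="01"):
    """Brute-force ball: one deletion followed by at most two substitutions."""
    ball = set()
    for j in range(len(x)):
        d = x[:j] + x[j + 1:]
        for r in range(3):  # overwrite 0, 1, or 2 positions
            for positions in combinations(range(len(d)), r):
                for symbols in product(alphabet, repeat=r):
                    z = list(d)
                    for p, s in zip(positions, symbols):
                        z[p] = s
                    ball.add("".join(z))
    return ball


def max_intersection(code, alphabet="01"):
    """Largest ball intersection over pairs of distinct codewords."""
    balls = {c: ds12_ball(c, alphabet) for c in code}
    return max(
        (len(balls[a] & balls[b]) for a, b in combinations(sorted(balls), 2)),
        default=0,
    )


def reads_needed(code, alphabet="01"):
    """One more distinct read than the worst-case intersection size."""
    return max_intersection(code, alphabet) + 1
```

For the binary repetition pair `["0000", "1111"]`, the two balls share the six length-3 words of weight one or two, so `reads_needed` returns 7. At length 6 the balls of `000000` and `111111` are disjoint and a single read suffices.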

An upper bound on this intersection size is proved in (Song et al., 12 Jan 2026) for all pairs $x, y \in \Sigma_q^n$ satisfying a condition on their Hamming distance. The bound is quadratic in $n$ and is exact up to an additive term that depends on $q$ but is independent of $n$. Within the parameter range stated in that work, there exist pairs achieving the bound up to this additive constant.

These results underscore a qualitative distinction between mixed-error and pure-error channels: for $DS_{1,2}$, the quadratic scaling of the worst-case intersection means that the number of reads needed for perfect reconstruction also grows quadratically in $n$, whereas single-deletion or single-substitution channels require far fewer reads.

3. Combinatorial Structure and Analysis

A central aspect of recent work is the explicit combinatorial partitioning of the $DS_{1,2}$ balls and their intersections. For $x \in \Sigma_q^n$, write $x^{(j)}$ for the string obtained by deleting the $j$-th symbol of $x$. Then

$B^{DS_{1,2}}(x) = \bigcup_{j \in [n]} B^{H}_{2}(x^{(j)})$

where $B^{H}_{2}(\cdot)$ is the two-substitution Hamming ball. The intersection structure is driven by counting how many deletion pairs $(i, j)$ yield strings $x^{(i)}$ and $y^{(j)}$ of small mutual Hamming distance (at most 4, the largest distance at which two radius-2 Hamming balls still intersect) and analyzing their overlaps using run decomposition and exact enumeration. Key lemmas from (Song et al., 12 Jan 2026) provide closed forms for the cardinalities of the Hamming ball intersections that arise after a deletion.
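The counting of deletion pairs can be made concrete with a few lines of Python. This is a minimal sketch, not the enumeration used in the cited work: the names `delete_at`, `hamming_distance`, and `close_deletion_pairs` are hypothetical, positions are 0-indexed, and pairs are counted with multiplicity (two positions in the same run give the same string but are listed separately).

```python
def hamming_distance(u, v):
    """Number of positions where two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(u, v))


def delete_at(x, j):
    """String obtained from x by deleting the symbol at 0-indexed position j."""
    return x[:j] + x[j + 1:]


def close_deletion_pairs(x, y, threshold=4):
    """Pairs (i, j) whose deletion results lie within `threshold` of each other.

    Two radius-2 Hamming balls intersect exactly when their centers are at
    distance at most 4, so the one-deletion two-substitution balls of x and y
    intersect exactly when this list is non-empty.
    """
    return [
        (i, j)
        for i in range(len(x))
        for j in range(len(y))
        if hamming_distance(delete_at(x, i), delete_at(y, j)) <= threshold
    ]
```

For example, `close_deletion_pairs("000000", "111111")` is empty because every pair of deletion results is at distance 5, while replacing the second word by `111110` gives 30 pairs: any of the six deletions in the first word combined with the deletion of one of the five leading ones in the second.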

4. Code Constructions: Systematic Binary Codes

The systematic binary code construction for single-deletion two-substitution correction given in (Song et al., 2020) achieves redundancy that grows logarithmically in the codeword length $n$, with encoding and decoding complexities that are polynomial in $n$.

The construction proceeds in three main stages:

  1. BCH Pre-code: Encode the message with a systematic primitive BCH code of designed distance 5, which adds redundancy logarithmic in the block length.
  2. Syndrome Compression: Compute a compressed syndrome as the concatenation of an integer syndrome and a carefully selected modulus, both represented in binary, with the modulus chosen so that no collision occurs among the single-deletion two-substitution neighbors.
  3. Higher-Order Checks: Calculate a suite of five checksum-like quantities using cumulative weight vectors and moduli. Repeat each bit of the output 6-fold to withstand $DS_{1,2}$ channel errors, at the cost of additional redundancy.
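The modulus search in the syndrome compression stage can be illustrated on a toy scale. The sketch below is not the construction of (Song et al., 2020): `label` is a hypothetical stand-in for the real syndrome (here simply the integer value of the binary string, chosen because it is injective), and `confusable_words` and `compress` are hypothetical names. It only demonstrates the general idea of picking the smallest modulus under which the label of a word differs from the labels of all words it could be confused with.

```python
from itertools import combinations, product


def ds12_ball(x, alphabet="01"):
    """Brute-force ball: one deletion followed by at most two substitutions."""
    ball = set()
    for j in range(len(x)):
        d = x[:j] + x[j + 1:]
        for r in range(3):
            for positions in combinations(range(len(d)), r):
                for symbols in product(alphabet, repeat=r):
                    z = list(d)
                    for p, s in zip(positions, symbols):
                        z[p] = s
                    ball.add("".join(z))
    return ball


def confusable_words(x, alphabet="01"):
    """Words y != x of the same length whose ball intersects the ball of x."""
    ball_x = ds12_ball(x, alphabet)
    words = ("".join(t) for t in product(alphabet, repeat=len(x)))
    return [y for y in words if y != x and ball_x & ds12_ball(y, alphabet)]


def label(x):
    """Hypothetical injective integer label standing in for the syndrome."""
    return int(x, 2)


def compress(x):
    """Return (a, label(x) mod a) for the smallest separating modulus a >= 2."""
    target = label(x)
    others = [label(y) for y in confusable_words(x)]
    a = 2
    # Terminates because label is injective: any a larger than every
    # difference |label(y) - label(x)| divides none of them.
    while any((v - target) % a == 0 for v in others):
        a += 1
    return a, target % a
```

In this toy version the pair returned by `compress` identifies the word among its confusable set. The actual scheme applies the same principle to a much shorter syndrome so that the stored pair stays small.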

The full systematic codeword consists of the message together with the redundancy produced by these three stages. Decoding proceeds sequentially: the repeated checksum bits are isolated and corrected first, then the syndrome part, and finally the BCH-encoded part, for which candidates are tested exhaustively using the collision-freeness guaranteed at encoding time.

5. Algorithmic Complexity and Correctness

Encoding comprises BCH encoding, syndrome compression (which entails a brute-force search over candidate moduli), and checksum computation. Decoding isolates the error-affected substrings and resolves codeword candidates using the combinatorial properties that guarantee correctness. Both procedures run in time polynomial in the code length (Song et al., 2020).

Correctness is anchored by structural lemmas:

  • f-protect Lemma: If two binary strings $x, y$ have intersecting $DS_{1,2}$-balls and all their higher-order $f$-checks agree, then $x = y$.
  • BCH Pre-code Lemma: Guarantees the systematic BCH component provides minimum distance 5 with small parity overhead.
  • Syndrome Compression Lemma: For every input $x$, the syndrome-modulus pair is unique among all codewords whose $DS_{1,2}$ balls intersect that of $x$.
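Taken together, these lemmas guarantee that distinct codewords have disjoint error balls, which is the property that makes unique decoding possible. The following sketch checks that property by brute force and decodes by exhaustive membership testing. It is a stand-in for intuition only, not the efficient decoder of (Song et al., 2020); `is_ds12_correcting` and `decode` are hypothetical names.

```python
from itertools import combinations, product


def ds12_ball(x, alphabet="01"):
    """Brute-force ball: one deletion followed by at most two substitutions."""
    ball = set()
    for j in range(len(x)):
        d = x[:j] + x[j + 1:]
        for r in range(3):
            for positions in combinations(range(len(d)), r):
                for symbols in product(alphabet, repeat=r):
                    z = list(d)
                    for p, s in zip(positions, symbols):
                        z[p] = s
                    ball.add("".join(z))
    return ball


def is_ds12_correcting(code, alphabet="01"):
    """True if the balls of all distinct codewords are pairwise disjoint."""
    balls = [ds12_ball(c, alphabet) for c in set(code)]
    return all(a.isdisjoint(b) for a, b in combinations(balls, 2))


def decode(received, code, alphabet="01"):
    """Return the unique codeword whose ball contains `received`."""
    candidates = [c for c in set(code) if received in ds12_ball(c, alphabet)]
    if len(candidates) != 1:
        raise ValueError("received word does not decode uniquely")
    return candidates[0]
```

With the length-6 binary repetition pair `["000000", "111111"]`, the check succeeds, and a received word such as `11100` (one deletion and two substitutions applied to `111111`) decodes back to `111111`.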

6. Distinction from Pure-Error Channels and Implications

Unlike single-error or pure-substitution channels, the single-deletion two-substitution channel has a quadratic ball intersection size, which directly determines the minimum number of reads for sequence reconstruction. The read requirements of the single-deletion channel and of the two-substitution channel (the latter with a smaller linear term) serve as the points of comparison: the $DS_{1,2}$ channel matches the latter's scaling but with sharp constants, making code design and sequence assembly fundamentally "harder" (Song et al., 12 Jan 2026).

The interaction between deletion and substitution errors necessitates sophisticated code structures combining minimum distance, syndrome aggregation, and layered error detection and correction, as seen in the systematic codes above. Together, the bounds and constructions bracket the achievable redundancy from below and above and establish computational tractability for practical code implementations in this error regime.

7. Outlook and Research Directions

The comprehensive characterizations of the error balls and intersection sizes for $DS_{1,2}$ channels anchor the design of minimal-redundancy codes and inform foundational limits for sequence reconstruction. The asymptotic optimality (up to constant terms) of known constructions, as well as explicit combinatorial formulas for mixed-error intersection sizes, enable precise benchmarking of future coding and assembly schemes. A plausible implication is that generalizations to more deletions or substitutions may follow similar structural analyses, but with increasingly intricate combinatorial geometry and likely higher redundancy and read-coverage lower bounds (Song et al., 2020, Song et al., 12 Jan 2026).
