Extremal Deletion Regime

Updated 16 January 2026
  • Extremal deletion regime is a framework analyzing limits where nearly all symbols are deleted, defining key thresholds for channel capacity and code resilience.
  • It examines combinatorial structures and probabilistic models that shape the behavior of codes under near-complete deletion scenarios.
  • Explicit constructions, such as Reed–Solomon and cyclic Sidon-type codes, highlight trade-offs between redundancy and decoding performance in high-error settings.

The extremal deletion regime encompasses the theoretical and coding-theoretic limits, combinatorial structures, and algorithmic constructions that arise when the fraction or absolute number of deletions in a channel approaches its most severe, or "extremal", permissible value: typically near the information-theoretic maximum, e.g., where only a vanishing number of received symbols remain or the permissible deletion rate approaches 1. This regime is central to deletion channel coding theory, the combinatorics of subsequences, and probabilistic and adversarial analysis, with direct implications for code constructions, bounds on capacity, and the design of robust systems in high-error environments.

1. Formal Definition and Foundational Regimes

In discrete channel models, the extremal deletion regime refers to parameter settings where either:

  • the number of deletions $t$ is maximal or near-maximal (e.g., $t = n - k$ for fixed small $k$ and blocklength $n$), or
  • the deletion probability $p$ (or fraction $\delta$) approaches 1, making the number of surviving symbols negligible.

Specific settings include:

  • Multiset/unordered channels: $t = n - k$ deletions, so only a constant (or small) number $k$ of symbols survive (Kreindel et al., 9 Jan 2026).
  • Binary/sequence channels: $\delta \to 1$ (the fraction of deletions tends to 1), or the deletion fraction approaches the combinatorial threshold beyond which positive-rate codes cease to exist (Guruswami et al., 2014, Guruswami et al., 2021).
  • Capacity results: $p_d, p_s \to 0$ in the deletion/substitution channel admits an "extremal small-error" expansion (Kazemi et al., 4 Mar 2025), while $p \to 1/2$ in the adversarial case marks the zero-rate threshold (Guruswami et al., 2021).
  • Probabilistic/random process models: Regimes where deletion dominates growth, so typical size or structure approaches minimal values (Saunders, 2019).

The “extremal deletion regime” thus characterizes the combinatorial, probabilistic, and information-theoretic behavior of codes and structures at or near the points where deletion errors are nearly maximized.

2. Capacity and Combinatorial Bounds in the Extremal Regime

Binary Deletion Channel and Substitution Extensions

For i.i.d. deletion and substitution channels, capacity in the extremal regime $p_d, p_s \to 0$ is given by

$$C(p_d, p_s) = 1 - H(p_d) - H(p_s) + o(p_d + p_s),$$

where $H(\cdot)$ denotes the binary entropy function. The capacity loss is additive in the deletion and substitution entropies, and no significant second-order term appears in this regime (Kazemi et al., 4 Mar 2025). This guides code design: redundancy can be budgeted separately for deletions and for substitutions, each at its Hamming-type entropy cost.
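The first-order expansion can be evaluated numerically; the following is a minimal sketch (function names are illustrative, not from the cited work):

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def capacity_small_error(p_d: float, p_s: float) -> float:
    """First-order capacity expansion C ~ 1 - H(p_d) - H(p_s),
    valid only as p_d, p_s -> 0 (the o(p_d + p_s) remainder is dropped)."""
    return 1.0 - binary_entropy(p_d) - binary_entropy(p_s)
```

For example, at $p_d = p_s = 0.01$ the expansion predicts roughly $1 - 2H(0.01) \approx 0.84$ bits per channel use, illustrating how steeply the entropy terms cut into capacity even at small error rates.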

One-Bit Deletion/Duplication Channel

For the blockwise one-bit deletion/duplication channel, in the asymptotic regime where the blocklength $\ell \to \infty$ and $p + q \to 0$ with $(p+q)\log\ell \to 0$, the per-symbol capacity with side information is (Mirghasemi et al., 2012):

$$C_{SI}(p,q,\ell) = 1 - \frac{p+q}{\ell}\log\ell + \frac{p}{\ell}(K-1) + \frac{q}{\ell}(K+1) + O\!\left(\frac{(p+q)^2\log^2\ell}{\ell}\right),$$

with $K \approx 1.2885$. To leading order, the rate loss is governed by the $-\frac{p+q}{\ell}\log\ell$ term.
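The leading terms of this expansion can be tabulated directly; a minimal sketch, assuming base-2 logarithms (rate in bits) and dropping the remainder term:

```python
import math

K = 1.2885  # universal constant appearing in the capacity expansion

def capacity_side_info(p: float, q: float, ell: int) -> float:
    """Leading terms of C_SI(p, q, ell); the O((p+q)^2 log^2(ell)/ell)
    remainder is dropped, so this is only meaningful for small p + q."""
    log_ell = math.log2(ell)
    return (1.0
            - (p + q) / ell * log_ell
            + p / ell * (K - 1.0)
            + q / ell * (K + 1.0))
```

Plugging in, e.g., $p = q = 10^{-3}$ and $\ell = 1024$ shows a total rate loss on the order of $10^{-5}$, dominated by the $\frac{p+q}{\ell}\log\ell$ term as stated.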

Zero-Rate Threshold in Adversarial Regimes

For worst-case adversarial deletions, there exists an absolute constant $\delta_0 > 0$ such that no code of positive rate can correct a deletion fraction $p \ge 1/2 - \delta_0$ (Guruswami et al., 2021). The classical matching-based argument yielded only the trivial $1/2$ bound; this result establishes that the extremal deletion regime is strictly "forbidden", in the sense that positive-rate codes cannot correct this level of deletions.

| Channel/Model | Threshold / Rate Loss / Max. Correctable | Reference |
|---|---|---|
| Binary deletion/substitution (i.i.d.) | rate loss $H(p_d) + H(p_s)$ | (Kazemi et al., 4 Mar 2025) |
| One-bit deletion/duplication (block) | rate loss $\frac{p+q}{\ell}\log\ell$ | (Mirghasemi et al., 2012) |
| Adversarial bit deletion | zero-rate threshold $p_{\mathrm{thr}} < 1/2$ | (Guruswami et al., 2021) |

3. Extremal Deletion Codes: Bounds and Constructions

Multiset and Unordered Models

In the multiset channel, the extremal regime $t = n - k$ is characterized by small output multiset size $k$. For $k = 1$, $S_q(n, n-1) = q$; only "constant-word" codes of the form $\{a^n : a \in \Sigma\}$ are optimal. Similar characterizations hold for $k = 2, 3$, with explicit Reiman-type incidence bounds (Kreindel et al., 9 Jan 2026).

For $q = 2$, the optimal code size for $t$ deletions is

$$S_2(n,t) = \left\lfloor \frac{n+1}{t+1} \right\rfloor,$$

achieved by congruence classes of the codeword Hamming weight modulo $t+1$. For $q \geq 3$, cyclic Sidon-type congruence constructions yield explicit codes with redundancy $\log_q(t(t+1)^{q-2}+1)$.
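The binary congruence-class construction is short enough to sketch. A binary multiset of size $n$ is determined by its number of ones $w$, so a code is just a set of weights; any window of $t+1$ consecutive integers contains exactly one member of a residue class mod $t+1$, which makes decoding unambiguous. The particular residue choice below is one convenient option, not necessarily the one used in the cited paper:

```python
def multiset_code(n: int, t: int) -> list[int]:
    """Binary multiset code for t deletions: the congruence class of
    Hamming weights w with w = t (mod t+1), which has exactly
    floor((n+1)/(t+1)) members among 0..n."""
    return [w for w in range(n + 1) if w % (t + 1) == t]

def decode_weight(observed: int, t: int) -> int:
    """After at most t deletions the true weight lies in
    [observed, observed + t]; that window of t+1 consecutive
    integers contains exactly one codeword weight."""
    for w in range(observed, observed + t + 1):
        if w % (t + 1) == t:
            return w
    raise AssertionError("unreachable: window always contains the residue")
```

For $n = 10$, $t = 2$ this gives the three weights $\{2, 5, 8\}$, matching $\lfloor 11/3 \rfloor = 3$.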

Sequence Structure: Extremal Run Distribution

Extremal combinatorics of deletion subsequences show that, for fixed run count $r$, balanced strings $B_{r,k}$ maximize, and unbalanced strings $U_{n,r}$ minimize, the number of distinct subsequences obtainable under $t$ deletions (Liron et al., 2012). Explicit bounds and formulas relate the run structure of a string to the size of its ambiguity class.

This structural minimax result directly informs design choices for codes targeting the extremal regime.
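The effect is easy to observe by brute force on short strings; the particular balanced and unbalanced examples below (both of length 8 with 4 runs) are illustrative choices, not taken from the cited paper:

```python
from itertools import combinations

def distinct_subsequences(s: str, t: int) -> int:
    """Count distinct subsequences of s obtainable by exactly t deletions.
    Brute force over all position subsets; fine for short strings only."""
    keep = len(s) - t
    return len({"".join(c) for c in combinations(s, keep)})

balanced = "00110011"    # runs of lengths (2, 2, 2, 2)
unbalanced = "00000101"  # runs of lengths (5, 1, 1, 1)
```

With $t = 2$ deletions the balanced string already admits at least as many distinct subsequences as the unbalanced one, consistent with the minimax result; e.g., for two runs, "000111" yields 3 distinct length-4 subsequences while "000001" yields only 2.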

4. Explicit Code Constructions in Extremal Deletion Regimes

High-Noise, High-Rate Constructions

Polynomial-time codes achieving close-to-optimal performance under nearly total deletion rates (i.e., correcting a $1-\varepsilon$ fraction of deletions) follow a concatenated design paradigm:

  • Outer: Reed–Solomon code for erasure/error protection.
  • Inner: Dense, deletion-resilient code, with unique headers or buffered block boundaries.
  • For binary codes in the high-rate regime (deletion fraction $\epsilon \to 0$), the rate satisfies $R(\epsilon) = 1 - \widetilde{O}(\sqrt{\epsilon})$ (Guruswami et al., 2014).
  • For high-noise codes, $R(\epsilon) = \Omega(\epsilon^2)$ over an alphabet of size $\mathrm{poly}(1/\epsilon)$ suffices to correct a deletion fraction of $1-\epsilon$ (Guruswami et al., 2014, Guruswami et al., 2016).

List-decoding and sliding-window techniques enable recovery without reliance on unique buffer markers in high-noise multialphabet settings (Guruswami et al., 2016). These constructions close the gap between existential and explicit combinatorial feasibility in the extremal regime.
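The synchronization role of the inner layer can be illustrated with a toy sketch: tagging inner blocks with indices turns block deletions into *located* erasures, which the outer code (e.g., Reed–Solomon) can then repair. This is a deliberately simplified stand-in for the header/buffer mechanisms in the cited constructions; all names below are illustrative:

```python
def encode_blocks(blocks: list[bytes]) -> list[tuple[int, bytes]]:
    """Toy inner layer: tag each inner block with its index so that the
    survivors can be re-synchronized after whole-block deletions."""
    return list(enumerate(blocks))

def locate_erasures(received: list[tuple[int, bytes]], n_blocks: int) -> list[int]:
    """Indices of deleted blocks. The outer code treats these as erasures,
    which are strictly easier to correct than errors at unknown positions."""
    present = {i for i, _ in received}
    return sorted(set(range(n_blocks)) - present)
```

The design choice this illustrates is the separation of concerns in the concatenated paradigm: the inner layer spends its redundancy on recovering *positions*, the outer layer on recovering *content*.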

2-Dimensional Reed–Solomon Codes for Insertion/Deletion

Explicit constructions for 2D RS codes with deletion-correcting capability matching the extremal regime include:

  • Codes achieving $d_{\mathrm{insdel}} \geq 2(1-\varepsilon)n + O(1/\varepsilon^2)$ for $n$ superpolynomial in $\log q$ (Duc et al., 2019).
  • Codes realizing the exact extremal bound $d_{\mathrm{insdel}} = 2n-4$ for $n = O(\sqrt{\log q})$ using Singer difference set techniques (Duc et al., 2019).

These break through prior logarithmic upper bounds to reach the Singleton bound and related extremal values.

5. Probabilistic and Combinatorial Extremes: Tree Models and Subsequence Profiles

In random recursive tree processes in deletion-dominated regimes (insertion probability $p < \tfrac{1}{2}$, deletion probability $q = 1 - p$), the expected size, variance, leaf count, and degree distributions collapse to constants independent of the specifics of the deletion rule (Saunders, 2019):

  • $\mathbb{E}[|T_n|] \to \frac{q}{q-p}$
  • $\mathrm{Var}(|T_n|) \to \frac{pq}{(q-p)^2}$

This suggests a form of “statistical equilibrium” in the extremal deletion regime, with macroscopic descriptive invariance under any rule from an equiprobable class.
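The limiting mean is easy to check by simulation under a minimal stand-in model: if each step inserts a node with probability $p$ and otherwise deletes one (with the root persisting), the size follows a reflected birth–death chain whose stationary mean is exactly $q/(q-p)$. This simplification is an assumption for illustration, not the exact process of the cited paper:

```python
import random

def average_size(p: float, steps: int, seed: int = 1) -> float:
    """Time-averaged size of a reflected birth-death chain: with probability
    p one node is inserted, otherwise one is deleted (the root persists).
    For p < 1/2 the stationary mean is q/(q-p) with q = 1 - p."""
    rng = random.Random(seed)
    size, total = 1, 0
    for _ in range(steps):
        if rng.random() < p:
            size += 1
        elif size > 1:
            size -= 1
        total += size
    return total / steps
```

For $p = 0.3$, $q = 0.7$ the predicted mean is $0.7/0.4 = 1.75$, and a long simulation run concentrates near that value.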

For combinatorial subsequence profiles of binary strings, extremal structures (in run distribution) yield exponential separation in the number of output subsequences under deletions, affecting zero-error capacity and the adversarial analysis of list-decodability (Liron et al., 2012). This fine-grained understanding underpins both code constructions and impossibility results at the extremal limit.

6. Broader Implications and Open Directions

The extremal deletion regime thus imposes fundamental bounds on the capacity, structure, and explicit realizability of codes for channels experiencing severe synchronization errors. It motivates:

  • Explicit trade-off analysis between achievable rates, alphabet size, and correctable deletion fraction.
  • Combinatorial profile minimization to mitigate worst-case ambiguity growth.
  • Layered, modular design of practical codes separating synchronization from substitution correction.
  • The search for explicit constructions and structural bounds closing gaps in non-binary or high-dimension analogs.

Ongoing research targets sharpening the precise threshold phenomena (e.g., the exact zero-rate threshold for deletions in various settings), expanding the catalog of extremal constructions (notably for $q$-ary and higher-dimensional codes), and deepening connections between code structure, combinatorial design, and information-theoretic optimality in these extremal regimes.
