- The paper introduces a novel O*(4^d) randomized algorithm for the Binary Closest String problem that matches SETH-based lower bounds.
- It employs a stochastic local search utilizing a single long random walk to effectively reduce the maximum Hamming distance.
- The method emphasizes algorithmic simplicity and optimality, establishing a definitive baseline for future generalizations and improvements.
An Optimal Fixed-Parameter Algorithm for Binary Closest String
Problem Overview and Background
The Closest String problem takes a set of strings X ⊆ Σ^n as input and seeks a string y* ∈ Σ^n minimizing the maximum Hamming distance to any string in X. The binary case, with |Σ| = 2, is relevant both theoretically, as a canonical clustering problem in discrete mathematics, and practically, since it relates directly to core computational biology applications including PCR primer design, motif finding, and drug engineering. The parameterization by the optimal distance d (the maximum Hamming distance from y* to any string in X) is central in algorithmic research because the underlying decision problem is NP-hard and because instances with small d are the practically relevant ones.
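To make the objective concrete, the following is a minimal brute-force sketch for the binary case. It is only an illustration of the definition, not the paper's algorithm: it enumerates all 2^n candidates, so it is usable only for tiny n, and the names `hamming` and `brute_force_center` are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of coordinates where a and b differ."""
    return sum(u != v for u, v in zip(a, b))


def brute_force_center(strings):
    """Return (center, radius) minimizing the maximum Hamming distance.

    Assumes all strings are over the alphabet {'0', '1'} and have equal length.
    """
    n = len(strings[0])
    best = None
    for cand in product("01", repeat=n):
        # Radius of this candidate: distance to the farthest input string.
        radius = max(hamming(cand, s) for s in strings)
        if best is None or radius < best[1]:
            best = ("".join(cand), radius)
    return best


center, d = brute_force_center(["0000", "0011", "1100", "1111"])
```

In this small instance the optimal distance is d = 2, since "0000" and "1111" differ in four positions and any center must be at distance at least 2 from one of them.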
There is a long sequence of improvements in the parameterized complexity of Binary Closest String: running times decreased step by step from the naïve O*(d^d) to O*(16^d) and then, through several further improvements, to the bound of [Chen, Ma, Wang, Algorithmica 2016]. However, recent work has established a fine-grained lower bound under SETH that rules out O*((4 − ε)^d) time for any ε > 0, raising the open question of whether an algorithm with running time O*(4^d) is achievable.
Algorithmic Contributions
This work presents the first O*(4^d) randomized algorithm for Binary Closest String, matching the conditional lower bound and thus resolving the parameterized complexity of the problem. The algorithm's technical approach is a stochastic local search, a Markov process on the Hamming ball of radius d, which is structurally similar to prior, slower local search schemes but differs in one modification: instead of multiple short random walks, a single long random walk is executed.
A succinct outline of the algorithm is as follows:
- Begin with any y ∈ X;
- In each iteration, select x ∈ X that maximizes dist(x, y), the Hamming distance to the current string y;
- If dist(x, y) ≤ d, output y as a solution;
- Else, select a differing bit at random and make y agree with x on that coordinate;
- Repeat.
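The outline above can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the authors' implementation: it starts from the first input string, picks the differing coordinate uniformly at random, and adds a hypothetical `max_steps` cap so that the function always returns. The names `hamming` and `closest_string_walk` are illustrative.

```python
import random
from typing import Optional, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of coordinates where a and b differ."""
    return sum(u != v for u, v in zip(a, b))


def closest_string_walk(
    strings: Sequence[str],
    d: int,
    max_steps: int = 100_000,  # assumption: step cap, not part of the outline
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Single random walk; returns a center within distance d, or None."""
    if rng is None:
        rng = random.Random()
    y = list(strings[0])  # assumption: begin with the first input string
    for _ in range(max_steps):
        # Select the input string farthest from the current candidate.
        far = max(strings, key=lambda s: hamming(y, s))
        if hamming(y, far) <= d:
            return "".join(y)  # every input string is within distance d
        # Otherwise pick a differing coordinate (uniformly, by assumption)
        # and make the candidate agree with the farthest string there.
        differing = [i for i, (u, v) in enumerate(zip(y, far)) if u != v]
        i = rng.choice(differing)
        y[i] = far[i]
    return None  # cap reached without finding a center


center = closest_string_walk(["0000", "0011", "1100", "1111"], d=2)
```

Any returned string is a valid center by construction, because it is only returned after the farthest input string has been checked against d. When no center within distance d exists, the sketch returns `None` once the cap is hit, which stands in for non-termination.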
The Markov process is structured such that, if an optimal y* at distance d exists, the expected number of steps to reach y* is O*(4^d). The key technical lemma lower-bounds the probability of progress (a reduction in the Hamming distance to y*) from each state, which enables the analysis of the algorithm's expected runtime via a hitting time evaluation. The termination guarantee (with high probability) follows from Markov's inequality and exponential tail bounds, establishing reliable randomized correctness.
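The step from an expected hitting time to a high-probability guarantee can be spelled out generically. The following is a sketch of the standard argument, not the paper's exact statement. It assumes that T is an upper bound on the expected number of remaining steps from every state the walk can occupy; τ denotes the hitting time, and k is a hypothetical number of phases of 2T steps each.

```latex
\begin{align*}
\Pr[\tau > 2T] &\le \frac{\mathbb{E}[\tau]}{2T} \le \frac{1}{2}
  && \text{(Markov's inequality)} \\
\Pr[\tau > 2kT] &= \prod_{j=1}^{k} \Pr\bigl[\tau > 2jT \,\big|\, \tau > 2(j-1)T\bigr] \le 2^{-k}
  && \text{(the bound applies afresh in each phase)}
\end{align*}
```

Under this assumption, choosing k = ⌈log2(1/δ)⌉ makes the failure probability at most δ while increasing the number of steps only by a factor of 2k.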
The algorithm is notably simple, implementable in a few lines, and achieves tight complexity bounds dictated by current fine-grained lower bounds [Abboud et al., ESA 2023].
Theoretical and Empirical Implications
The most significant theoretical contribution is the closure of the exponential gap between upper and lower bounds for parameterized algorithms for the Binary Closest String problem; the proposed algorithm is the first to achieve optimality (conditioned on SETH).
- Tight Lower Bound: Under SETH, it is impossible to improve upon the exponential base of the O*(4^d) runtime for the binary case, as demonstrated by recent fine-grained reductions, making further algorithmic improvements infeasible in this regime barring major complexity breakthroughs.
- Algorithmic Simplicity: The approach possesses conceptual transparency and ease of implementation compared to prior, more intricate techniques.
- Micro-Optimization: The auxiliary observation that the polynomial factor hidden in the O*(4^d) bound can be reduced is pertinent for practical optimization.
While the paper's authors are circumspect regarding immediate practical impacts (since d is often small in practice and the base of the exponent dominates), the result establishes a definitive baseline for future work on (a) generalizations to larger alphabet sizes, where the parameterized complexity remains open, and (b) the strictly more general Closest Substring problem, which is of even higher practical value in biological sequence analysis.
Contradictory and Strong Claims
- The paper explicitly claims conditional optimality: No O*((4 − ε)^d)-time algorithm can exist for any ε > 0 unless SETH fails.
- Simplicity vs. Optimality: The method attains both conceptual simplicity and optimal performance, a contrast to the trend of increasing algorithmic sophistication in previous improvements.
- The algorithm is robustly correct: it either returns a correct solution or, when no feasible center within distance d exists, does not terminate.
Future Directions
Potential directions for continued research include:
- Extending the optimality to Closest String over larger alphabets (|Σ| > 2), where the gap between algorithmic upper and lower bounds is greater and where new techniques could be required for tight results.
- Generalizing to Closest Substring and related proximity problems in string metric spaces, for which current algorithms do not match known lower bounds, and fine-grained reductions are less developed.
- Investigating derandomization approaches for this optimal algorithm, or further reducing polynomial multiplicative factors in the runtime to enhance practical adoption, especially for moderate parameter values.
Conclusion
The paper "An Optimal Algorithm for Binary Closest String" (2605.31417) establishes the first O*(4^d) randomized fixed-parameter algorithm for the Binary Closest String problem, fully resolving the parameterized complexity of the canonical case. This result not only matches SETH-based lower bounds but does so with a conceptually minimal and implementable scheme. The work settles the exponential base in the binary setting, provides a sharpened baseline for both theoretical investigations and practical parameterized methods, and motivates future explorations into broader discrete proximity problems in strings and sequences.