An Optimal Algorithm for Binary Closest String

Published 29 May 2026 in cs.DS | (2605.31417v1)

Abstract: We revisit the Binary Closest String problem, which asks, given a set of binary strings $X \subseteq \{0, 1\}^n$, to compute a string minimizing the maximum Hamming distance to $X$. A long line of work has focused on parameterized algorithms with respect to the optimal distance $d$, yielding a sequence of improvements from $O^*(d^d)$ through $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$ to the current best-known running time of $O^*(5^d)$ [Chen, Ma, Wang; Algorithmica '16]. We present a faster randomized algorithm running in time $O^*(4^d)$. Our result matches a recent fine-grained lower bound [Abboud, Fischer, Goldenberg, Karthik C.S., Safier; ESA '23], and is therefore conditionally optimal. As an extra benefit, our algorithm is remarkably simple.


Authors (2)

Summary

The paper introduces a novel O*(4^d) randomized algorithm for the Binary Closest String problem that matches SETH-based lower bounds.
It employs a stochastic local search utilizing a single long random walk to effectively reduce the maximum Hamming distance.
The method emphasizes algorithmic simplicity and optimality, establishing a definitive baseline for future generalizations and improvements.

An Optimal Fixed-Parameter Algorithm for Binary Closest String

Problem Overview and Background

The Closest String problem takes as input a set of strings $X \subseteq \Sigma^n$ and seeks a string $y^* \in \Sigma^n$ minimizing the maximum Hamming distance to any string in $X$. The binary case, with $|\Sigma| = 2$, is relevant both theoretically, as a canonical clustering problem in discrete mathematics, and practically, through core computational biology applications including PCR primer design, motif finding, and drug engineering. Because the underlying decision problem is NP-hard, algorithmic research focuses on the parameterization by the optimal distance $d$ (the maximum Hamming distance from $y^*$ to any string in $X$), which is also the practically relevant regime when $d$ is small.
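To make the objective concrete, here is a minimal brute-force sketch for tiny instances. It is only an illustration of the problem definition, not the paper's algorithm; the function names `max_hamming` and `brute_force_closest_string` are hypothetical, and strings are assumed to be equal-length tuples of 0/1 values.

```python
from itertools import product


def max_hamming(y, X):
    """Largest Hamming distance from candidate y to any string in X."""
    return max(sum(a != b for a, b in zip(y, x)) for x in X)


def brute_force_closest_string(X):
    """Try all 2^n binary strings; only feasible for very small n."""
    n = len(X[0])
    best = min(product((0, 1), repeat=n), key=lambda y: max_hamming(y, X))
    return best, max_hamming(best, X)


if __name__ == "__main__":
    center, d = brute_force_closest_string([(0, 0, 0), (1, 1, 1), (0, 1, 0)])
    print(center, d)
```

The returned value `d` is the optimal distance used as the parameter throughout; the exhaustive search costs $2^n$ evaluations, which is what the parameterized algorithms avoid when $d$ is much smaller than $n$.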

There is a longstanding sequence of improvements in the parameterized complexity of Binary Closest String, with running times decreasing from the naïve $O^*(d^d)$ to $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$, and finally $O^*(5^d)$ [Chen, Ma, Wang, Algorithmica 2016]. However, recent work has established a fine-grained lower bound under SETH that rules out running times of $O^*((4-\varepsilon)^d)$ for any $\varepsilon > 0$, raising the open question of whether an algorithm with running time $O^*(4^d)$ is achievable.

Algorithmic Contributions

This work presents the first $O^*(4^d)$ randomized algorithm for Binary Closest String, matching the conditional lower bound and thus resolving the parameterized complexity of the problem. The technical approach is a stochastic local search, a Markov process on the Hamming ball of radius $d$. It is structurally similar to earlier local search schemes for the problem but differs in one key respect: instead of many short random walks, it executes a single long random walk, whose length is bounded by the $O^*(4^d)$ running time.

A succinct outline of the algorithm is as follows:

Begin with any $y \in X$;
In each iteration, select $x \in X$ that maximizes the Hamming distance $\mathrm{HD}(x, y)$;
If $\mathrm{HD}(x, y) \le d$, output $y$ as a solution;
Else, select at random a bit on which $x$ and $y$ differ and make $y$ agree with $x$ on that coordinate;
Repeat.
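The outline above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the names `hamming` and `closest_string_walk` are hypothetical, the walk is assumed to start from the first input string and to pick the differing coordinate uniformly at random, and the explicit step budget `max_steps` is an addition so that the sketch always terminates.

```python
import random


def hamming(a, b):
    """Number of coordinates on which two equal-length sequences differ."""
    return sum(u != v for u, v in zip(a, b))


def closest_string_walk(X, d, max_steps, rng=None):
    """Single random walk; returns a center within distance d, or None."""
    rng = rng or random.Random()
    y = list(X[0])  # assumption: start from a string in X
    for _ in range(max_steps):
        # Farthest input string from the current candidate y.
        x = max(X, key=lambda s: hamming(s, y))
        if hamming(x, y) <= d:
            return tuple(y)  # every input string is within distance d
        # Coordinates where x and y disagree (non-empty since distance > d >= 0).
        diff = [i for i in range(len(y)) if x[i] != y[i]]
        i = rng.choice(diff)  # assumption: uniform choice among them
        y[i] = x[i]  # make y agree with x on coordinate i
    return None  # step budget exhausted without finding a center


if __name__ == "__main__":
    X = [(0, 0, 0, 0), (1, 1, 1, 1)]
    print(closest_string_walk(X, d=2, max_steps=10, rng=random.Random(0)))
```

Any tuple the function returns is a valid center by construction, since it is only returned after the farthest string has been checked. A `None` result means only that the budget ran out; under the analysis summarized in this section, a budget on the order of $4^d$ times a polynomial would be the natural choice.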

The Markov process is structured such that, if an optimal $y^*$ at distance $d$ exists, the expected number of steps to reach $y^*$ is $O^*(4^d)$. The key technical lemma bounds from below the probability of progress (a reduction in the Hamming distance to $y^*$) from the current state, which enables the analysis of the algorithm's expected runtime via a hitting-time evaluation. The termination guarantee (with high probability) follows from Markov's inequality and exponential tail bounds, establishing reliable randomized correctness.
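As a hedged illustration of how a progress bound of this kind can lead to base 4, consider the following sketch; it is one possible derivation under stated assumptions and not necessarily the paper's lemma or notation. Let $i = \mathrm{HD}(y, y^*) \le d$ be the current distance to the optimum, let $x$ be the farthest string with $D = \mathrm{HD}(x, y) > d$, and let $a$ be the number of coordinates on which $x$ and $y$ differ and $y^*$ agrees with $x$. Assume the coordinate to flip is chosen uniformly among the $D$ differing ones. Because the alphabet is binary, $y^*$ disagrees with $x$ on the other $D - a$ differing coordinates, and on the $i - a$ coordinates where $x = y$ but $y^* \ne y$, so

$$
\mathrm{HD}(x, y^*) = (D - a) + (i - a) \le d
\quad\Longrightarrow\quad
a \ge \frac{D + i - d}{2},
$$

$$
\Pr[\text{progress}] = \frac{a}{D} \ge \frac{1}{2} - \frac{d - i}{2D} \ge \frac{i}{2d},
$$

where the last step uses $D > d$ and $i \le d$. A birth-death chain that moves down from state $i$ with probability exactly $i/(2d)$ is the Ehrenfest urn on $2d$ balls, whose stationary distribution is binomial and assigns state $0$ the weight $2^{-2d} = 4^{-d}$, which is consistent with a hitting time of order $4^d$.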

The algorithm is notably simple, implementable in a few lines, and achieves tight complexity bounds dictated by current fine-grained lower bounds [Abboud et al., ESA 2023].

Theoretical and Empirical Implications

The most significant theoretical contribution is the closure of the exponential gap between upper and lower bounds for parameterized algorithms for the Binary Closest String problem; the proposed algorithm is the first to achieve optimality (conditioned on SETH).

Tight Lower Bound: Under SETH, the base of the exponent in the $O^*(4^d)$ runtime cannot be improved for the binary case, as demonstrated by recent reductions [Abboud et al., ESA 2023], making further algorithmic improvements infeasible in this regime barring major complexity breakthroughs.
Algorithmic Simplicity: The approach possesses conceptual transparency and ease of implementation compared to prior, more intricate techniques.
Micro-Optimization: The auxiliary observation that the polynomial factor hidden in the $O^*(4^d)$ bound can be tightened is pertinent for practical optimization.

While the paper's authors are circumspect regarding immediate practical impacts (since $d$ is often small in practice and the base of the exponent dominates), the result establishes a definitive baseline for future work on (a) generalizations to larger alphabet sizes, where the parameterized complexity remains open, and (b) the strictly more general Closest Substring problem, which is of even higher practical value in biological sequence analysis.

Contradictory and Strong Claims

The paper explicitly claims conditional optimality: No $O^*((4-\varepsilon)^d)$-time algorithm can exist for any $\varepsilon > 0$ unless SETH fails.
Simplicity vs. Optimality: The method attains both conceptual simplicity and optimal performance, a contrast to the trend of increasing algorithmic sophistication in previous improvements.
The algorithm never returns an incorrect answer: it either outputs a valid solution or, when no feasible center within distance $d$ exists, does not terminate.

Future Directions

Potential directions for continued research include:

Extending the optimality to Closest String over larger alphabets ($|\Sigma| > 2$), where the gap between algorithmic upper and lower bounds is greater and where new techniques could be required for tight results.
Generalizing to Closest Substring and related proximity problems in string metric spaces, for which current algorithms do not match known lower bounds, and fine-grained reductions are less developed.
Investigating derandomization approaches for this optimal algorithm, or further reducing polynomial multiplicative factors in the runtime to enhance practical adoption, especially for moderate parameter values.

Conclusion

The paper "An Optimal Algorithm for Binary Closest String" (2605.31417) establishes an $O^*(4^d)$ randomized fixed-parameter algorithm for the Binary Closest String problem, resolving the parameterized complexity of the canonical case conditionally on SETH. The result matches SETH-based lower bounds and does so with a conceptually minimal, easily implementable scheme. It settles the base of the exponent in the binary setting, provides a sharpened baseline for both theoretical investigations and practical parameterized methods, and motivates future work on broader discrete proximity problems in strings and sequences.