An Optimal Algorithm for Binary Closest String

Published 29 May 2026 in cs.DS | (2605.31417v1)

Abstract: We revisit the Binary Closest String problem, which asks, given a set of binary strings $X \subseteq \{0, 1\}^n$, to compute a string minimizing the maximum Hamming distance to $X$. A long line of work has focused on parameterized algorithms with respect to the optimal distance $d$, yielding a sequence of improvements from $O^*(d^d)$ through $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$ to the current best-known running time of $O^*(5^d)$ [Chen, Ma, Wang; Algorithmica '16]. We present a faster randomized algorithm running in time $O^*(4^d)$. Our result matches a recent fine-grained lower bound [Abboud, Fischer, Goldenberg, Karthik C.S., Safier; ESA '23], and is therefore conditionally optimal. As an extra benefit, our algorithm is remarkably simple.

Authors (2)

Summary

  • The paper introduces a novel O*(4^d) randomized algorithm for the Binary Closest String problem that matches SETH-based lower bounds.
  • It employs a stochastic local search utilizing a single long random walk to effectively reduce the maximum Hamming distance.
  • The method emphasizes algorithmic simplicity and optimality, establishing a definitive baseline for future generalizations and improvements.

An Optimal Fixed-Parameter Algorithm for Binary Closest String

Problem Overview and Background

The Closest String problem takes as input a set of strings $X \subseteq \Sigma^n$ and seeks a string $y^* \in \Sigma^n$ minimizing the maximum Hamming distance to any string in $X$. The binary case, with $|\Sigma| = 2$, is relevant both theoretically, as a canonical clustering problem in discrete mathematics, and practically, as it relates directly to core computational biology applications including PCR primer design, motif finding, and drug engineering. Parameterizing by the optimal distance $d$ (the maximum Hamming distance from $y^*$ to any string in $X$) is central in algorithmic research because the underlying decision problem is NP-hard and because the case of small $d$ is the practically relevant one.
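To make the objective concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it enumerates all $2^n$ binary strings and is only usable on tiny instances. The function names `hamming` and `brute_force_center` are illustrative, and strings are assumed to be equal-length Python `str` values over the characters `0` and `1`.

```python
from itertools import product


def hamming(a, b):
    """Number of coordinates where the equal-length sequences a and b differ."""
    return sum(p != q for p, q in zip(a, b))


def brute_force_center(X):
    """Return (y, d): a binary string y minimizing the maximum Hamming
    distance to the strings in X, and that optimal distance d.

    Exhaustive search over all 2^n candidates; for illustration only.
    """
    n = len(X[0])
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    best = min(candidates, key=lambda y: max(hamming(x, y) for x in X))
    return best, max(hamming(x, best) for x in X)
```

For example, for `X = ["0000", "1111", "0011"]` no string can be within distance 1 of both `0000` and `1111`, so the optimal distance is 2.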

There is a long sequence of improvements in the parameterized complexity of Binary Closest String, with running times decreasing from the naïve $O^*(d^d)$ to $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$, and finally $O^*(5^d)$ [Chen, Ma, Wang, Algorithmica 2016]. However, recent work has established a fine-grained lower bound under SETH that rules out running time $O^*((4-\varepsilon)^d)$ for any $\varepsilon > 0$, raising the open question of whether an algorithm with running time $O^*(4^d)$ is achievable.

Algorithmic Contributions

This work presents the first $O^*(4^d)$ randomized algorithm for Binary Closest String, matching the conditional lower bound and thus resolving the parameterized complexity of the problem. The technical approach is a stochastic local search, a Markov process on the Hamming ball of radius $d$, which is structurally similar to prior local search schemes with larger exponential bases but differs in one modification: instead of multiple short random walks, a single long random walk is executed.

A succinct outline of the algorithm is as follows:

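  • Begin with a starting string $y$ (for example, a string from $X$);
  • In each iteration, select $x \in X$ that maximizes the Hamming distance $\mathrm{HD}(x, y)$;
  • If $\mathrm{HD}(x, y) \le d$, output $y$ as a solution;
  • Else, select a random coordinate where $x$ and $y$ differ and make $y$ agree with $x$ on that coordinate;
  • Repeat.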
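The outline above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the walk starts from the first input string, the differing coordinate is chosen uniformly at random, and the hypothetical parameter `max_steps` caps the walk so that the function returns `None` instead of running forever. The names `closest_string_walk`, `max_steps`, and `rng` are illustrative.

```python
import random


def hamming(a, b):
    """Number of coordinates where the equal-length sequences a and b differ."""
    return sum(p != q for p, q in zip(a, b))


def closest_string_walk(X, d, max_steps, rng=None):
    """Single random walk looking for y with max Hamming distance <= d to X.

    X is a list of equal-length strings over '0'/'1'. Returns a string y,
    or None if no solution was found within max_steps steps.
    """
    rng = rng or random.Random()
    y = list(X[0])  # assumption: start from an input string
    for _ in range(max_steps):
        # Pick the input string currently farthest from y.
        x = max(X, key=lambda s: hamming(s, y))
        if hamming(x, y) <= d:
            return "".join(y)  # y is within distance d of every string
        # Move y one step toward x on a uniformly random differing coordinate.
        differing = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
        i = rng.choice(differing)
        y[i] = x[i]
    return None  # step budget exhausted (assumed cap, not part of the outline)
```

Any string returned by this sketch is checked against every element of `X` before being output, so a non-`None` result is always a valid center; only the step budget is probabilistic.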

The Markov process is structured such that, if an optimal $y^*$ at distance $d$ exists, the expected number of steps to reach a solution is $O^*(4^d)$. The key technical lemma lower-bounds the probability of progress (a reduction in Hamming distance to $y^*$) from each state of the walk, enabling the analysis of the algorithm's expected runtime via a hitting time evaluation. The termination guarantee (with high probability) follows from Markov's inequality and exponential tail bounds, establishing reliable randomized correctness.
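A progress bound of this kind can be motivated by a short counting argument. The following is a reconstruction, not a quotation of the paper's lemma, and the paper may state its bound differently. It assumes a binary alphabet, a current string $y$, an input string $x \in X$ with $\mathrm{HD}(x, y) = D > d$, and a flipped coordinate chosen uniformly among the $D$ differing ones. The set names $A$, $B$, $C$ and the symbols $k$, $D$ are introduced here for illustration.

$$
\begin{aligned}
A &= \{i : x_i \neq y_i,\ y^*_i = x_i\}, \quad
B = \{i : x_i \neq y_i,\ y^*_i = y_i\}, \quad
C = \{i : x_i = y_i,\ y^*_i \neq y_i\}, \\
k &= \mathrm{HD}(y, y^*) = |A| + |C|, \qquad
\mathrm{HD}(x, y^*) = |B| + |C| \le d, \qquad
D = |A| + |B|, \\
|A| - |B| &= k - \mathrm{HD}(x, y^*) \ge k - d
\;\Longrightarrow\; |A| \ge \tfrac{1}{2}(D + k - d), \\
\Pr[\text{distance to } y^* \text{ decreases}] &= \frac{|A|}{D}
\ge \frac{1}{2} - \frac{d - k}{2D}
\ge \frac{k + 1}{2(d + 1)}
\quad (0 \le k \le d,\ D \ge d + 1).
\end{aligned}
$$

Under these assumptions, progress becomes less likely as $y$ gets closer to $y^*$, which is why the hitting time is exponential in $d$. Separately, if the expected number of steps to reach a solution is $T$, Markov's inequality implies that a walk of $2T$ steps fails with probability at most $1/2$, so $t$ independent runs all fail with probability at most $2^{-t}$.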

The algorithm is notably simple, implementable in a few lines, and achieves tight complexity bounds dictated by current fine-grained lower bounds [Abboud et al., ESA 2023].

Theoretical and Empirical Implications

The most significant theoretical contribution is the closure of the exponential gap between upper and lower bounds for parameterized algorithms for the Binary Closest String problem; the proposed algorithm is the first to achieve optimality (conditioned on SETH).

  • Tight Lower Bound: Under SETH, the base of the exponent in the $O^*(4^d)$ runtime for the binary case cannot be improved, as demonstrated by recent reductions, making further algorithmic improvements infeasible in this regime barring major complexity breakthroughs.
  • Algorithmic Simplicity: The approach possesses conceptual transparency and ease of implementation compared to prior, more intricate techniques.
  • Micro-Optimization: The auxiliary observation that the polynomial factor in the $O^*(4^d)$ running time can be reduced is pertinent for practical optimization.

While the paper's authors are circumspect regarding immediate practical impacts (since $d$ is often small in practice and the base of the exponent dominates), the result establishes a definitive baseline for future work on (a) generalizations to larger alphabet sizes, where the parameterized complexity remains open, and (b) the strictly more general Closest Substring problem, which is of even higher practical value in biological sequence analysis.

Contradictory and Strong Claims

  • The paper explicitly claims conditional optimality: No $O^*((4-\varepsilon)^d)$-time algorithm can exist for any $\varepsilon > 0$ unless SETH fails.
  • Simplicity vs. Optimality: The method attains both conceptual simplicity and optimal performance, in contrast to the trend of increasing algorithmic sophistication in previous improvements.
  • The algorithm is robustly correct: it either returns a correct solution or, when no feasible center within distance $d$ exists, does not terminate.
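The one-sided nature of this guarantee can be illustrated with a small checker: any candidate string can be verified against the input in time linear in the total input size, so an incorrect string never needs to be reported. The sketch below is illustrative only, and the names `hamming` and `is_center` are not taken from the paper.

```python
def hamming(a, b):
    """Number of coordinates where the equal-length sequences a and b differ."""
    return sum(p != q for p, q in zip(a, b))


def is_center(X, y, d):
    """True if y is within Hamming distance d of every string in X."""
    return all(hamming(x, y) <= d for x in X)
```

For instance, with `X = ["0000", "1111", "0011"]`, the string `0011` passes the check for `d = 2`, while `1100` does not because it is at distance 4 from `0011`.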

Future Directions

Potential directions for continued research include:

  • Extending the optimality to Closest String over larger alphabets ($|\Sigma| > 2$), where the gap between algorithmic upper and lower bounds is greater and where new techniques could be required for tight results.
  • Generalizing to Closest Substring and related proximity problems in string metric spaces, for which current algorithms do not match known lower bounds, and fine-grained reductions are less developed.
  • Investigating derandomization approaches for this optimal algorithm, or further reducing polynomial multiplicative factors in the runtime to enhance practical adoption, especially for moderate parameter values.

Conclusion

The paper "An Optimal Algorithm for Binary Closest String" (2605.31417) establishes an $O^*(4^d)$ randomized fixed-parameter tractable algorithm for the Binary Closest String problem, resolving the parameterized complexity of the canonical case conditionally on SETH. This result not only matches SETH-based lower bounds but does so with a conceptually minimal and implementable scheme. The work concludes the sequence of improvements to the exponential base in the binary setting, provides a sharpened baseline for both theoretical investigations and practical parameterized methods, and motivates future explorations into broader discrete proximity problems in strings and sequences.
