Binary Closest String Problem

Updated 4 July 2026
  • Binary Closest String Problem is a minimax optimization problem in binary Hamming space: it asks for a center string that minimizes the maximum Hamming distance to all input strings.
  • It distinguishes between continuous and discrete variants, with NP-hardness and conditional lower bounds under SETH shaping the design of exact, FPT, and approximation algorithms.
  • Recent research develops exponential, parameterized, quantum, and dynamic algorithms to address complexity barriers and improve practical performance.

The Binary Closest String Problem is the minimax center problem over Hamming space on the alphabet $\{0,1\}$. Given a finite set of equal-length binary strings $X \subseteq \{0,1\}^L$, the task is to find a binary string $y \in \{0,1\}^L$ minimizing the maximum Hamming distance to the strings in $X$, that is, to compute

$$d^*=\min_{y\in\{0,1\}^L}\max_{x\in X} d_H(y,x).$$

The problem is also referred to as the string center problem or string consensus problem, although the latter label can be misleading because some literature uses “consensus” for sum-of-distances objectives rather than the minimax objective considered here (Abboud et al., 2023). In the binary setting, Hamming distance coincides with Manhattan distance, so Binary Closest String is a special case of Manhattan Sequence Consensus (Kociumaka et al., 2014). Contemporary work on the problem spans exact exponential algorithms, fine-grained lower bounds, fixed-parameter tractability, LP/IP/CSP formulations, dynamic data structures, and quantum algorithms (Fischer et al., 29 May 2026).

1. Formal definition and problem variants

For binary strings $x,y\in\{0,1\}^L$, the Hamming distance is

$$d_H(x,y)=|\{i\in[L]:x[i]\neq y[i]\}|,$$

and a Hamming ball of radius $R$ around $x$ is $\{y\in\{0,1\}^L:d_H(x,y)\le R\}$ (Abboud et al., 2023). The optimization problem asks for the center of a smallest-radius Hamming ball that contains all input strings.
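The definitions above translate directly into code. The following is a minimal sketch, assuming strings are represented as Python `str` values over the characters `0` and `1`; the function names `hamming`, `radius`, and `in_ball` are illustrative and do not come from the cited papers.

```python
from typing import Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def radius(center: str, strings: Sequence[str]) -> int:
    """Maximum Hamming distance from `center` to any input string."""
    return max(hamming(center, x) for x in strings)


def in_ball(y: str, x: str, r: int) -> bool:
    """True if y lies in the Hamming ball of radius r around x."""
    return hamming(x, y) <= r
```

The objective $d^*$ is then the minimum of `radius(y, X)` over all candidate centers `y`.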

Two standard variants are distinguished in fine-grained analyses. In the continuous version, the center may be any string in $\{0,1\}^L$. In the discrete version, the center is required to belong to the input set itself (Abboud et al., 2023). The discrete variant is polynomial-time solvable by exhaustive pairwise distance computation, but its exact complexity depends sharply on the relationship between the number of strings and the dimension.

A natural dual is the Remotest String problem. In the continuous binary case, it maximizes $\min_{x\in X} d_H(y,x)$ over all $y\in\{0,1\}^L$; in the discrete case it maximizes the minimum distance to the other input strings (Abboud et al., 2023). For binary alphabets, continuous Closest String and continuous Remotest String are equivalent via complementation: writing $\bar y$ for the bitwise complement of $y$,

$$d_H(\bar y,x)=L-d_H(y,x),$$

hence

$$\max_{x\in X} d_H(\bar y,x)=L-\min_{x\in X} d_H(y,x).$$

This binary-specific equivalence is central in several lower-bound transfers (Abboud et al., 2023).
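The complementation argument can be checked by brute force on a small instance. The sketch below is illustrative only: the function names are hypothetical, and the exhaustive enumeration is meant for very short strings. It verifies that the optimal Closest String radius and the optimal continuous Remotest String value sum to the string length.

```python
from itertools import product
from typing import Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def complement(y: str) -> str:
    """Bitwise complement of a binary string."""
    return "".join("1" if b == "0" else "0" for b in y)


def closest_radius(strings: Sequence[str]) -> int:
    """min over all centers y of max_x d_H(y, x), by full enumeration."""
    length = len(strings[0])
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return min(max(hamming(y, x) for x in strings) for y in candidates)


def remotest_value(strings: Sequence[str]) -> int:
    """max over all centers y of min_x d_H(y, x), by full enumeration."""
    length = len(strings[0])
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return max(min(hamming(y, x) for x in strings) for y in candidates)
```

For any nonempty binary instance `X` of length `L`, `closest_radius(X) + remotest_value(X) == L`, because complementing a center turns each distance $d$ into $L-d$.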

The problem also has a standard parameterized decision form: given $X\subseteq\{0,1\}^L$ and an integer $d$, decide whether there exists a center $y\in\{0,1\}^L$ with $\max_{x\in X} d_H(y,x)\le d$ (Fischer et al., 29 May 2026). This decision version underlies most FPT algorithms and most exact lower bounds parameterized by the optimum radius.

2. Complexity landscape and conditional barriers

The exact minimax Closest String Problem is NP-hard, and the binary minimax version is NP-complete (0705.0561). The classical exhaustive baselines are immediate. For the continuous binary problem, enumerating all $2^L$ candidate centers and evaluating their radii gives time $O(2^L\cdot nL)$ for $n=|X|$ (Abboud et al., 2023). For the discrete problem, computing all pairwise Hamming distances and taking row maxima gives time $O(n^2 L)$ and space $O(n^2)$ if all distances are stored (Abboud et al., 2023).
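Both exhaustive baselines are short enough to write out. The sketch below is a direct, unoptimized rendering under the assumption that inputs are Python strings over `0` and `1`; the function names are illustrative.

```python
from itertools import product
from typing import Sequence, Tuple


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def continuous_closest_string(strings: Sequence[str]) -> Tuple[str, int]:
    """Try every one of the 2^L centers; return one best center and its radius."""
    length = len(strings[0])
    best_center, best_radius = "", length + 1
    for bits in product("01", repeat=length):
        y = "".join(bits)
        r = max(hamming(y, x) for x in strings)
        if r < best_radius:
            best_center, best_radius = y, r
    return best_center, best_radius


def discrete_closest_string(strings: Sequence[str]) -> Tuple[str, int]:
    """Restrict centers to the input set: all pairwise distances, then row maxima."""
    dist = [[hamming(a, b) for b in strings] for a in strings]
    row_max = [max(row) for row in dist]
    i = min(range(len(strings)), key=row_max.__getitem__)
    return strings[i], row_max[i]
```

On the two-string instance `["000", "111"]` the variants differ: the continuous optimum has radius 2 (for example the center `001`), while any input string used as the center has radius 3.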

Recent fine-grained results identify a sharp dichotomy between the continuous and discrete versions. For the continuous binary problem, exhaustive search is conditionally optimal: there is no exact algorithm with running time $O(2^{(1-\varepsilon)L}\,\mathrm{poly}(n,L))$ for any $\varepsilon>0$ unless SETH fails, where $n$ is the number of input strings (Abboud et al., 2023). In the discrete problem, a different barrier appears. When the dimension lies in the regime $\omega(\log n) < L < n^{o(1)}$, the exact complexity is conditionally quadratic, $n^{2\pm o(1)}$, under the Hitting Set Conjecture (Abboud et al., 2023). The same hard range transfers to binary discrete Remotest String through a reduction that preserves the number of strings up to constant factors and increases the dimension only modestly (Abboud et al., 2023).

Parameterized by the optimum radius $d^*$, a second barrier is known. A 2026 result gives a randomized exact algorithm whose running time is exponential in the radius, and proves that this exponential dependence cannot be improved by any constant in the base unless SETH fails (Fischer et al., 29 May 2026). The lower bound is obtained by inspecting fine-grained hard instances with a controlled optimal radius, together with a padding argument extending the conclusion to the remaining parameter range (Fischer et al., 29 May 2026).

Approximation schemes are also tightly constrained. Although PTASes for Closest String are known, there is no EPTAS unless $\mathrm{FPT}=\mathrm{W}[1]$ (Cygan et al., 2015). More quantitatively, for any computable function $f$, a PTAS with runtime $f(\varepsilon)\cdot n^{o(1/\varepsilon)}$ would contradict ETH, and this lower bound already holds over the binary alphabet (Cygan et al., 2015). A plausible implication is that the binary problem is unusual in combining strong positive approximation results with rigid barriers on the $\varepsilon$-dependence.

3. Exact algorithms across parameter regimes

Three exact regimes dominate the modern algorithmic picture: exponential search in the string length, subquadratic exact algorithms for restricted discrete regimes, and FPT algorithms parameterized by the optimum radius.

For the continuous binary problem, the exact situation is stark. The trivial enumeration of all $2^L$ candidate centers is essentially best possible under SETH (Abboud et al., 2023). This rules out meet-in-the-middle-like exact speedups in the exponent and makes the continuous binary problem a canonical example of exhaustive-search optimality in fine-grained complexity.

For the discrete problem, exact improvements are possible outside the HSC-hard range. In the small-dimension regime $L=O(\log n)$, there is an exact algorithm faster than exhaustive pairwise comparison, based on a novel use of inclusion–exclusion (Abboud et al., 2023). Its key identity rewrites the indicator of bounded Hamming distance in terms of coordinate-subset agreement, allowing preprocessing of agreement counts over subsets of coordinates, followed by radius tests through alternating sums over subset sizes (Abboud et al., 2023). The paper states explicit time and memory bounds for this scheme, together with the range of $L$ in which it is subquadratic (Abboud et al., 2023).

In the large-dimension regime, where $L$ grows polynomially in $n$, exact discrete Closest String can be solved by computing all pairwise Hamming distances polynomially faster than the brute-force $O(n^2 L)$ bound, via heavy–light splitting and fast matrix multiplication (Abboud et al., 2023). The construction encodes the input as a sparse binary indicator matrix, then handles heavy columns by fast matrix multiplication and light columns by sparse accumulation (Abboud et al., 2023). This yields a polynomial improvement over the brute-force baseline.

The parameterized landscape has undergone a long sequence of base improvements. For Binary Closest String parameterized by the optimum radius $d$, the final algorithm in the progression reported in 2026 is conditionally optimal under SETH (Fischer et al., 29 May 2026). Its procedure is remarkably simple: start from an arbitrary input string, repeatedly select a farthest string $s$, and if $d_H(c,s)>d$, flip one uniformly random disagreement bit of the current center $c$ toward $s$ (Fischer et al., 29 May 2026). The analysis tracks the state of the walk relative to a fixed optimum $y^*$ and lower-bounds the probability that a step makes progress, which bounds the expected number of iterations and hence the overall running time (Fischer et al., 29 May 2026). With a reset rule triggered when the farthest distance grows too large, the polynomial factor in the running time can be improved (Fischer et al., 29 May 2026).
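The walk described above can be sketched in a few lines. This is an illustration of the stated procedure only, not the paper's implementation: the restart count, the cap of $d$ flips per walk, the fixed seed, and the function name `random_walk_center` are all assumptions made for this example, and no running-time guarantee from the paper is claimed for it.

```python
import random
from typing import Optional, Sequence


def hamming(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def random_walk_center(strings: Sequence[str], d: int,
                       restarts: int = 1000, seed: int = 0) -> Optional[str]:
    """Search for a center within distance d of every string, or return None.

    Any returned center is verified; None means only that no center was found.
    """
    rng = random.Random(seed)            # fixed seed: an assumption, for repeatability
    length = len(strings[0])
    for _ in range(restarts):            # restart count is an assumption
        c = list(rng.choice(strings))    # start from an input string
        for step in range(d + 1):
            far = max(strings, key=lambda x: hamming(c, x))
            diff = [i for i in range(length) if c[i] != far[i]]
            if len(diff) <= d:           # farthest string is within d: c is feasible
                return "".join(c)
            if step == d:                # assumed cap: at most d flips per walk
                break
            i = rng.choice(diff)         # uniformly random disagreement position
            c[i] = far[i]                # flip that bit toward the farthest string
    return None
```

The cap of $d$ flips reflects that an input string is within distance $d$ of any feasible center, so $d$ correct flips suffice when every flip moves toward that center.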

4. Structural special cases and small-number-of-strings regimes

A distinct exact line of work studies instances with a small number of input strings. In the Manhattan Sequence Consensus framework, which subsumes Binary Closest String, there is an $O(L)$-time exact algorithm for $k\le 5$ sequences of length $L$ (Kociumaka et al., 2014). For binary strings, this immediately yields an $O(L)$ algorithm for the minimax closest string problem with at most five inputs.

The method is column-based. For each position, the $k$ input symbols are sorted, interval systems are defined over adjacent ranks, and the global objective is converted into a constrained ILP with a single radius variable (Kociumaka et al., 2014). Two algebraic reductions are then applied: sign normalization by negation of variables, and merging of variables with identical coefficient vectors by Minkowski summation of their ranges. For general $k$, this bounds the number of variables by a function of $k$ alone; for $k=5$, the paper proves a much stronger combinatorial classification (Kociumaka et al., 2014).

For $k=5$, every optimal sum-MSC sequence belongs to one of 20 families: five border families, five middle families, and ten triangle families (Kociumaka et al., 2014). Each family induces an interval system whose ILP reduces to an “easy ILP” with at most four variables, solvable fast enough after linear-time preprocessing over columns to keep the whole algorithm linear (Kociumaka et al., 2014). The overall algorithm constructs the 20 candidates, solves their reduced ILPs, and selects the minimum radius.

The same paper gives a kernelization for general parameter $k$. By merging columns with identical permutation types, any instance reduces in linear time to length at most $k!$, and in the binary case, with appropriate tie-breaking, to a smaller bound that also depends only on $k$ (Kociumaka et al., 2014). This shows fixed-parameter tractability in the number of strings, although the resulting generic exact algorithm is considered impractical beyond very small $k$ because naive enumeration over interval systems remains too large (Kociumaka et al., 2014).
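To make the column-type idea concrete in the binary case, the sketch below groups positions by their column pattern and counts how often each pattern occurs. It is an illustration rather than the paper's kernelization: treating a column and its complement as the same type (by flipping columns whose first bit is `1`) is an assumption of this example, justified by the fact that complementing one position in every input string and in the center leaves all Hamming distances unchanged. The function name `column_types` is hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def column_types(strings: Sequence[str]) -> "Counter[Tuple[str, ...]]":
    """Count columns by pattern, identifying each column with its complement."""
    counts: "Counter[Tuple[str, ...]]" = Counter()
    for column in zip(*strings):         # one tuple of k bits per position
        if column[0] == "1":             # normalize so the first bit is always 0
            column = tuple("1" if b == "0" else "0" for b in column)
        counts[column] += 1
    return counts
```

With $k$ strings there are at most $2^{k-1}$ normalized patterns, so the table has a size that depends only on $k$, however long the strings are.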

This small-$k$ regime, with few input strings, is structurally different from the fine-grained large-$n$ regime. A plausible implication is that Binary Closest String supports two orthogonal exact methodologies: parameterization by the optimum radius, and parameterization by the number of strings through combinatorial column types.

5. Mathematical formulations and solver frameworks

Several exact and near-exact frameworks formulate Binary Closest String as IP, LP, CSP, or QUBO. These formulations emphasize different aspects of the problem: certification, propagation, practical heuristics, or hardware embedding.

An IP formulation introduces binary center variables and mismatch variables. In one-hot form, a binary variable indicates that a given position of the center uses a given symbol, and the objective minimizes a radius variable subject to one-hot constraints and distance upper bounds (0705.0561). In a binary specialization, one may instead use one center bit per position and mismatch variables that bound the disagreement between the center and each input string at each position, with the mismatches of every input string summing to at most the radius (0705.0561). LP relaxation followed by greedy iterative rounding yields a polynomial-time heuristic that is exact for two strings and has additive error at most one for three binary strings (0705.0561).
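One standard way to write such a binary program is shown below. The symbols $y_i$ (center bits), $z_{j,i}$ (mismatch indicators), $r$ (radius), and $x_j$ (the $j$-th input string) are notation chosen for this illustration and are not necessarily those of the cited paper.

```latex
\begin{aligned}
\min\quad & r \\
\text{s.t.}\quad
  & z_{j,i} \ge y_i - x_j[i]      && \text{for all strings } j \text{ and positions } i, \\
  & z_{j,i} \ge x_j[i] - y_i      && \text{for all strings } j \text{ and positions } i, \\
  & \sum_{i=1}^{L} z_{j,i} \le r  && \text{for all strings } j, \\
  & y_i \in \{0,1\},\quad z_{j,i} \in \{0,1\},\quad r \in \mathbb{Z}_{\ge 0}.
\end{aligned}
```

The first two constraints force $z_{j,i}\ge |y_i-x_j[i]|$. Because the inputs are constants, the mismatch is itself linear in the center bit: $|y_i-x_j[i]|$ equals $y_i$ when $x_j[i]=0$ and $1-y_i$ when $x_j[i]=1$. The $z$ variables can therefore be eliminated, leaving one linear constraint per input string.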

A CSP formulation is more explicitly combinatorial. It uses one center variable per position, reified mismatch indicators for each string and position, per-string distances, and a radius variable (Kelsey et al., 2010). Two standard bounds are built in: the Hamming diameter upper bound

$$d^*\le\max_{x,x'\in X} d_H(x,x')$$

and the lower bound

$$d^*\ge\left\lceil \tfrac{1}{2}\max_{x,x'\in X} d_H(x,x')\right\rceil$$

from the triangle inequality (Kelsey et al., 2010). The paper’s principal heuristic is PWM ordering: positions are searched by decreasing majority count, and the majority bit is tried first. Reported experiments show “several orders of magnitude” speedups over generic orderings at and above the optimal distance, while the dominant residual cost is often certifying infeasibility at radius $d^*-1$, one below the optimum (Kelsey et al., 2010). The same framework also supports enumeration of all optimal centers and distributed two-front search that combines optimization from above with infeasibility proofs from below (Kelsey et al., 2010).
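The two bounds and the ordering heuristic are easy to compute up front. The sketch below is a minimal illustration, not the paper's solver code: the function names are hypothetical, and breaking majority ties in favor of `0` is an assumption of this example.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def radius_bounds(strings: Sequence[str]) -> Tuple[int, int]:
    """(lower, upper) bounds on the optimal radius from the Hamming diameter."""
    diameter = max((hamming(a, b) for a, b in combinations(strings, 2)), default=0)
    # Triangle inequality: d(a, b) <= d(a, y) + d(y, b) <= 2 * radius.
    lower = (diameter + 1) // 2
    # Using any input string as the center gives radius at most the diameter.
    return lower, diameter


def pwm_order(strings: Sequence[str]) -> List[Tuple[int, str]]:
    """Positions by decreasing majority count, each paired with its majority bit."""
    n = len(strings)
    scored = []
    for i, column in enumerate(zip(*strings)):
        ones = column.count("1")
        majority_bit = "1" if 2 * ones > n else "0"   # ties go to "0" (assumption)
        scored.append((i, majority_bit, max(ones, n - ones)))
    scored.sort(key=lambda t: -t[2])                  # stable: ties keep position order
    return [(i, bit) for i, bit, _ in scored]
```

A search procedure would branch on positions in the order returned by `pwm_order`, trying the listed bit first, and restrict the radius variable to the interval returned by `radius_bounds`.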

A recurrent source of confusion is the distinction between the minimax objective and the sum-of-distances objective. The CSP work explicitly notes that minimizing the sum of Hamming distances is a different objective, outside its scope (Kelsey et al., 2010). The D-Wave annealing paper makes this distinction operationally important: its QUBO formulations optimize a sum-of-distances proxy, not the max-distance radius directly (Dissanayake, 2023). In that formulation, each column uses one-hot variables selecting one observed input symbol for the center, with a clique penalty enforcing exactly one choice and a linear objective coefficient equal to the precomputed number of mismatches induced by that choice (Dissanayake, 2023). Because columns do not couple, the QUBO decomposes exactly across columns under that objective, enabling substring batching on Pegasus hardware. The evaluation metric is the Occurrence Ratio, and the paper reports recovery of expected solutions on small test cases with minimal hyperparameter tuning (Dissanayake, 2023).
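A three-string example shows why the two objectives must not be conflated. Under the sum-of-distances objective the columns decouple and a per-column majority vote is optimal, but that center can be far from optimal for the minimax radius. The sketch below is illustrative; the function names are hypothetical and ties in the vote go to `0` by assumption.

```python
from typing import Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def majority_center(strings: Sequence[str]) -> str:
    """Per-column majority vote; minimizes the SUM of Hamming distances."""
    n = len(strings)
    return "".join("1" if 2 * col.count("1") > n else "0" for col in zip(*strings))


def total_distance(center: str, strings: Sequence[str]) -> int:
    """Sum-of-distances objective."""
    return sum(hamming(center, x) for x in strings)


def max_distance(center: str, strings: Sequence[str]) -> int:
    """Minimax (radius) objective."""
    return max(hamming(center, x) for x in strings)
```

For `X = ["000", "000", "111"]`, the majority center `000` has total distance 3 but radius 3, whereas the center `001` has total distance 4 but radius 2. Each center wins under one objective and loses under the other.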

6. Dynamic, approximation, and quantum extensions

Binary Closest String has also been studied in dynamic and quantum models, and these extensions expose which parts of the classical structure are robust under changing computational assumptions.

In the dynamic setting, one maintains a set of $n$ binary strings of fixed length $L$ under point updates that overwrite a single character, and answers feasibility queries for a fixed radius parameter (Olkowski et al., 2022). The binary-specialized data structures come with explicit bounds on initialization time, amortized update time, and worst-case query time (Olkowski et al., 2022). They maintain an approximate origin string such that, if the instance is feasible, every string stays within a bounded distance of it; if some string ever exceeds that distance, infeasibility is certified (Olkowski et al., 2022). Random color-coding of mismatch positions then supports a fast “far-pair” negativity check: if two strings are too far apart in Hamming distance for the instance to be feasible, the structure detects this with a guaranteed minimum probability (Olkowski et al., 2022). The resulting Monte Carlo guarantees have no false negatives, and the false-positive probability is kept small by standard amplification (Olkowski et al., 2022).

Approximation remains theoretically delicate. The binary problem admits PTASes, but the lower bounds on their runtime dependence imply that high-accuracy approximation cannot be made strongly efficient in the EPTAS sense unless parameterized complexity collapses (Cygan et al., 2015). This suggests that, in binary Hamming space, approximation is useful primarily when the classical PTAS exponents are acceptable or when exact FPT algorithms in the optimum radius are inapplicable.

Quantum algorithms split into two main regimes. A nested Dürr–Høyer minimum/maximum search solves binary Closest String faster than classical enumeration, and under a quantum analogue of SETH, its dominant exponential scaling cannot be improved by any constant factor in the exponent (Cudby et al., 17 Oct 2025). A second algorithm quantizes the bounded-search tree of Gramm et al. by an MNRS quantum walk, improving the dominant dependence on the radius parameter relative to the classical branching backbone (Cudby et al., 17 Oct 2025). The same work also gives a quantum dynamic-programming algorithm for Levenshtein-distance variants and a quantum-accelerated generator search for binary Closest Substring (Cudby et al., 17 Oct 2025). However, the paper explicitly notes that for binary alphabets the best classical fixed-alphabet exact bounds may outperform the walk-based quantum algorithm in part of the parameter range (Cudby et al., 17 Oct 2025).

Across these directions, the binary case is repeatedly special. Complementation equates Closest and Remotest String in the continuous setting; small alphabets change the parameter tradeoffs in both classical and quantum algorithms; and several lower bounds become strongest precisely over the alphabet $\{0,1\}$ (Abboud et al., 2023). The current exact picture is therefore unusually crisp: exhaustive search is conditionally optimal for the continuous binary problem, the discrete problem admits faster exact algorithms only outside the super-logarithmic hard range, and the radius-parameterized complexity is settled by a randomized algorithm that matches the SETH-based lower bound (Fischer et al., 29 May 2026).
