Binary Closest String Problem
- The Binary Closest String Problem is a minimax optimization problem in binary Hamming space: find a center string minimizing the maximum Hamming distance to all input strings.
- It comes in continuous and discrete variants; NP-hardness and conditional lower bounds under SETH shape the design of exact, FPT, and approximation algorithms.
- Recent research develops exponential, parameterized, quantum, and dynamic algorithms to address complexity barriers and improve practical performance.
The Binary Closest String Problem is the minimax center problem over Hamming space on the alphabet $\{0,1\}$. Given a finite set $X \subseteq \{0,1\}^d$ of equal-length binary strings, the task is to find a binary string $c \in \{0,1\}^d$ minimizing the maximum Hamming distance to the strings in $X$, that is, to compute
$$\min_{c \in \{0,1\}^d} \; \max_{x \in X} \; d_H(c, x).$$
The problem is also referred to as the string center problem or string consensus problem, although the latter label can be misleading because some literature uses “consensus” for sum-of-distances objectives rather than the minimax objective considered here (Abboud et al., 2023). In the binary setting, Hamming distance coincides with Manhattan distance, so Binary Closest String is a special case of Manhattan Sequence Consensus (Kociumaka et al., 2014). Contemporary work on the problem spans exact exponential algorithms, fine-grained lower bounds, fixed-parameter tractability, LP/IP/CSP formulations, dynamic data structures, and quantum algorithms (Fischer et al., 29 May 2026).
1. Formal definition and problem variants
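For binary strings $x, y \in \{0,1\}^d$, the Hamming distance is
$$d_H(x, y) = \bigl|\{\, i \in \{1, \dots, d\} : x_i \neq y_i \,\}\bigr|,$$
and a Hamming ball of radius $r$ around $x$ is $B(x, r) = \{\, y \in \{0,1\}^d : d_H(x, y) \le r \,\}$ (Abboud et al., 2023). The optimization problem asks for a center such that the smallest Hamming ball around it containing all input strings has minimum radius.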
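The two definitions translate directly into code. The following is a minimal sketch, assuming strings are represented as Python `str` values over the characters `0` and `1`; the helper names `hamming` and `radius` are illustrative and do not come from the cited papers.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def radius(center: str, strings: list[str]) -> int:
    """Radius of the smallest Hamming ball around `center` containing all strings."""
    return max(hamming(center, s) for s in strings)


if __name__ == "__main__":
    print(hamming("0110", "0011"))        # two positions differ
    print(radius("000", ["001", "111"]))  # the farthest string is "111"
```

The closest string objective is then the minimum of `radius(center, strings)` over the allowed set of centers.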
Two standard variants are distinguished in fine-grained analyses. In the continuous version, the center may be any string in $\{0,1\}^d$. In the discrete version, the center is required to belong to the input set itself (Abboud et al., 2023). The discrete variant is polynomial-time solvable by exhaustive pairwise distance computation, but its exact complexity depends sharply on the relationship between the number of strings $n$ and the dimension $d$.
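The exhaustive approach to the discrete variant can be sketched as follows. This is a plain quadratic baseline under the assumption that inputs are Python strings over `0` and `1`; the function name `discrete_closest_string` is hypothetical.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(a != b for a, b in zip(x, y))


def discrete_closest_string(strings: list[str]) -> tuple[str, int]:
    """Try every input string as the center and keep the one with the smallest radius."""
    best_center, best_radius = None, None
    for candidate in strings:
        # Radius of this candidate: distance to its farthest input string.
        rad = max(hamming(candidate, s) for s in strings)
        if best_radius is None or rad < best_radius:
            best_center, best_radius = candidate, rad
    return best_center, best_radius


if __name__ == "__main__":
    print(discrete_closest_string(["0000", "0011", "1111"]))
```

With $n$ strings of length $d$, this sketch performs $n^2$ distance computations of cost $O(d)$ each.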
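A natural dual is the Remotest String problem. In the continuous binary case, it maximizes $\min_{x \in X} d_H(c, x)$ over all $c \in \{0,1\}^d$, where $X$ is the input set and $d$ the string length; in the discrete case it maximizes the minimum distance to the other input strings (Abboud et al., 2023). For binary alphabets, continuous Closest String and continuous Remotest String are equivalent via complementation: writing $\bar{c}$ for the bitwise complement of $c$,
$$d_H(\bar{c}, x) = d - d_H(c, x) \quad \text{for every } x \in \{0,1\}^d,$$
hence
$$\max_{x \in X} d_H(\bar{c}, x) = d - \min_{x \in X} d_H(c, x).$$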
This binary-specific equivalence is central in several lower-bound transfers (Abboud et al., 2023).
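The complementation identity is easy to check numerically. The sketch below is an illustration only, assuming strings over `0` and `1`; the names `complement` and `identity_holds` are hypothetical helpers.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(a != b for a, b in zip(x, y))


def complement(c: str) -> str:
    """Bitwise complement of a binary string."""
    return c.translate(str.maketrans("01", "10"))


def identity_holds(c: str, strings: list[str]) -> bool:
    """Check: max distance from the complement equals d minus min distance from c."""
    d = len(c)
    farthest_from_complement = max(hamming(complement(c), s) for s in strings)
    nearest_to_c = min(hamming(c, s) for s in strings)
    return farthest_from_complement == d - nearest_to_c


if __name__ == "__main__":
    inputs = ["0000", "0011", "1111"]
    candidates = ["".join(bits) for bits in product("01", repeat=4)]
    print(all(identity_holds(c, inputs) for c in candidates))
```

Because the identity holds for every candidate center, a center that is remotest with minimum distance $m$ has a complement that is closest with radius $d - m$, and conversely.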
The problem also has a standard parameterized decision form: given $X \subseteq \{0,1\}^d$ and an integer $r$, decide whether there exists a center $c \in \{0,1\}^d$ with $\max_{x \in X} d_H(c, x) \le r$ (Fischer et al., 29 May 2026). This decision version underlies most FPT algorithms and most exact lower bounds parameterized by the optimum radius.
2. Complexity landscape and conditional barriers
The exact minimax Closest String Problem is NP-hard, and the binary minimax version is NP-complete (0705.0561). The classical exhaustive baselines are immediate. For the continuous binary problem, enumerating all $2^d$ candidate centers and evaluating their radii gives time $O(2^d \cdot nd)$ for $n$ input strings of length $d$ (Abboud et al., 2023). For the discrete problem, computing all pairwise Hamming distances and taking row maxima gives time $O(n^2 d)$ and space $O(n^2)$ if all distances are stored (Abboud et al., 2023).
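The continuous exhaustive baseline can be sketched in a few lines. This is only the naive enumeration described above, practical for very small string lengths; the function name `continuous_closest_string` is an assumption of this example.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(a != b for a, b in zip(x, y))


def continuous_closest_string(strings: list[str]) -> tuple[str, int]:
    """Enumerate every binary string of the input length and keep a best center."""
    d = len(strings[0])
    best_center, best_radius = None, d + 1
    for bits in product("01", repeat=d):  # all 2^d candidate centers
        center = "".join(bits)
        rad = max(hamming(center, s) for s in strings)
        if rad < best_radius:
            best_center, best_radius = center, rad
    return best_center, best_radius


if __name__ == "__main__":
    print(continuous_closest_string(["000", "000", "111"]))
```

On the inputs `000`, `000`, `111` the best input string has radius 3, while the continuous optimum has radius 2, which illustrates that the two variants can differ.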
Recent fine-grained results identify a sharp dichotomy between the continuous and discrete versions. For the continuous binary problem, exhaustive search is conditionally optimal: there is no exact algorithm with running time 4 for any 5 unless SETH fails (Abboud et al., 2023). In the discrete problem, a different barrier appears. When the dimension lies in the regime 6, the exact complexity is conditionally quadratic, 7, under the Hitting Set Conjecture (Abboud et al., 2023). The same hard range transfers to binary discrete Remotest String through a reduction that preserves 8 up to constant factors and increases dimension by only 9 (Abboud et al., 2023).
Parameterized by the optimum radius 0, a second barrier is known. A 2026 result gives a randomized exact algorithm with running time 1 and proves that no 2-time algorithm exists for any constant 3 unless SETH fails (Fischer et al., 29 May 2026). The lower bound is obtained by inspecting fine-grained hard instances where the optimal radius satisfies 4, together with a padding argument extending the conclusion to all 5 (Fischer et al., 29 May 2026).
Approximation schemes are also tightly constrained. Although PTASes for Closest String are known, there is no EPTAS unless $\mathrm{FPT} = \mathrm{W}[1]$ (Cygan et al., 2015). More quantitatively, for any computable $f$, a PTAS with runtime $f(\varepsilon) \cdot n^{o(1/\varepsilon)}$ would contradict ETH, and this lower bound already holds over the binary alphabet (Cygan et al., 2015). A plausible implication is that the binary problem is unusual in combining strong positive approximation results with unusually rigid barriers on the $\varepsilon$-dependence.
3. Exact algorithms across parameter regimes
Three exact regimes dominate the modern algorithmic picture: exponential search in the string length, subquadratic exact algorithms for restricted discrete regimes, and FPT algorithms parameterized by the optimum radius.
For the continuous binary problem, the exact situation is stark. The trivial $2^d$-time enumeration (up to polynomial factors) is essentially best possible under SETH (Abboud et al., 2023). This rules out meet-in-the-middle-like exact speedups in the exponent and makes the continuous binary problem a canonical example of exhaustive-search optimality in fine-grained complexity.
For the discrete problem, exact improvements are possible outside the HSC-hard range. In the small-dimension regime 1, there is an exact 2 algorithm based on a novel use of inclusion–exclusion (Abboud et al., 2023). Its key identity rewrites the indicator of bounded Hamming distance in terms of coordinate-subset agreement, allowing preprocessing of counts
3
for all subsets 4, followed by radius tests through alternating sums over subset sizes (Abboud et al., 2023). The algorithm runs in 5 time and uses 6 words of memory; it becomes subquadratic whenever 7 (Abboud et al., 2023).
In the large-dimension regime 8 for any fixed 9, exact discrete Closest String can be solved in 0 time for some 1 by computing all pairwise Hamming distances faster than 2 via heavy–light splitting and fast matrix multiplication (Abboud et al., 2023). The construction forms a sparse binary indicator matrix 3 with
4
then handles heavy columns by fast MM and light columns by sparse accumulation (Abboud et al., 2023). This yields a polynomial improvement over the brute-force 5 baseline.
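The reason matrix multiplication is relevant at all is the identity $d_H(x, y) = |x| + |y| - 2\langle x, y \rangle$ for 0/1 vectors, where $|x|$ is the number of ones. The sketch below illustrates only this identity with a naive Gram matrix; it does not implement the heavy-light splitting of the cited work, and the function name `pairwise_hamming` is hypothetical.

```python
def pairwise_hamming(strings: list[str]) -> list[list[int]]:
    """All pairwise Hamming distances via d_H(x, y) = |x| + |y| - 2 <x, y>."""
    rows = [[int(ch) for ch in s] for s in strings]
    weights = [sum(row) for row in rows]  # number of ones in each string
    n = len(rows)
    # Gram matrix of inner products. A fast matrix-multiplication routine would
    # replace this naive cubic-style computation; here it is kept simple.
    gram = [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]
    return [
        [weights[i] + weights[j] - 2 * gram[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    dist = pairwise_hamming(["0000", "0011", "1111"])
    print(dist)
    print(min(max(row) for row in dist))  # discrete radius from row maxima
```

Once the distance matrix is available, the discrete optimum is the smallest row maximum.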
The parameterized landscape has undergone a long sequence of base improvements. For Binary Closest String parameterized by the optimum radius 6, the progression reported in 2026 is
7
with the 8 algorithm being conditionally optimal under SETH (Fischer et al., 29 May 2026). Its procedure is remarkably simple: start from an arbitrary input string, repeatedly select a farthest string 9, and if 0, flip one uniformly random disagreement bit of the current center 1 toward 2 (Fischer et al., 29 May 2026). The analysis tracks the state 3 relative to a fixed optimum 4 and proves a progress probability
5
leading to an expected bound of at most 6 iterations and overall running time 7 (Fischer et al., 29 May 2026). With a reset rule triggered when the farthest distance exceeds 8, the polynomial factor can be improved to 9 (Fischer et al., 29 May 2026).
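The random-flip procedure described above can be sketched as follows. This is an illustration of the stated loop only: the step budget `max_steps`, the restart count `restarts`, and the fixed `seed` are assumptions added to make the sketch terminate, and they are not the parameters or the reset rule analyzed in the cited work.

```python
import random


def hamming(x, y) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def random_flip_search(strings, r, max_steps, restarts, seed=0):
    """Search for a center within distance r of every string; None if none is found."""
    rng = random.Random(seed)
    for _ in range(restarts):
        center = list(strings[0])  # start from an arbitrary input string
        for _ in range(max_steps + 1):
            far = max(strings, key=lambda s: hamming(center, s))  # a farthest string
            diff = [i for i, (a, b) in enumerate(zip(center, far)) if a != b]
            if len(diff) <= r:
                return "".join(center)  # every string is within distance r
            i = rng.choice(diff)  # uniformly random disagreement position
            center[i] = far[i]    # flip that bit toward the farthest string
    return None


if __name__ == "__main__":
    inputs = ["0000", "0011", "1111"]
    print(random_flip_search(inputs, r=2, max_steps=8, restarts=50))
```

A returned string is always a verified center, because the loop only returns after checking the farthest input; a `None` result in this sketch means only that the budget was exhausted, not that the instance is infeasible.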
4. Structural special cases and small-number-of-strings regimes
A distinct exact line of work studies instances with a small number of input strings. In the Manhattan Sequence Consensus framework, which subsumes Binary Closest String, there is an $O(\ell)$-time exact algorithm for $k \le 5$ sequences of length $\ell$ (Kociumaka et al., 2014). For binary strings, this immediately yields an $O(\ell)$ algorithm for the minimax closest string problem with at most five inputs.
The method is column-based. For each position, the 4 input symbols are sorted, interval systems are defined over adjacent ranks, and the global objective is converted into a constrained ILP with a single radius variable (Kociumaka et al., 2014). Two algebraic reductions are then applied: sign normalization by negation of variables, and merging of variables with identical coefficient vectors by Minkowski summation of their ranges. For general 5, this reduces the number of variables to at most 6; for 7, the paper proves a much stronger combinatorial classification (Kociumaka et al., 2014).
For 8, every optimal sum-MSC sequence belongs to one of 20 families: five border families 9, five middle families 0, and ten triangle families 1 (Kociumaka et al., 2014). Each family induces an interval system whose ILP reduces to an “easy ILP” with at most four variables, solvable in 2 amortized time after linear-time preprocessing over columns (Kociumaka et al., 2014). The overall algorithm constructs the 20 candidates, solves their reduced ILPs, and selects the minimum radius.
The same paper gives a kernelization for general parameter 3. By merging columns with identical permutation types, any instance reduces in linear time to length at most 4, and in the binary case, with appropriate tie-breaking, to length at most 5 (Kociumaka et al., 2014). This shows fixed-parameter tractability in the number of strings, although the resulting generic exact algorithm is considered impractical beyond very small 6 because naive enumeration over interval systems remains too large (Kociumaka et al., 2014).
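The idea of grouping columns by type can be illustrated in the binary case. The sketch below counts how often each column pattern occurs across $k$ input strings; it normalizes each column so that its first symbol is `0`, which is valid because flipping one coordinate in every input string and in the center preserves all Hamming distances. This normalization is an assumption of the sketch and is not necessarily the tie-breaking rule used in the cited paper; the name `column_types` is hypothetical.

```python
from collections import Counter


def column_types(strings: list[str]) -> Counter:
    """Count normalized column patterns; at most 2^(k-1) distinct keys for k strings."""
    flip = str.maketrans("01", "10")
    types = Counter()
    for col in zip(*strings):  # iterate over positions
        column = "".join(col)
        if column[0] == "1":
            # Complementing a whole column does not change any Hamming distance.
            column = column.translate(flip)
        types[column] += 1
    return types


if __name__ == "__main__":
    print(column_types(["00110", "01011", "01101"]))
```

After this grouping, an instance is described by the multiplicity of each column type rather than by its length, which is what makes the number of strings a usable parameter.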
This regime of few input strings is structurally different from the fine-grained regime, where the number of strings is large. A plausible implication is that Binary Closest String supports two orthogonal exact methodologies: parameterization by the optimum radius, and parameterization by the number of strings through combinatorial column types.
5. Mathematical formulations and solver frameworks
Several exact and near-exact frameworks formulate Binary Closest String as IP, LP, CSP, or QUBO. These formulations emphasize different aspects of the problem: certification, propagation, practical heuristics, or hardware embedding.
An IP formulation introduces binary center variables and mismatch variables. In one-hot form, $x_{j,a} = 1$ indicates that position $j$ of the center uses symbol $a$, and the objective minimizes a radius variable $r$ subject to one-hot constraints and distance upper bounds (0705.0561). In a binary specialization, one may instead use center bits $c_j \in \{0,1\}$ and mismatch variables $m_{i,j}$ satisfying
$$m_{i,j} = |c_j - s_{i,j}| \quad (j = 1, \dots, d), \qquad \sum_{j=1}^{d} m_{i,j} \le r$$
for every input string $s_i$ (0705.0561). LP relaxation followed by greedy iterative rounding yields a polynomial-time heuristic that is exact for two strings and has additive error at most one for three binary strings (0705.0561).
A CSP formulation is more explicitly combinatorial. It uses center variables $c_j$, reified mismatch indicators $m_{i,j}$, per-string distances $D_i$, and a radius variable $r$ (Kelsey et al., 2010). Two standard bounds on the optimal radius $r^*$ are built in: the Hamming diameter upper bound
$$r^* \le \max_{i, i'} d_H(s_i, s_{i'})$$
and the lower bound
$$r^* \ge \left\lceil \tfrac{1}{2} \max_{i, i'} d_H(s_i, s_{i'}) \right\rceil$$
from the triangle inequality (Kelsey et al., 2010). The paper’s principal heuristic is PWM ordering: positions are searched by decreasing majority count, and the majority bit is tried first. Reported experiments show “several orders of magnitude” speedups over generic orderings at and above the optimal distance, while the dominant residual cost is often certifying infeasibility at $r^* - 1$ (Kelsey et al., 2010). The same framework also supports enumeration of all optimal centers and distributed two-front search that combines optimization from above with infeasibility proofs from below (Kelsey et al., 2010).
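The diameter-based bounds and the majority-first ordering can be sketched as follows. This is a simplified illustration rather than the cited solver: the function names `radius_bounds` and `pwm_order` are hypothetical, and breaking ties toward `0` and toward the smaller position index is an assumption of this example.

```python
from itertools import combinations


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(a != b for a, b in zip(x, y))


def radius_bounds(strings: list[str]) -> tuple[int, int]:
    """(lower, upper) bounds on the optimal radius from the Hamming diameter."""
    diameter = max((hamming(a, b) for a, b in combinations(strings, 2)), default=0)
    return (diameter + 1) // 2, diameter  # ceil(diameter / 2) and diameter


def pwm_order(strings: list[str]) -> list[tuple[int, str]]:
    """Positions by decreasing majority count, each paired with the bit to try first."""
    n = len(strings)
    scored = []
    for j, col in enumerate(zip(*strings)):
        ones = col.count("1")
        majority_bit = "1" if ones > n - ones else "0"  # ties go to "0" (assumption)
        scored.append((max(ones, n - ones), j, majority_bit))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(j, bit) for _, j, bit in scored]


if __name__ == "__main__":
    inputs = ["010", "011", "110"]
    print(radius_bounds(inputs))
    print(pwm_order(inputs))
```

A search procedure would then test radii between the two bounds and branch on positions in the returned order, trying the listed bit first.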
A recurrent source of confusion is the distinction between the minimax objective and the sum-of-distances objective. The CSP work explicitly notes that minimizing the sum of Hamming distances is a different objective, outside its scope (Kelsey et al., 2010). The D-Wave annealing paper makes this distinction operationally important: its QUBO formulations optimize a sum-of-distances proxy, not the max-distance radius directly (Dissanayake, 2023). In that formulation, each column uses one-hot variables 04 selecting one observed input symbol for the center, with a clique penalty enforcing exactly one choice and a linear objective coefficient equal to the precomputed number of mismatches induced by that choice (Dissanayake, 2023). Because columns do not couple, the QUBO decomposes exactly across columns under that objective, enabling substring batching on Pegasus hardware; the evaluation metric is the Occurrence Ratio
05
and the paper reports recovery of expected solutions on small test cases with minimal hyperparameter tuning (Dissanayake, 2023).
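The gap between the two objectives shows up already on three strings. The sketch below is an illustration, not the cited QUBO: it uses the column-wise majority string, which minimizes the sum of Hamming distances because columns contribute independently, and compares its maximum distance with the brute-force minimax optimum. The helper names are hypothetical.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(a != b for a, b in zip(x, y))


def majority_string(strings: list[str]) -> str:
    """Column-wise majority vote (ties go to '0'); optimal for the sum objective."""
    n = len(strings)
    return "".join("1" if 2 * col.count("1") > n else "0" for col in zip(*strings))


def minimax_radius(strings: list[str]) -> int:
    """Optimal max-distance radius by exhaustive search over all centers."""
    d = len(strings[0])
    centers = ("".join(bits) for bits in product("01", repeat=d))
    return min(max(hamming(c, s) for s in strings) for c in centers)


if __name__ == "__main__":
    inputs = ["000", "000", "111"]
    m = majority_string(inputs)
    print(m, max(hamming(m, s) for s in inputs), minimax_radius(inputs))
```

On `000`, `000`, `111` the sum-optimal string `000` is at distance 3 from the third input, whereas the minimax optimum has radius 2, so a solver for the sum proxy need not return a minimax center.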
6. Dynamic, approximation, and quantum extensions
Binary Closest String has also been studied in dynamic and quantum models, and these extensions expose which parts of the classical structure are robust under changing computational assumptions.
In the dynamic setting, one maintains a set of 06 binary strings of fixed length 07 under point updates 08 and answers feasibility queries for a fixed radius parameter 09 (Olkowski et al., 2022). The binary-specialized data structures have initialization time 10, amortized update time 11, and worst-case query time 12 (Olkowski et al., 2022). They maintain an approximate origin string 13 such that, if the instance is feasible, then 14 for every string 15; if some string ever exceeds distance 16 from 17, infeasibility is certified (Olkowski et al., 2022). Random color-coding of mismatch positions then supports a fast “far-pair” negativity check: if two strings are at Hamming distance greater than 18, the structure detects this with probability at least 19 (Olkowski et al., 2022). The resulting Monte Carlo guarantees have no false negatives and false positives of probability at most 20 after standard amplification (Olkowski et al., 2022).
Approximation remains theoretically delicate. The binary problem admits PTASes, but the lower bounds on their runtime dependence imply that high-accuracy approximation cannot be made strongly efficient in the EPTAS sense unless parameterized complexity collapses (Cygan et al., 2015). This suggests that, in binary Hamming space, approximation is useful primarily when the classical PTAS exponents are acceptable or when exact FPT algorithms in the optimum radius are inapplicable.
Quantum algorithms split into two main regimes. A nested Dürr–Høyer minimum/maximum search solves binary Closest String in time 21, and under a quantum analogue of SETH, the dominant 22 scaling cannot be improved to 23 for any 24 (Cudby et al., 17 Oct 2025). A second algorithm quantizes the bounded-search tree of Gramm et al. by an MNRS quantum walk and gives a binary Hamming runtime of 25 in the paper’s notation, improving the dominant dependence on the radius parameter from 26 to 27 relative to the classical branching backbone (Cudby et al., 17 Oct 2025). The same work also gives a quantum dynamic-programming algorithm for Levenshtein-distance variants and a quantum-accelerated generator search for binary Closest Substring (Cudby et al., 17 Oct 2025). However, the paper explicitly notes that for binary alphabets the best classical fixed-alphabet exact bounds 28 may outperform the walk-based quantum algorithm once 29 (Cudby et al., 17 Oct 2025).
Across these directions, the binary case is repeatedly special. Complementation equates Closest and Remotest String in the continuous setting; small alphabets change the parameter tradeoffs in both classical and quantum algorithms; and several lower bounds become strongest precisely over 30 (Abboud et al., 2023). The current exact picture is therefore unusually crisp: exhaustive search is conditionally optimal for the continuous binary problem, the discrete problem admits faster exact algorithms only outside the super-logarithmic hard range, and the radius-parameterized complexity is settled at randomized 31 time (Fischer et al., 29 May 2026).