Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false) (1412.0348v4)

Published 1 Dec 2014 in cs.CC and cs.DS

Abstract: The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time. In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time $O(n^{2-\delta})$ for some constant $\delta>0$, then the satisfiability of conjunctive normal form formulas with $N$ variables and $M$ clauses can be solved in time $M^{O(1)} 2^{(1-\epsilon)N}$ for a constant $\epsilon>0$. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.

Citations (392)

Summary

  • The paper shows that any strongly subquadratic algorithm for edit distance would refute SETH via a reduction from the Orthogonal Vectors Problem.
  • It introduces innovative vector gadget constructions that translate vector orthogonality into distinguishable edit distance outcomes.
  • The findings imply that while exact subquadratic solutions are unlikely, approximate algorithms may offer practical alternatives for large-scale applications.

Edit Distance Complexity and SETH

Computing the edit distance, often termed the Levenshtein distance, is a cornerstone problem in theoretical computer science. The measure counts the minimal number of modifications (insertions, deletions, or substitutions) required to transform one string into another. Despite its foundational nature and extensive applications, notably in computational biology and natural language processing, edit distance remains computationally expensive: the textbook dynamic program runs in $O(n^2)$ time, and the fastest known algorithm, due to Masek and Paterson, improves this only to $O(n^2/\log^2 n)$.
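
For reference, the textbook dynamic program mentioned above fills an $(n+1) \times (m+1)$ table of prefix-to-prefix distances; a minimal sketch of the standard algorithm, with a common two-row space optimization:

```python
def edit_distance(s: str, t: str) -> int:
    """Classical O(n*m) dynamic program for the Levenshtein distance."""
    n, m = len(s), len(t)
    prev = list(range(m + 1))  # distances from the empty prefix of s to each prefix of t
    for i in range(1, n + 1):
        curr = [i] + [0] * m   # transforming s[:i] into "" takes i deletions
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete s[i-1]
                curr[j - 1] + 1,                       # insert t[j-1]
                prev[j - 1] + (s[i - 1] != t[j - 1]),  # substitute, or match for free
            )
        prev = curr
    return prev[m]

assert edit_distance("kitten", "sitting") == 3  # the classic example
```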

In their work, Backurs and Indyk present substantial evidence that the existing upper bounds on the time complexity of computing edit distance might be near-optimal. They show that a truly sub-quadratic algorithm for this problem would contradict the Strong Exponential Time Hypothesis (SETH). SETH postulates that no algorithm can solve the satisfiability (SAT) problem for CNF formulas in "strongly sub-exponential" time, that is, faster than $2^{(1-\epsilon)N}$ for any constant $\epsilon > 0$, where $N$ is the number of variables.
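
For precision, one common formalization of SETH (a paraphrase in its $k$-SAT form, not a quotation from the paper) is:

$$\text{for every } \epsilon > 0 \text{ there exists a clause width } k \text{ such that } k\text{-SAT on } N \text{ variables cannot be solved in time } O\big(2^{(1-\epsilon)N}\big).$$

The CNF-SAT form quoted in the abstract, with its $M^{O(1)}$ factor for $M$ clauses, is the variant most convenient for reductions; an algorithm violating it would also violate SETH, since width-$k$ instances have only $M = O(N^k)$ distinct clauses.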

The paper builds on a reduction from the Orthogonal Vectors Problem (OVP), for which a strongly sub-quadratic algorithm is known to refute SETH. In OVP, one is given two sets of $N$ Boolean vectors and must decide whether some pair, one from each set, is orthogonal. Specifically, the authors show that any algorithm computing edit distance in time $O(n^{2-\delta})$ for some constant $\delta > 0$ would solve OVP in time strongly sub-quadratic in $N$ (the reduction produces strings whose length is $N$ times a factor polynomial in the vector dimension), thereby transferring the conjectured hardness of CNF-SAT to edit distance computation.
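
For concreteness, the quadratic baseline for OVP that the reduction targets looks as follows (a minimal sketch; beating it by a polynomial factor $N^{\delta}$ is exactly what the reduction shows a fast edit distance algorithm would accomplish):

```python
from itertools import product

def has_orthogonal_pair(A, B):
    """Naive O(N^2 * d) scan: is some vector in A orthogonal to some vector in B?"""
    return any(
        all(x * y == 0 for x, y in zip(a, b))  # 0/1 inner product equals zero
        for a, b in product(A, B)
    )

A = [(1, 0, 1), (0, 1, 0)]
B = [(1, 1, 0), (0, 0, 1)]
assert has_orthogonal_pair(A, B)  # (0,1,0) . (0,0,1) = 0
```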

Key Contributions and Techniques

  1. Reduction from Orthogonal Vectors Problem:
    • The central technique is the construction of 'vector gadgets' that encode vector orthogonality into strings: each vector is mapped to a sequence, so that whether a pair of vectors is orthogonal can be read off from an edit distance computation.
    • The construction is padded and calibrated so that orthogonal pairs yield a strictly smaller edit distance than non-orthogonal pairs, giving a threshold that cleanly separates the two cases (a toy illustration of this separation property follows this list).
  2. Analysis via SETH-Hardness:
    • Conditional on SETH, the reduction establishes edit distance as "SETH-hard": its complexity is now tied to a broad family of problems believed intractable under current algorithmic techniques.
  3. Implications for Approximation Algorithms:
    • While the paper posits strong barriers against exact sub-quadratic computation, the results leave open the possibility that approximation algorithms, at least in some parameter regimes, could escape these bounds, a topic deserving further exploration.
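
To make the separation idea concrete, the toy script below brute-forces single-coordinate gadgets: short binary strings A(0), A(1), B(0), B(1) such that the one non-orthogonal bit pair (1, 1) incurs a strictly larger edit distance than every orthogonal pair. This is only a miniature illustration of the property the paper's gadgets must guarantee, not the authors' actual construction, which composes coordinate gadgets with careful padding so that the distance between the full concatenated strings reflects whether an orthogonal pair of vectors exists.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def ed(s: str, t: str) -> int:
    """Plain recursive Levenshtein distance (adequate for tiny strings)."""
    if not s or not t:
        return len(s) + len(t)
    return min(
        ed(s[1:], t) + 1,                   # delete s[0]
        ed(s, t[1:]) + 1,                   # insert t[0]
        ed(s[1:], t[1:]) + (s[0] != t[0]),  # substitute, or match for free
    )

# Candidate gadget strings: all binary strings of length 1 or 2.
candidates = ["".join(bits) for n in (1, 2) for bits in product("01", repeat=n)]

# Find encodings where the sole non-orthogonal bit pair (1, 1) is strictly
# more expensive than all orthogonal pairs (0,0), (0,1), (1,0).
for a0, a1, b0, b1 in product(candidates, repeat=4):
    orthogonal = [ed(a0, b0), ed(a0, b1), ed(a1, b0)]
    if ed(a1, b1) > max(orthogonal):
        print(f"A(0)={a0!r}  A(1)={a1!r}  B(0)={b0!r}  B(1)={b1!r}")
        print(f"orthogonal distances: {orthogonal}, non-orthogonal: {ed(a1, b1)}")
        break
```

Running it prints a witness, e.g. distances [0, 1, 1] for the orthogonal pairs against 2 for the non-orthogonal pair, so a threshold between 1 and 2 decides orthogonality of a single coordinate.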

Implications and Future Work

Backurs and Indyk’s paper firmly situates edit distance computation among the problems whose hardness is contingent on the validity of SETH, delineating explicit limits for future algorithmic work. Researchers developing approximation algorithms must now work within these limits, synthesizing techniques from approximation theory and probabilistic methods, and leveraging domain-specific constraints where applicable.

From a practical perspective, these insights urge the field to direct attention toward heuristic approaches or domain-specific optimizations. For example, bioinformatics workflows may tolerate approximation if this yields significant computational speedups without substantial accuracy loss.

In conclusion, the reduction crafted by Backurs and Indyk not only explains why edit distance has resisted faster algorithms but also highlights the connections this problem shares with other areas of complexity theory. The work does not close the door on progress; rather, it maps out where progress remains possible under current hypotheses. Future advances on edit distance must navigate this landscape strategically, particularly where exact computation remains impractical.