Efficiently Computing Edit Distance to Dyck Language (1311.2557v2)
Abstract: Given a string $\sigma$ over alphabet $\Sigma$ and a grammar $G$ defined over the same alphabet, what is the minimum number of repairs (insertions, deletions, and substitutions) required to map $\sigma$ into a valid member of $G$? We investigate this basic question in this paper for $Dyck(s)$. $Dyck(s)$ is a fundamental context-free grammar representing the language of well-balanced parentheses with $s$ different types of parentheses, and it has played a pivotal role in the development of the theory of context-free languages. Computing edit distance to $Dyck(s)$ significantly generalizes the string edit distance problem and has numerous applications, ranging from repairing semi-structured documents such as XML to memory checking, automated compiler optimization, and natural language processing. In this paper we give the first near-linear time algorithm for edit distance computation to $Dyck(s)$ that achieves a nontrivial approximation factor of $O(\frac{1}{\epsilon}\log{OPT}(\log{n})^{\frac{1}{\epsilon}})$ in $O(n^{1+\epsilon}\log{n})$ time. In fact, given an algorithm that computes string edit distance on an input of size $n$ in $\alpha(n)$ time with a $\beta(n)$-approximation factor, we can devise an algorithm for the edit distance problem to $Dyck(s)$ that runs in $\tilde{O}(n^{1+\epsilon}+\alpha(n))$ time and achieves an approximation factor of $O(\frac{1}{\epsilon}\beta(n)\log{OPT})$. We show that the framework for efficiently approximating edit distance to $Dyck(s)$ can be applied to many other languages. We illustrate this by considering various memory checking languages, which comprise valid transcripts of stacks, queues, priority queues, double-ended queues, etc. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.
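
To make the problem statement concrete, here is a minimal Python sketch of an exact baseline for small inputs. It is not the near-linear approximation algorithm described in the abstract; it is a textbook-style cubic-time interval dynamic program, shown only to illustrate what "edit distance to $Dyck(s)$" measures. The function names (`is_dyck`, `pair_cost`, `dyck_edit_distance`) and the fixed three-bracket alphabet (so $s = 3$) are assumptions made for this illustration.

```python
# Illustrative alphabet with s = 3 parenthesis types (an assumption of this sketch).
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def is_dyck(s):
    """Return True if s is well balanced, using a stack of expected closers."""
    stack = []
    for ch in s:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif not stack or stack.pop() != ch:
            return False
    return not stack


def pair_cost(a, b):
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in PAIRS and b in CLOSERS:
        return 0 if PAIRS[a] == b else 1  # same type, or fix one side
    if a in PAIRS or b in CLOSERS:
        return 1  # open/open or close/close: substitute one symbol
    return 2      # close/open: substitute both symbols


def dyck_edit_distance(s):
    """Exact minimum number of insertions, deletions, and substitutions
    that turn s into a balanced string. Runs in O(n^3) time, O(n^2) space."""
    n = len(s)
    # d[i][j] = cost of repairing the substring s[i:j]; empty substrings cost 0.
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Option 1: leave s[i] unmatched (delete it, or insert a partner).
            best = 1 + d[i + 1][j]
            # Option 2: match s[i] with some later symbol s[k].
            for k in range(i + 1, j):
                cost = pair_cost(s[i], s[k]) + d[i + 1][k] + d[k + 1][j]
                best = min(best, cost)
            d[i][j] = best
    return d[0][n]
```

For example, `dyck_edit_distance("(]")` needs one substitution, while `dyck_edit_distance("([)]")` needs two repairs. The cubic running time of this baseline is what makes a near-linear approximation attractive for long inputs.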