Papers
Topics
Authors
Recent
2000 character limit reached

Improved Extended Regular Expression Matching (2510.09311v1)

Published 10 Oct 2025 in cs.DS

Abstract: An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Extended regular expressions are a basic concept in formal language theory and a basic primitive for searching and processing data. Extended regular expression matching was introduced by Hopcroft and Ullmann in the 1970s [\textit{Introduction to Automata Theory, Languages and Computation}, 1979], who gave a simple dynamic programming solution using $O(n3m)$ time and $O(n2m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$. Since then, several solutions have been proposed, but few significant asymptotic improvements have been obtained. The current state-of-the art solution, by Yamamoto and Miyazaki~[COCOON, 2003], uses $O(\frac{n3k + n2m}{w} + n + m)$ time and $O(\frac{n2k + nm}{w} + n + m)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the Hopcroft and Ullmann algorithm. We revisit the problem and present a new solution that significantly improves the previous time and space bounds. Our main result is a new algorithm that solves extended regular expression matching in [O\left(n\omega k + \frac{n2m}{\min(w/\log w, \log n)} + m\right)] time and $O(\frac{n2 \log k}{w} + n + m) = O(n2 +m)$ space, where $\omega \approx 2.3716$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n3k$ term with $n\omega k$ in the time bound, while simultaneously improving the $n2k$ term in the space to $O(n2)$.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.