Using Positional Sequence Patterns to Estimate the Selectivity of SQL LIKE Queries (2002.01164v1)

Published 4 Feb 2020 in cs.DB and cs.DS

Abstract: With the dramatic increase in text-based data, which commonly contains misspellings and other errors, querying such data with flexible search patterns is becoming increasingly commonplace. Relational databases support the LIKE operator, which allows searching with a wildcard predicate (e.g., LIKE 'Sub%', which matches all strings starting with 'Sub'). Because text data is large, executing such queries efficiently is critical for database performance. To build an efficient execution plan for a LIKE query, the query optimizer requires a selectivity estimate for the pattern-based query predicate. The recently proposed SPH algorithm employs a sequence pattern-based histogram structure to estimate the selectivity of LIKE queries. A drawback of the SPH approach is that it often overestimates the selectivity of queries. To alleviate this overestimation problem, we propose a novel sequence pattern type, called positional sequence patterns. The proposed patterns distinguish sequence item pairs that appear next to each other in all pattern occurrences from those that may have other items between them. In addition, we employ redundant pattern elimination based on pattern information content during histogram construction. Finally, we propose a partitioning-based matching scheme for selectivity estimation. Experimental results on a real dataset from DBLP show that the proposed approach outperforms the state of the art, improving error rates by around 20%.
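To make the notion of selectivity concrete, here is a minimal Python sketch that computes the exact selectivity of a LIKE predicate by scanning every row. This is the ground truth that an estimator such as SPH tries to approximate without a full scan; it is not the paper's method. The helper names `like_to_regex` and `exact_selectivity` and the sample titles are hypothetical, and the sketch assumes case-sensitive matching with no ESCAPE clause.

```python
import re
from typing import Sequence


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a regular expression.

    Assumptions: '%' matches any run of characters (including none),
    '_' matches exactly one character, matching is case-sensitive,
    and there is no ESCAPE character.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.DOTALL)


def exact_selectivity(pattern: str, rows: Sequence[str]) -> float:
    """Fraction of rows satisfying the LIKE pattern, found by a full scan."""
    if not rows:
        return 0.0
    rx = like_to_regex(pattern)
    # fullmatch anchors the pattern at both ends, as LIKE does.
    matches = sum(1 for row in rows if rx.fullmatch(row))
    return matches / len(rows)


# Hypothetical sample data, not taken from the DBLP dataset.
titles = [
    "Subgraph Matching",
    "Substring Indexes",
    "Query Optimization",
    "Suffix Trees",
]

print(exact_selectivity("Sub%", titles))  # 0.5: two of four titles start with 'Sub'
```

A full scan like this costs time linear in the table size, which is why an optimizer relies on a compact summary structure to estimate the same fraction instead.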
