Constructing Antidictionaries in Output-Sensitive Space

Published 13 Feb 2019 in cs.DS | (1902.04785v1)

Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the many available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$, which can be impractical for large $n$. We perform the same computation incrementally using output-sensitive space. This goal is reasonable when $||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||=o(n)$, for all $N\in[1,k]$. For instance, in the human genome, $n \approx 3\times 10^9$ but $||\mathrm{M}^{12}_{y_{1}\#\ldots\#y_{k}}|| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_{1}},\ldots,\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ can be computed in $\mathcal{O}(kn+\sum^{k}_{N=1}||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1,\ldots,y_{k}\}$ and $\mathrm{MaxOut}=\max\{||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||:N\in[1,k]\}$. Proof-of-concept experimental results are also provided, confirming our theoretical findings and justifying our contribution.
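To make the definition concrete, here is a minimal brute-force sketch in Python that computes the minimal absent words of length at most $\ell$ for a small collection. It is an illustration of the definition only, not the paper's algorithm: it stores every factor of length at most $\ell$, so its space is far from output-sensitive. The function names `factors_up_to` and `minimal_absent_words` are hypothetical and chosen for this example. The sketch relies on the fact that a word $x$ with $|x|\ge 2$ is a minimal absent word exactly when $x$ is absent while both its longest proper prefix and its longest proper suffix occur.

```python
def factors_up_to(word, ell):
    """Return all non-empty factors (substrings) of `word` of length <= ell."""
    n = len(word)
    return {word[i:j] for i in range(n) for j in range(i + 1, min(i + ell, n) + 1)}


def minimal_absent_words(words, alphabet, ell):
    """Brute-force minimal absent words of length <= ell for y_1#...#y_k.

    `words` plays the role of y_1, ..., y_k. The separator # is never
    materialised: factors are collected per word, so no factor spans two words.
    """
    occurring = set()
    for w in words:
        occurring |= factors_up_to(w, ell)

    # Length 1: a letter is minimal absent iff it does not occur at all
    # (its only proper factor is the empty word, which always occurs).
    maws = {a for a in alphabet if a not in occurring}

    # Length >= 2: x = a + w where the suffix w occurs; x qualifies if x is
    # absent and its longest proper prefix x[:-1] also occurs.
    for w in occurring:
        if len(w) >= ell:
            continue  # a + w would exceed the length bound
        for a in alphabet:
            x = a + w
            if x not in occurring and x[:-1] in occurring:
                maws.add(x)
    return maws


if __name__ == "__main__":
    print(sorted(minimal_absent_words(["abaab"], "ab", 4)))
    print(sorted(minimal_absent_words(["ab", "ba"], "ab", 3)))
```

For the single word `abaab` over the alphabet `{a, b}` with $\ell = 4$, the sketch yields `aaa`, `aaba`, `bab`, and `bb`. Calling it on growing prefixes of the collection (`words[:1]`, `words[:2]`, and so on) gives the sequence of sets the abstract refers to, although the paper's contribution is to obtain them without indexing the whole input.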


Citations (7)



Authors (5)
