
Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval (1108.0554v2)

Published 2 Aug 2011 in cs.DS

Abstract: Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA| + n\log D\,(2+o(1))$ bits with $O(t_s(p) + k\log\log n + \mathrm{poly}\log\log n)$ query time for the basic relevance metric term frequency, where $|CSA|$ is the size (in bits) of a compressed full-text index of $\mathcal{D}$ that searches for a pattern of length $p$ in $O(t_s(p))$ time. We further reduce the space to $|CSA| + n\log D\,(1+o(1))$ bits, at the cost of a query time of $O(t_s(p) + k(\log\sigma \log\log n)^{1+\epsilon} + \mathrm{poly}\log\log n)$, where $\sigma$ is the alphabet size and $\epsilon > 0$ is any constant.
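To make the problem statement concrete, the following is a minimal brute-force sketch of top-$k$ retrieval under the term-frequency metric. It is not the index proposed in the paper: it scans every document at query time and uses no compressed full-text index. The function names `count_occurrences` and `top_k_by_term_frequency`, the choice to count overlapping occurrences, and the tie-breaking rule (smaller document id first) are assumptions made for illustration only.

```python
import heapq
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps (an assumption)."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are also counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(
    documents: List[str], pattern: str, k: int
) -> List[Tuple[int, int]]:
    """Return up to k (doc_id, term_frequency) pairs, highest frequency first.

    Brute-force baseline: every document is scanned for every query.
    Documents in which the pattern does not occur are never reported.
    Ties are broken by smaller doc_id (an illustrative choice).
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            scored.append((tf, doc_id))
    # Select the k best entries: largest frequency, then smallest doc_id.
    best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
    return [(doc_id, tf) for tf, doc_id in best]


if __name__ == "__main__":
    docs = ["banana", "ananas", "bandana", "cherry"]
    print(top_k_by_term_frequency(docs, "ana", 2))
```

This baseline spends time proportional to the total document length $n$ on every query. The bounds quoted in the abstract instead depend on the pattern search time $t_s(p)$ and on $k$, which is the gap an index of this kind aims to close.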

Authors (3)
  1. Wing-Kai Hon (16 papers)
  2. Rahul Shah (17 papers)
  3. Sharma V. Thankachan (15 papers)
Citations (27)
