Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
GPT-4o
12 tokens/sec
Gemini 2.5 Pro Pro
o3 Pro
5 tokens/sec
GPT-4.1 Pro
37 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
Gemini 2.5 Flash Deprecated
12 tokens/sec
2000 character limit reached

A framework for space-efficient string kernels (1502.06370v1)

Published 23 Feb 2015 in cs.DS

Abstract: String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in $O(nd)$ time and in $o(n)$ bits of space in addition to the input, using just a $\mathtt{rangeDistinct}$ data structure on the Burrows-Wheeler transform of the input strings, which takes $O(d)$ time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple value of $k$, like the $k$-mer profile and the $k$-th order empirical entropy, and for calibrating the value of $k$ using the data.

Citations (17)

Summary

We haven't generated a summary for this paper yet.