Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
153 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Sub-O(log n) Out-of-Order Sliding-Window Aggregation (1810.11308v1)

Published 26 Oct 2018 in cs.DS and cs.DB

Abstract: Sliding-window aggregation summarizes the most recent information in a data stream. Users specify how that summary is computed, usually as an associative binary operator because this is the most general known form for which it is possible to avoid naively scanning every window. For strictly in-order arrivals, there are algorithms with $O(1)$ time per window change assuming associative operators. Meanwhile, it is common in practice for streams to have data arriving slightly out of order, for instance, due to clock drifts or communication delays. Unfortunately, for out-of-order streams, one has to resort to latency-prone buffering or pay $O(\log n)$ time per insert or evict, where $n$ is the window size. This paper presents the design, analysis, and implementation of FiBA, a novel sliding-window aggregation algorithm with an amortized upper bound of $O(\log d)$ time per insert or evict, where $d$ is the distance of the inserted or evicted value to the closer end of the window. This means $O(1)$ time for in-order arrivals and nearly $O(1)$ time for slightly out-of-order arrivals, with a smooth transition towards $O(\log n)$ as $d$ approaches $n$. We also prove a matching lower bound on running time, showing optimality. Our algorithm is as general as the prior state-of-the-art: it requires associativity, but not invertibility nor commutativity. At the heart of the algorithm is a careful combination of finger-searching techniques, lazy rebalancing, and position-aware partial aggregates. We further show how to answer range queries that aggregate subwindows for window sharing. Finally, our experimental evaluation shows that FiBA performs well in practice and supports the theoretical findings.

Citations (2)

Summary

We haven't generated a summary for this paper yet.