Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Efficient Time-Evolving Stream Processing at Scale (1806.00760v1)

Published 3 Jun 2018 in cs.DC

Abstract: Time-evolving stream datasets exist ubiquitously in many real-world applications where their inherent hot keys often evolve over times. Nevertheless, few existing solutions can provide efficient load balance on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel grouping approach (named FISH), which can provide the efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded distance of time interval. This enables to accurately identify the recent hot keys for the real-time load balance within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous computation). We also propose to heuristically infer the accurate information of remote workers through computation rather than communication for cost-efficient worker assignment. We have integrated our approach into Apache Storm. Our results on a cluster of 128 nodes for both synthetic and real-world stream datasets show that FISH significantly outperforms state-of-the-art with the average and the 99th percentile latency reduction by 87.12% and 76.34% (vs. W-Choices), and memory overhead reduction by 99.96% (vs. Shuffle Grouping).

Citations (5)

Summary

We haven't generated a summary for this paper yet.