
Identifying Dwarfs Workloads in Big Data Analytics (1505.06872v1)

Published 26 May 2015 in cs.DB

Abstract: Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, the wide coverage and great complexity of big data computing pose major challenges for big data benchmarking. How can we construct a benchmark suite that uses a minimum set of units of computation to represent the diversity of big data analytics workloads? Big data dwarfs are abstractions of frequently appearing operations in big data computing. One dwarf represents one unit of computation, and big data workloads are decomposed into one or more dwarfs. Furthermore, dwarfs workloads, rather than vast real workloads, are a more cost-efficient and representative way to evaluate big data systems. In this paper, we extensively investigate the six most important or emerging application domains, i.e., search engine, social network, e-commerce, multimedia, bioinformatics, and astronomy. After analyzing forty representative algorithms, we single out eight dwarfs workloads in big data analytics other than OLAP: linear algebra, sampling, logic operations, transform operations, set operations, graph operations, statistic operations, and sort.

Authors (8)
  1. Wanling Gao
  2. Chunjie Luo
  3. Jianfeng Zhan
  4. Hainan Ye
  5. Xiwen He
  6. Lei Wang
  7. Yuqing Zhu
  8. Xinhui Tian
Citations (3)
