
Efficient Fault Tolerance for Pipelined Query Engines via Write-ahead Lineage (2403.08062v1)

Published 12 Mar 2024 in cs.DC and cs.DB

Abstract: Modern distributed pipelined query engines either do not support intra-query fault tolerance or employ high-overhead approaches such as persisting intermediate outputs or checkpointing state. In this work, we present write-ahead lineage, a novel fault recovery technique that combines Spark's lineage-based replay and write-ahead logging. Unlike Spark, where the lineage is determined before query execution, write-ahead lineage persistently logs lineage at runtime to support dynamic task dependencies in pipelined query engines. Since only KB-sized lineages are persisted instead of MB-sized intermediate outputs, the normal execution overhead is minimal compared to spooling- or checkpointing-based approaches. To ensure fast fault recovery times, tasks only consume intermediate outputs with persisted lineage, preventing global rollbacks upon failure. In addition, lost tasks from different stages can be recovered in a pipelined parallel manner. We implement write-ahead lineage in a distributed pipelined query engine called Quokka. We show that Quokka is around 2x faster than SparkSQL on the TPC-H benchmark with similar fault recovery performance.
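
The idea in the abstract can be illustrated with a toy, single-process sketch. This is not Quokka's implementation: the names `LineageLog`, `Channel`, `run_task`, and `recover` are hypothetical, the "durable" log is an in-memory dictionary standing in for a persistent store, and the operator is a simple running sum. It only shows the ordering the abstract describes: lineage (which input batches a task consumed, decided at runtime) is logged before the task's output becomes visible, and a lost operator is rebuilt by replaying the logged lineage.

```python
from dataclasses import dataclass, field


@dataclass
class LineageLog:
    """Hypothetical stand-in for a durable lineage store (in-memory here)."""
    records: dict = field(default_factory=dict)

    def append(self, task_id, lineage):
        # Only the small lineage record is persisted, never the output itself.
        self.records[task_id] = tuple(lineage)

    def has(self, task_id):
        return task_id in self.records


class Channel:
    """Toy stateful operator: keeps a running sum over the batches it consumes."""

    def __init__(self, upstream, log):
        self.upstream = upstream   # batch id -> list of ints (replayable input)
        self.log = log
        self.state = 0
        self.next_seq = 0
        self.outputs = {}

    def run_task(self, batch_ids):
        # batch_ids is chosen at runtime (for example, whatever has arrived),
        # so the lineage cannot be known before execution.
        task_id = ("sum", self.next_seq)
        self.log.append(task_id, batch_ids)   # write-ahead: log lineage first
        self._apply(task_id, batch_ids)       # then produce the output
        self.next_seq += 1
        return task_id

    def _apply(self, task_id, batch_ids):
        for b in batch_ids:
            self.state += sum(self.upstream[b])
        self.outputs[task_id] = self.state

    def readable(self, task_id):
        # Consumers only see outputs whose lineage has been persisted.
        return self.outputs[task_id] if self.log.has(task_id) else None


def recover(upstream, log):
    """Rebuild a lost channel by replaying logged lineage in task order."""
    ch = Channel(upstream, log)
    for task_id in sorted(log.records):
        ch._apply(task_id, log.records[task_id])
        ch.next_seq = task_id[1] + 1
    return ch


upstream = {0: [1, 2], 1: [3], 2: [4, 5]}
log = LineageLog()
original = Channel(upstream, log)
t0 = original.run_task([1, 0])   # batches consumed in arrival order
t1 = original.run_task([2])

# Simulate losing `original`: the replacement replays the same lineage.
restored = recover(upstream, log)
```

In this sketch `restored.outputs` matches `original.outputs` because replay follows the logged consumption order rather than re-deciding it, which is the property that lets downstream consumers keep the outputs they have already read instead of rolling back.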



Authors (2)
