
Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory (2404.03030v1)

Published 3 Apr 2024 in cs.ET

Abstract: This paper describes a distributed implementation of Apache Arrow that can leverage cluster-shared load-store addressable memory that is hardware-coherent only within each node. The implementation is built on the ThymesisFlow prototype that leverages the OpenCAPI interface to create a shared address space across a cluster. While Apache Arrow structures are immutable, simplifying their use in a cluster shared memory, this paper creates distributed Apache Arrow tables and makes them accessible in each node.
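The zero-copy, zero-serialization idea in the abstract can be illustrated with a minimal standard-library sketch. It is not the paper's implementation and does not use Apache Arrow or ThymesisFlow. A memory-mapped file stands in for the cluster-shared address space, a flat int64 buffer stands in for an Arrow column, and the helper names `write_int64_column` and `open_column_view` are hypothetical. The sketch assumes producer and consumer share the same native byte order.

```python
import mmap
from array import array


def write_int64_column(path, values):
    """Producer side: store values as one contiguous native-endian int64 buffer.

    Stand-in for building an immutable columnar buffer; not Arrow's actual format.
    """
    with open(path, "wb") as f:
        array("q", values).tofile(f)


def open_column_view(path):
    """Consumer side: map the buffer read-only and expose it as typed int64 items.

    No bytes are copied or deserialized; the view reads directly from the mapping.
    Returns (mapping, view). Release the view before closing the mapping.
    """
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapping).cast("q")
    return mapping, view
```

A consumer would call `open_column_view(path)`, read elements through the returned view, then call `view.release()` followed by `mapping.close()`. Because the mapping is read-only, any attempt to assign through the view raises `TypeError`, which mirrors the immutability that the abstract says simplifies sharing.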

