Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory (2404.03030v1)
Abstract: This paper describes a distributed implementation of Apache Arrow that uses cluster-shared, load-store addressable memory that is hardware-coherent only within each node. The implementation is built on the ThymesisFlow prototype, which uses the OpenCAPI interface to create a shared address space across a cluster. Apache Arrow structures are immutable, which simplifies their use in cluster shared memory. This paper creates distributed Apache Arrow tables and makes them accessible on each node.
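The zero-copy, zero-serialization idea in the abstract can be illustrated with a minimal single-node sketch. It is not the paper's implementation: it uses neither Apache Arrow nor ThymesisFlow, and Python's standard `multiprocessing.shared_memory` module stands in for the cluster-shared address space. The helper names `publish_column`, `read_column`, and `demo` are hypothetical. A writer lays out a fixed-width int64 column once, and a reader attaches by name and interprets the same bytes in place, relying on the data being treated as immutable after publication.

```python
import struct
from multiprocessing import shared_memory


def publish_column(values):
    """Write an int64 column into a new shared segment (hypothetical helper).

    The segment is only a stand-in for cluster-shared memory; after this
    call the data is treated as immutable, as Arrow buffers are.
    """
    nbytes = len(values) * 8
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # Native-endian int64 layout, written once at offset 0.
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm


def read_column(name, length):
    """Attach to the segment by name and read the column without copying it."""
    shm = shared_memory.SharedMemory(name=name)
    # The segment may be rounded up to a page size, so slice to the real length.
    raw = shm.buf[: length * 8]
    view = raw.cast("q")  # reinterpret the bytes in place, no deserialization
    try:
        return sum(view), view[0]
    finally:
        # Release the views before closing, or the mapping cannot be closed.
        view.release()
        raw.release()
        shm.close()


def demo():
    values = [10, 20, 30, 40]
    shm = publish_column(values)
    try:
        return read_column(shm.name, len(values))
    finally:
        shm.close()
        shm.unlink()
```

Calling `demo()` returns the column sum and its first element as read through the attached view. In this sketch only the segment name and the column length cross between writer and reader; the column data itself is never serialized or copied.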