Design, Configuration, Implementation, and Performance of a Simple 32 Core Raspberry Pi Cluster (1708.05264v3)
Abstract: In this report, I describe the design and implementation of an inexpensive, eight-node, 32-core cluster of Raspberry Pi single-board computers, as well as the performance of this cluster on two computational tasks: one that requires significant data transfer relative to its computation time, and one that does not. The cluster has two use cases: (a) as an educational tool for classroom use, such as covering parallel algorithms in an algorithms course; and (b) as a test system for use during the development of parallel metaheuristics, essentially serving as a personal desktop parallel computing cluster. Preliminary results show that the slow 100 Mbps networking of the Raspberry Pi largely limits such clusters to parallel computational tasks that either are long running relative to their data communication requirements or require very little internode communication. Additionally, although the Raspberry Pi 3 has a quad-core processor, parallel speedup degrades when a parallel computation attempts to use all four cores of every cluster node, likely due to resource contention with operating-system-level processes. However, distributing a task across three cores of each cluster node does enable linear (or near-linear) speedup.
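The trade-off between computation time and data transfer described in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not the report's benchmark code. It assumes a simplified cost model in which parallel time equals the serial time divided by the number of workers plus a fixed communication time, and it assumes the link delivers its full nominal 100 Mbps. The function names and the workloads in the loop are hypothetical and chosen only for illustration.

```python
def transfer_seconds(num_bytes, link_mbps=100.0):
    """Ideal time to move num_bytes over a link of link_mbps megabits per second.

    Assumption: the full nominal bandwidth is available (no protocol overhead).
    """
    return (num_bytes * 8) / (link_mbps * 1_000_000)


def modeled_speedup(serial_seconds, workers, comm_seconds):
    """Speedup under the assumed model T_p = T_1 / p + T_comm."""
    parallel_seconds = serial_seconds / workers + comm_seconds
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    workers = 24  # three cores on each of eight nodes, as in the abstract
    # Hypothetical workloads (not measurements): same serial time,
    # very different amounts of data exchanged between nodes.
    workloads = [
        ("little communication", 600.0, 1_000),
        ("heavy communication", 600.0, 500_000_000),
    ]
    for label, serial_s, payload_bytes in workloads:
        comm_s = transfer_seconds(payload_bytes)
        print(label, round(modeled_speedup(serial_s, workers, comm_s), 2))
```

Under this model, speedup approaches the worker count only when the communication term is small compared with the per-worker computation time, which is the qualitative behavior the abstract reports.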