Cronos Cluster: ARM-based HPC Platform
- Cronos Cluster is a low-cost, ARM-based computing platform built with Raspberry Pi devices for educational and experimental HPC benchmarking.
- The system leverages Slurm, Open MPI, and HPL benchmarks to assess performance and energy efficiency through both inter-node and intra-node parallelism.
- Research indicates that homogeneous deployments using Raspberry Pi 4 deliver superior stability and efficiency compared to mixed configurations with Raspberry Pi 3B.
The Cronos cluster is a low-cost computing platform constructed from Raspberry Pi 4 and Raspberry Pi 3B microcomputers, specifically architected for educational and experimental research environments. Its principal focus is on the evaluation of computational performance and energy efficiency using standardized high-performance computing (HPC) benchmarks. The system typifies a class of ARM-based clusters, demonstrating feasible approaches for distributed computation, resource management, and quantitative assessment of power-performance characteristics in compact, economically accessible settings (Semken et al., 8 Dec 2025).
1. Architecture and Hardware Composition
Cronos incorporates eight single-board computers organized within a custom chassis and powered by a unified 5 V switching PSU (8 A total capacity). The node breakdown is as follows:
- Raspberry Pi 4 (6 nodes):
- CPU: Broadcom BCM2711, ARM Cortex-A72 (quad-core, 1.5 GHz)
- Memory: 4 GB LPDDR4-3200
- Network: Gigabit Ethernet (dedicated controller)
- Wireless: 802.11ac Wi-Fi, Bluetooth 5.0
- Storage: microSD (class 10), OS image
- Raspberry Pi 3B (2 nodes):
- CPU: Broadcom BCM2837, ARM Cortex-A53 (quad-core, 1.2 GHz)
- Memory: 1 GB LPDDR2
- Network: 10/100 Ethernet
- Wireless: 802.11n Wi-Fi, Bluetooth 4.1
- Storage: microSD (class 10)
The cluster uses a central unmanaged Gigabit switch in a star network topology. Node home directories and HPL benchmark data are exported by a dedicated NFS server. Power consumption is monitored with a clamp meter at the 5 V bus; current samples are acquired every 5 min during typical 35 min runs, and energy totals are integrated using the trapezoidal rule.
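The trapezoidal integration of the sampled power readings can be sketched as follows (a minimal illustration, assuming the constant 5 V bus and 5-minute sampling interval described above; the current readings themselves are hypothetical, not measured values from the paper):

```python
# Integrate sampled power readings into total energy (Wh) via the
# trapezoidal rule, mirroring the measurement procedure described above.
# Assumptions: constant 5 V bus voltage, one sample every 5 minutes.

V_BUS = 5.0        # bus voltage in volts
SAMPLE_MIN = 5.0   # minutes between clamp-meter readings

def energy_wh(current_samples_a):
    """Trapezoidal integration of P(t) = V * I(t) over the sample series."""
    powers = [V_BUS * i for i in current_samples_a]   # instantaneous watts
    dt_h = SAMPLE_MIN / 60.0                          # hours per interval
    return sum((powers[k] + powers[k + 1]) / 2.0 * dt_h
               for k in range(len(powers) - 1))

# Hypothetical 35-minute run: eight readings, 5 minutes apart.
currents = [2.0, 2.4, 2.5, 2.5, 2.6, 2.5, 2.4, 2.1]
print(energy_wh(currents))
```

For a constant draw the trapezoidal rule is exact, so a flat 2 A trace over 35 min yields 10 W × (35/60) h ≈ 5.83 Wh, which is a quick sanity check on the integration.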
2. Software Stack and Resource Management
The Cronos platform runs Raspberry Pi OS Lite (Debian Buster-derived) with Linux kernel 5.x. Cluster management and job scheduling are provided by Slurm (v20.x–21.x), utilizing Munge for intra-cluster authentication. Slurm is configured with the following parameters:
- `ProctrackType=proctrack/pgid`
- `SelectType=select/cons_res`
- `DefMemPerCPU=1000M`
- Distinct node definitions for `rpi4[1-6]` and `rpi3[7-8]`
Parallel job execution leverages Open MPI (v4.1.x, compiled with GCC 9.x and using the tcp BTL for transport). Computational kernels use OpenBLAS (v0.3.x) for optimized BLAS routines, with High Performance Linpack (HPL 2.3) as the principal benchmark workload. HPL input files (HPL.dat) are tuned for the ARM nodes. System state (CPU load and thermal readings) is tracked via Ganglia; power metrics are logged with custom scripts.
3. Benchmarking Procedures
Benchmarking is performed using the HPL benchmark in double precision arithmetic. Key parameters include:
- Matrix dimension (N): 2,800–3,000
- Block size (NB): 224
- Process grid (P×Q): 3×2 (homogeneous), 8×1 (heterogeneous case)
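The parameters above map onto the HPL input file as follows (an illustrative fragment following the standard HPL 2.3 `HPL.dat` layout; header lines and the remaining tuning fields are omitted):

```
1            # of problems sizes (N)
2872         Ns
1            # of NBs
224          NBs
1            # of process grids (P x Q)
3            Ps
2            Qs
```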
Two canonical Slurm job scripts are defined:
- Case A (inter-node parallelism): `--ntasks-per-node=1`, giving 6 MPI ranks across 6 nodes
- Case B (intra-node parallelism): `--ntasks-per-node=4`, yielding 24 ranks on 6 nodes
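A minimal batch script for Case A might look as follows (a sketch, not the authors' script; the job name and binary path are hypothetical, and `srun` assumes Slurm's MPI integration is configured — otherwise `mpirun` would be used inside the allocation):

```
#!/bin/bash
#SBATCH --job-name=hpl_case_a
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=1    # Case A: one MPI rank per node (6 ranks total)

# Case B differs only in --ntasks-per-node=4 (24 ranks, one per core).
srun ./xhpl
```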
Each configuration is executed three times, and results are summarized using the mean (x̄) and sample standard deviation (s).
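These summary statistics can be reproduced with Python's `statistics` module (shown here with the three homogeneous-run GFLOPS values from the tables in the next section; small differences from the rounded aggregates in the source are expected):

```python
# Mean and sample standard deviation over the three benchmark repetitions,
# as used to summarize each HPL configuration.
from statistics import mean, stdev  # stdev = sample standard deviation

gflops_runs = [6.2667, 6.2299, 6.1096]  # three homogeneous 6-rank runs
print(f"{mean(gflops_runs):.2f} \u00b1 {stdev(gflops_runs):.2f}")
```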
4. Performance Analysis
Homogeneous Configuration (6×RPi 4):
| Configuration | N | NB | P×Q | Time [s] | GFLOPS |
|---|---|---|---|---|---|
| 1 task/node (6 MPI ranks), run 1 | 2872 | 224 | 3×2 | 2872.6 | 6.2667 |
| run 2 | 2889 | 224 | 3×2 | 2889.5 | 6.2299 |
| run 3 | 2946 | 224 | 3×2 | 2946.4 | 6.1096 |
| Mean ± SD | — | — | — | 2910 ± 30.3 | 6.19 ± 0.07 |
| Configuration | N | NB | P×Q | Time [s] | GFLOPS |
|---|---|---|---|---|---|
| 4 tasks/node (24 ranks) | — | 224 | 3×2 | 1243 ± 22.5 | 14.48 ± 0.26 |
Heterogeneous Configuration (6 RPi 4 + 2 RPi 3B):
| Configuration | N | NB | P×Q | Time [s] | GFLOPS | Comments |
|---|---|---|---|---|---|---|
| 3×2 grid | 2933 | 224 | 3×2 | 2932.8 | 6.1379 | ≈0.4% gain vs. 6 nodes |
| 8×1 grid | 6873 | 224 | 8×1 | 6873.3 | 2.619 | Poor stability |
Scaling trend is approximately linear up to 6 nodes. Addition of RPi 3B nodes yields negligible performance improvement (~0.4%) and degrades system stability.
5. Power Consumption and Efficiency Metrics
Power is computed as P(t) = V · I(t) with V = 5 V, and total energy is integrated over the runtime T using the trapezoidal rule:

E = ∫₀ᵀ P(t) dt ≈ Σₖ ((Pₖ + Pₖ₊₁)/2) · Δt

Performance in GFLOPS is calculated from the standard HPL operation count:

GFLOPS = ((2/3)·N³ + 2·N²) / (T · 10⁹)

Efficiency (η) is given by the ratio of sustained performance to consumed energy:

η = GFLOPS / E  (GFLOPS per Wh)
Example from the best 6-node run (4 tasks/node): runtime T ≈ 1243 s at ≈ 14.48 GFLOPS. The paper reports an efficiency metric based on the direct ratio of GFLOPS to Wh (Semken et al., 8 Dec 2025).
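The two metrics above can be sketched in a few lines (a minimal illustration of the formulas; the energy value passed to `efficiency` below is a placeholder, not the paper's measurement):

```python
# HPL performance and energy-efficiency metrics as defined above.

def hpl_gflops(n, time_s):
    """Standard HPL operation count, (2/3)*N^3 + 2*N^2, over runtime."""
    ops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return ops / (time_s * 1e9)

def efficiency(gflops, energy_wh):
    """Efficiency metric: sustained GFLOPS per watt-hour consumed."""
    return gflops / energy_wh

print(hpl_gflops(1000, 1.0))        # small sanity-check case
print(efficiency(14.48, 10.0))      # placeholder 10 Wh energy figure
```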
6. Scalability, Stability, and Bottleneck Analysis
Homogeneous configurations show nearly ideal scaling and minimal communication overhead up to 6 RPi 4 nodes. The inclusion of heterogeneous nodes (RPi 3B):
- Induces NTP desynchronization, leading to MPI/Slurm job errors ("Zero Bytes…").
- Causes instability; RPi 3B nodes occasionally hang, requiring manual intervention (`scontrol` restart).
- Results in <0.5% performance gain and compromised overall stability.
Principal bottlenecks are underpowered CPUs in RPi 3B, network contention from the single-switch topology, and scheduling complexity. Hybrid computations combining MPI and OpenMP (for instance, distributed PI-calculation workloads) yield improved efficiency when intra-node cores are fully engaged.
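The PI-calculation workload mentioned above distributes a numerical integration across ranks and cores. A serial Python sketch of the underlying kernel (the hybrid MPI/OpenMP variants simply split the loop range across ranks and threads):

```python
# Midpoint-rule estimate of pi = integral of 4/(1+x^2) on [0, 1] -- the
# classic kernel used in MPI/OpenMP teaching workloads. In the hybrid
# version, each MPI rank takes a slice of the loop and intra-node threads
# split it further; here it runs serially for clarity.

def estimate_pi(n_steps=1_000_000):
    h = 1.0 / n_steps
    total = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n_steps))
    return h * total

print(estimate_pi())
```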
7. Design Recommendations and Future Prospects
Empirical evidence favors a homogeneous deployment of Raspberry Pi 4 nodes to enhance stability and streamline resource management. Intra-node parallelism (`--ntasks-per-node=4`) should be exploited for ≥2× performance gains. Optimal HPL execution occurs with a 3×2 process grid and block size NB=224 on 6 nodes. Maintaining robust NTP synchronization and power reliability is essential.
Suggested improvements include:
- Upgrading remaining nodes to RPi 4 with uniform memory provisioning (4 GB RAM)
- Adding a secondary NIC or L2-capable switch to mitigate network jitter
- Automating power logging with finer granularity
- Conducting comparative studies with small-scale PC clusters for normalized cost, energy, and performance metrics
These recommendations reflect systematic findings on the performance and energy optimization of ARM-based educational clusters (Semken et al., 8 Dec 2025). Plausibly, broader adoption of such configurations may support pedagogical and applied research goals where scalability, economy, and efficiency are prioritized.