FirecREST v2: Scalable HPC REST API
- FirecREST v2 is an open-source REST API for HPC that achieves a 100× performance boost through asynchronous, stateless operations.
- It employs a layered architecture separating authentication, authorization, health checks, and forwarding to secure and streamline resource access.
- Benchmarks confirm significantly reduced latency and enhanced scalability, with async I/O and SSH pooling driving high throughput under heavy loads.
FirecREST v2 is an open-source RESTful API for programmatic access to high-performance computing (HPC) resources, representing a comprehensive, ground-up redesign of its predecessor to deliver a 100× performance improvement. The new architecture emphasizes asynchronous, stateless operation, high throughput, and robust security. It incorporates lessons learned from bottlenecks identified in proxy-based APIs handling I/O-intensive HPC tasks and provides validated performance metrics and design practices for scalable, secure HPC access (Palme et al., 12 Dec 2025).
1. Layered Architecture and Component Overview
FirecREST v2 adopts a fully asynchronous, lightweight proxy model organized in four stateless, sequential layers. The request pipeline progresses as follows:
- Client
- Authentication Layer (OIDC/JWT offline validation)
- Authorization Layer (JWT claims or external OpenFGA checks)
- Health Checker (cached subsystem status)
- Forwarding Layer (modular clients, SSH/HTTP, connection pooling)
- HPC Subsystems (job scheduler, filesystem, object store)
This architectural flow is depicted below:
Client → Authentication → Authorization → Health Checker → Forwarding → HPC Subsystems
Key innovations in v2 include an asynchronous HTTP server stack (Uvicorn + FastAPI + asyncio), offline JWT signature validation to eliminate network dependency for token verification, a pluggable authorization layer supporting claims and OpenFGA models, a Health Checker for subsystem liveness caching, and a modular Forwarding Layer with service-specific client abstractions and AsyncSSH-based user connection pools.
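The four sequential layers can be sketched as composable async handlers. This is an illustrative reduction, not FirecREST's actual internal API: the function names (`authenticate`, `authorize`, `check_health`, `forward`) and the request/claims shapes are hypothetical stand-ins for the real FastAPI dependency chain.

```python
import asyncio

HEALTH_CACHE = {"scheduler": True, "filesystem": True}  # cached liveness map

async def authenticate(request):
    # Offline JWT signature validation would happen here (no network call).
    if "token" not in request:
        raise PermissionError("missing token")
    return request

async def authorize(request):
    # Check claims embedded in the token (or consult an external OpenFGA store).
    if request.get("role") != "hpc-user":
        raise PermissionError("insufficient role")
    return request

async def check_health(request):
    # Consult the cached subsystem-status map rather than probing live.
    if not HEALTH_CACHE.get(request["subsystem"], False):
        raise RuntimeError("subsystem unavailable")
    return request

async def forward(request):
    # Dispatch to a modular SSH/HTTP client drawn from a connection pool.
    return {"status": "ok", "subsystem": request["subsystem"]}

async def handle(request):
    # Each layer is stateless: any replica can run the full pipeline.
    for layer in (authenticate, authorize, check_health, forward):
        request = await layer(request)
    return request

result = asyncio.run(handle(
    {"token": "jwt...", "role": "hpc-user", "subsystem": "scheduler"}))
```

Because no layer holds per-session state, this pipeline can run identically on any horizontally scaled replica.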
2. Performance Benchmarks and Optimization Strategies
Throughput (X) and latency (L) for N concurrent operations completing in total wall-clock time T_N can be formalized as X = N / T_N and L = T_N / N.
v1 exhibited critical bottlenecks due to Gunicorn’s multi-threaded model, online JWT introspection, per-request SSH connections, and real-time subsystem health checks, each creating synchronous I/O and thread pool saturation.
v2 replaces threads with asyncio-based concurrency, employs offline JWT validation, introduces SSH connection pooling (AsyncSSH), and implements asynchronous, cached health checks—altogether removing major sources of blocking and scaling limits.
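The cached health check replacing v1's real-time probes can be sketched as a small TTL cache. This is a minimal illustration under assumed names: `probe_subsystem` stands in for whatever SSH/HTTP liveness probe the real service performs.

```python
import asyncio
import time

class HealthCache:
    """TTL-cached subsystem liveness, refreshed without blocking requests."""

    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self._status = {}  # subsystem name -> (healthy, timestamp)

    async def probe_subsystem(self, name):
        # Stand-in for a real asynchronous SSH/HTTP liveness probe.
        await asyncio.sleep(0)
        return True

    async def is_healthy(self, name):
        healthy, ts = self._status.get(name, (False, 0.0))
        if time.monotonic() - ts > self.ttl:
            # Entry is stale or missing: refresh it once, then serve
            # subsequent requests from the cache for `ttl` seconds.
            healthy = await self.probe_subsystem(name)
            self._status[name] = (healthy, time.monotonic())
        return healthy

cache = HealthCache(ttl=5.0)
ok = asyncio.run(cache.is_healthy("scheduler"))
```

Serving health state from the cache means request latency never includes a live probe of a backend subsystem.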
Benchmarks involved (A) Postman-driven stress tests simulating core HPC operations on a Cray EX cluster and (B) AiiDA-integrated tests with up to 1000 concurrent 1 KB file downloads using Python httpx’s AsyncClient. Representative download results:
| N | FirecREST v1 | FirecREST v2 | FirecREST v2 (SSH pool) |
|---|---|---|---|
| 1 | 1.5 ± 0.1 s | 0.8 ± 0.02 s | 0.4 ± 0.3 s |
| 10 | 13.7 ± 1.1 s | 2.7 ± 0.05 s | 0.5 ± 0.2 s |
| 100 | 129.6 ± 1.5 s | 19.5 ± 3.4 s | 1.5 ± 0.7 s |
| 1000 | >1000 s | 176.3 ± 4.5 s | 15.5 ± 8.9 s |
For large N, SSH pooling enables up to two orders of magnitude improvement (∼100×), as real-world performance shifts from thread- and I/O-bound to event-driven, parallelizable workloads. Variability in results (standard deviation) is primarily cluster-resource dependent; allocating dedicated testing resources reduces confidence intervals.
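The concurrency pattern behind these numbers can be sketched as follows. The actual benchmark drove httpx's `AsyncClient` against the FirecREST download endpoint; here the network call is simulated with `asyncio.sleep` so the sketch is self-contained.

```python
import asyncio
import time

async def download(i, delay=0.05):
    # Stands in for one 1 KB HTTP download via httpx.AsyncClient.
    await asyncio.sleep(delay)
    return i

async def run_benchmark(n):
    # Launch all downloads at once; the event loop overlaps their I/O waits.
    start = time.monotonic()
    results = await asyncio.gather(*(download(i) for i in range(n)))
    return len(results), time.monotonic() - start

count, elapsed = asyncio.run(run_benchmark(100))
# 100 overlapping 50 ms operations finish in roughly 50 ms of wall-clock
# time, not 5 s of serial time -- the event-driven effect that separates
# the v1 and v2 columns in the table above.
```

A thread-per-request design would instead pay per-thread stack and GIL contention costs at this fan-out, which is the bottleneck v2's asyncio stack removes.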
3. Security and Horizontal Scalability Mechanisms
FirecREST v2 balances aggressive throughput goals with stringent access controls and systemic robustness:
- Authentication employs OIDC-compliant JWTs, validated offline using pre-fetched public keys.
- Authorization is dual-mode, using embedded JWT claims for direct role/cluster mapping or querying external OpenFGA stores for RBAC/ABAC enforcement.
- All HTTP exchanges are encrypted with TLS, while SSH-based operations rely on AsyncSSH's encrypted transport.
- Scalability is driven by strict statelessness, allowing any pod instance to serve any request without session affinity, and Uvicorn-based server pods are horizontally deployable behind Kubernetes or hardware load balancers.
- Health Checker logic forwards requests only to subsystems marked healthy, reducing overload propagation risk.
- Asynchronous I/O ensures minimal CPU overhead, even at tens of thousands of parallel connections.
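Offline token validation can be illustrated with a minimal sketch. FirecREST validates OIDC tokens against pre-fetched provider public keys (asymmetric signatures); this stdlib-only sketch uses a shared-secret HS256 token instead so it runs without a key server, but the key property is the same: no per-request call to the identity provider.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT-style base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(header: dict, payload: dict, secret: bytes) -> str:
    # Build a compact HS256 token: header.payload.signature.
    signing_input = (b64url(json.dumps(header).encode())
                     + "." + b64url(json.dumps(payload).encode()))
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_offline(token: str, secret: bytes) -> dict:
    # Purely local check: recompute and compare the signature.
    signing_input, _, sig_b64 = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig_b64):
        raise PermissionError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

secret = b"demo-secret"
token = sign({"alg": "HS256"}, {"sub": "alice", "role": "hpc-user"}, secret)
claims = verify_offline(token, secret)
```

With the provider's keys cached locally, validation costs only a hash computation, which is the per-request saving cited in Section 4.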
4. Design Principles and Implementation Lessons
Major technical conclusions from the v2 redesign include:
- Async > Threads for I/O-bound proxies: In the context of Python, asyncio/Uvicorn outperforms Gunicorn/threading for proxy layers manipulating high-volume I/O, mainly due to the GIL and context-switch overhead of threads.
- Offline token validation: Local signature checks remove per-request latency imposed by external introspection, saving hundreds of microseconds per call at scale.
- Resource pooling: Maintaining per-user SSH pools bypasses hard concurrency limits imposed by backend daemons.
- Health-cache: Asynchronous, proactive subsystem checks enable the proxy to make routing decisions instantaneously, flattening tail latencies and preventing wasted cycles.
- Modularity: Encapsulating HPC backends (e.g., Slurm, S3) behind client abstractions assures maintainability and extension for heterogeneous environments.
For API designers targeting HPC, best practices are codified as: employ stateless async architectures, pool expensive or stateful connections, minimize remote handshakes, externalize trust and policies, and standardize on OIDC, JWT, ASGI, and REST protocols.
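The resource-pooling principle above can be sketched as a per-user pool. In FirecREST v2 the pooled objects are AsyncSSH connections; here `make_connection` is a hypothetical stand-in for the SSH handshake so the pattern is runnable on its own.

```python
import asyncio

class UserConnectionPool:
    """Lazily create up to `size` connections for one user, then recycle."""

    def __init__(self, user, size=4):
        self.user = user
        self._pool = asyncio.Queue(maxsize=size)
        self._created = 0
        self._size = size

    async def make_connection(self):
        # Stands in for an expensive AsyncSSH handshake.
        await asyncio.sleep(0)
        return f"conn-{self.user}-{self._created}"

    async def acquire(self):
        if self._pool.empty() and self._created < self._size:
            self._created += 1
            return await self.make_connection()
        # Pool exhausted or warm: wait for a released connection.
        return await self._pool.get()

    async def release(self, conn):
        await self._pool.put(conn)

async def demo():
    pool = UserConnectionPool("alice", size=2)
    c1 = await pool.acquire()
    await pool.release(c1)
    c2 = await pool.acquire()  # recycled: no second handshake
    return c1, c2

c1, c2 = asyncio.run(demo())
```

Capping the pool size per user is what keeps the proxy within the backend sshd's concurrency limits while still amortizing handshake cost across requests.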
5. Validation Results and Ongoing Development
Extensive peer validation involving CUG reviewers and CSCS integration partners substantiated both the 100× speedups and operational stability under sustained load. Confirmed gains are attributed not only to event-driven I/O and pooling but also to the clean separation of concerns and trust minimization at each architectural boundary.
Current and prospective enhancements include supporting two-phase upload/download operations for large data transfers while pursuing statelessness, implementing resumable transfers and push notifications (WebSockets or server-sent events), expanding exported health metrics (including fine-grained latency and error histograms) with Prometheus endpoints, and evaluating gRPC or HTTP/2 for additional throughput increases.
6. Synthesis and Impact
FirecREST v2 demonstrably advances secure, scalable RESTful programmatic access to HPC, achieving up to 100× speedup over its predecessor through a strict asynchronous, modular, and stateless design (Palme et al., 12 Dec 2025). Its documented testing regimen, reproducible performance metrics, and modular middle-layer abstractions serve as a reference model for high-throughput, security-conscious resource proxies in large-scale computing facilities. Future development will continue to emphasize stateless, asynchronous operation, compatibility with emerging data-transfer protocols, and comprehensive observability.