EGI FedCloud: Federated Research Cloud Platform

Updated 23 January 2026
  • EGI FedCloud Platform is a federated cloud infrastructure that integrates diverse IaaS and PaaS solutions under a unified access and authentication framework.
  • It offers cross-site orchestration and SLA-backed resource provisioning through standardized APIs, supporting elastic compute and storage operations.
  • The platform enables interoperable, FAIR-compliant data management with integrated solutions like Onedata and VOSpace, empowering science gateways and research workflows.

The EGI FedCloud Platform is a pan-European federated infrastructure aggregating heterogeneous cloud resources—including OpenStack and OpenNebula IaaS sites—under a consolidated access and authentication framework tailored for research data and compute workloads. Developed through collaborative efforts such as the European Grid Infrastructure, INDIGO-DataCloud, and ESCAPE projects, EGI FedCloud enables elastic, SLA-backed resource provisioning, cross-site orchestration, and federated identity management supporting diverse science communities. Its architecture harmonizes IaaS, PaaS, and microservices-based science gateways, exposing common interfaces for workload portability and FAIR-compliant data management across distributed, multi-institutional sites (Taffoni et al., 2023, D'Agostino et al., 2019, Collaboration et al., 2017, Bertocco et al., 2017, Bertocco et al., 2018).

1. Federation Architecture and Core Principles

EGI FedCloud operates as an aggregation layer spanning both public and private IaaS providers, each running OpenStack, OpenNebula, or another compatible cloud stack. Site-local cloud fabrics (compute nodes, block/object storage, networking) expose standardized APIs (principally OpenStack’s Keystone, Nova, Glance, and Neutron, plus OCCI, the Open Cloud Computing Interface, at providers that support it) to a central platform management layer (Taffoni et al., 2023, Collaboration et al., 2017).
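
For a concrete sense of how a broker performs dynamic discovery against these APIs, the following sketch (Python with the requests library; the site URL and token are placeholder assumptions, not real endpoints) walks Keystone's service catalog:

import requests

# Hypothetical site endpoint; real URLs differ per EGI FedCloud site.
KEYSTONE = "https://cloud.example-site.eu:5000/v3"
TOKEN = "gAAAA..."  # a previously obtained scoped Keystone token (placeholder)

# GET /v3/auth/catalog returns the service catalog visible to this token:
# one entry per service (compute, image, network, ...) with its endpoints.
resp = requests.get(f"{KEYSTONE}/auth/catalog", headers={"X-Auth-Token": TOKEN})
resp.raise_for_status()
for service in resp.json()["catalog"]:
    for endpoint in service["endpoints"]:
        if endpoint["interface"] == "public":
            print(service["type"], endpoint.get("region", "?"), endpoint["url"])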

The federation is coordinated via:

  • Authentication & Authorization Infrastructure (AAI): Establishes user identity and enforces group/VO membership via OpenID Connect (e.g., INDIGO IAM, EGI Check-in), X.509 certificates, or VOMS proxies.
  • Resource Orchestration: Brokering workloads across sites using dynamic discovery (Keystone catalog, OCCI endpoints), quotas, and site metadata (flavors, images, network mappings).
  • Service Exposure: Common REST/CLI endpoints enable the instantiation and lifecycle management of virtual machines (VMs), block devices, and containers.
  • Data Federation: Onedata and VOSpace interfaces unify storage access, ensuring consistent POSIX, HTTP, and IVOA-compliant data abstraction (Bertocco et al., 2018, Bertocco et al., 2017).

A typical component and data flow (per Taffoni et al., 2023):

% Requires \usetikzlibrary{positioning,arrows} in the preamble.
\begin{tikzpicture}[node distance=1.2cm,auto,>=latex']
\node[draw,rectangle] (User) {ESAP User};
\node[draw,rectangle,right=of User] (ESAPFE) {ESAP Front-end/API};
\node[draw,rectangle,below=of ESAPFE] (Auth) {INDIGO IAM (OIDC)};
\node[draw,rectangle,right=of ESAPFE] (Prov) {ESAP Provisioning Service};
\node[draw,rectangle,right=of Prov] (FedClient) {fedcloudclient};
\node[draw,rectangle,below=of FedClient] (EGI_IAM) {EGI Check-in (OIDC)};
\node[draw,rectangle,right=of FedClient] (OSAPI) {OpenStack APIs};
\node[draw,rectangle,right=of OSAPI] (EGI_Cloud) {EGI FedCloud Site};
% Detailed edge list omitted for brevity
\end{tikzpicture}

2. Authentication, Authorization, and Identity Federation

The AAI subsystem permits seamless cross-site access while supporting legacy and modern protocols:

  • OpenID Connect (OIDC): Used in ESCAPE/ESAP and by INDIGO IAM for portal-to-cloud trust, federated via eduGAIN, Shibboleth, and campus IdPs. JWTs are the principal credential format exchanged via OIDC code flow, with group and role claims managed by IAM/Check-in brokers (Taffoni et al., 2023, Collaboration et al., 2017).
  • VOMS/X.509 PKIX: For grid-oriented workloads, each user holds an IGTF-certified X.509 certificate, optionally augmented with VOMS attributes (VO, group, role). The OpenStack Keystone-VOMS extension enables VO-based authorization and token issuance, mapping identities into cloud tenants (Bertocco et al., 2018, Bertocco et al., 2017).
  • Token Translation (WaTTS, motley-cue): For services requiring other credential types (e.g., SSH keys, Kerberos), OIDC or VOMS tokens are programmatically mapped to appropriate local credentials (Taffoni et al., 2023, Collaboration et al., 2017).

Policy enforcement relies on group/VO membership (typically group-based resource ACLs, as in VOSpace), with persistence handled via LDAP and relational databases. Fine-grained access is possible at the data-node (file, container) or infrastructure level (Bertocco et al., 2017). Supporting both token-based and certificate-based credentialing accommodates modern portal-centric science gateways as well as traditional batch/grid workflows.
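
As an illustration of the OIDC path, the sketch below (Python with requests; the Keystone URL, identity-provider name, and project ID are placeholder assumptions) exchanges an IAM/Check-in access token for an unscoped Keystone token via the standard OS-FEDERATION API and then scopes it to a project, which is essentially the flow that fedcloudclient automates:

import requests

KEYSTONE = "https://cloud.example-site.eu:5000/v3"   # hypothetical site
OIDC_ACCESS_TOKEN = "eyJhbGciOi..."                  # from Check-in/IAM (placeholder)
IDP, PROTOCOL = "egi.eu", "openid"                   # identity-provider mapping is site-specific

# 1. Exchange the OIDC bearer token for an unscoped Keystone token.
url = f"{KEYSTONE}/OS-FEDERATION/identity_providers/{IDP}/protocols/{PROTOCOL}/auth"
r = requests.post(url, headers={"Authorization": f"Bearer {OIDC_ACCESS_TOKEN}"})
r.raise_for_status()
unscoped = r.headers["X-Subject-Token"]

# 2. Scope the token to a project (tenant) mapped to the user's VO/group.
body = {"auth": {
    "identity": {"methods": ["token"], "token": {"id": unscoped}},
    "scope": {"project": {"id": "PROJECT_ID"}},      # placeholder project ID
}}
r = requests.post(f"{KEYSTONE}/auth/tokens", json=body)
r.raise_for_status()
scoped = r.headers["X-Subject-Token"]
print("scoped token obtained:", scoped[:16], "...")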

3. Resource Provisioning, Orchestration, and Automation

VM, container, and storage lifecycle operations are mediated by site-local and federated clients:

  • fedcloudclient: A Python-based wrapper around the standard OpenStack CLI that orchestrates the sequence of API invocations (image/flavor discovery, server instantiation, floating-IP association), backed by OIDC or VOMS tokens (Taffoni et al., 2023); a simplified sketch of this sequence follows after this list.
  • Orchestrator/Infrastructure Manager (IM): TOSCA-template driven federated deployment of complex, multi-site workflows, optionally leveraging Cloud Provider Rankers for SLA-aware brokering and Mesos/Marathon for containerized tasks (Collaboration et al., 2017).
  • OCCI and rOCCI-cli: Front endpoints for VM/block storage via OCCI-compliant orchestration handlers (D'Agostino et al., 2019).
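
The sequence these clients drive can be approximated as follows (a hedged sketch using openstacksdk's cloud layer; the cloud name, image, and flavor are illustrative and would come from a clouds.yaml entry or a token-derived session):

import openstack

# Connect using a clouds.yaml entry; fedcloudclient builds an equivalent
# session from an OIDC/VOMS token behind the scenes.
conn = openstack.connect(cloud="egi-site-example")   # hypothetical cloud name

# 1. Discovery: pick an image and a flavor advertised by the site.
image = conn.compute.find_image("Ubuntu-22.04")      # illustrative image name
flavor = conn.compute.find_flavor("m1.medium")       # illustrative flavor name

# 2. Instantiation: boot the VM, wait for ACTIVE; auto_ip attaches a floating IP.
server = conn.create_server(
    name="fedcloud-demo-vm",
    image=image.id,
    flavor=flavor.id,
    wait=True,
    auto_ip=True,
)
print("VM reachable at:", server.public_v4)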

Provisioning performance is summarized as follows:

| Operation | Typical Latency (CESGA) |
|-----------------------------------|-------------------------|
| Keystone token + catalog | 0.5–1 s |
| Image/flavor listing | 0.2–0.5 s per API call |
| VM creation (boot/network/config) | 60–90 s |

A generalized provisioning latency equation:

$$T_{\mathrm{prov}} \approx T_{\mathrm{auth}} + N_{\mathrm{list}}\, T_{\mathrm{list}} + T_{\mathrm{create}} + T_{\mathrm{config}},$$

where $T_{\mathrm{config}}$ is the cloud-init/post-installation time (Taffoni et al., 2023).
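
Plugging in midpoints from the CESGA table above gives a back-of-the-envelope breakdown (the $T_{\mathrm{config}}$ value is an assumed, workload-dependent figure):

# Midpoint estimates from the CESGA table above (seconds).
t_auth = 0.75        # Keystone token + catalog: 0.5-1 s
t_list = 0.35        # per listing call: 0.2-0.5 s
n_list = 2           # e.g., one image listing + one flavor listing
t_create = 75.0      # VM creation: 60-90 s
t_config = 30.0      # assumed cloud-init/post-install time (workload-dependent)

t_prov = t_auth + n_list * t_list + t_create + t_config
print(f"estimated T_prov ~ {t_prov:.1f} s")   # ~106.5 s, dominated by boot + config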

4. Data Management and Interoperability Mechanisms

EGI FedCloud enables distributed, FAIR-compliant storage and data movement:

  • VOSpace 2.1: Exposes the IVOA-standard node-based storage model (nodes, container nodes, data nodes, link nodes) over RESTful APIs. Metadata abstraction, fine-grained ACLs, and object/byte-level transfer are supported via HTTP(S), with DataNodes mapped to POSIX or Swift/CDMI backends (Bertocco et al., 2018, Bertocco et al., 2017); a minimal node-creation call is sketched after this list.
  • Onedata: Provides global, POSIX-like federated filesystem abstraction over disparate sites; supports caching, QoS profiles, and multi-GB/s throughput for scientific workflows (Collaboration et al., 2017).
  • CernVM-FS (CVMFS): For software distribution within VMs, CVMFS mounts large, versioned runtime environments (e.g., SAS/HEAsoft) as read-only POSIX filesystems, minimizing image size and accelerating contextualization (D'Agostino et al., 2019).
  • Cross-site transfers: Credential delegation via the IVOA Credential Delegation Protocol (CDP) enables remote service calls under a single federated identity, facilitating access to foreign VOSpace resources and inter-cloud data migration.
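
To make the VOSpace REST model concrete, the following sketch (Python with requests; the service root, credential, and node URI are hypothetical, and deployments differ in auth mechanics and URL layout) creates a container node by PUTting an IVOA node document:

import requests

VOSPACE = "https://vospace.example-site.eu/vospace"   # hypothetical service root
HEADERS = {
    "Authorization": "Bearer eyJhbGciOi...",          # placeholder credential
    "Content-Type": "text/xml",
}

# IVOA VOSpace node document describing a container node.
node_xml = """<vos:node xmlns:vos="http://www.ivoa.net/xml/VOSpace/v2.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:type="vos:ContainerNode"
          uri="vos://example.authority!vospace/mydata"/>"""

# VOSpace 2.1 creates nodes with PUT on the nodes/ endpoint.
r = requests.put(f"{VOSPACE}/nodes/mydata", data=node_xml, headers=HEADERS)
r.raise_for_status()
print("node created:", r.status_code)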

Typical storage throughput reaches $R_{\mathrm{IO}} \approx 20$–$50$ MB/s per VM (scratch disks), with pilot-scale transatlantic FITS data transfers reported at $\sim 80$ MB/s (D'Agostino et al., 2019, Bertocco et al., 2017).

5. Limitations, Challenges, and Proven Solutions

Key challenges and adopted workarounds include:

  • Network Naming and Discovery: Site-specific public network names (“public”, “ext-net”, etc.) complicate automation. Solutions involve site metadata collectors that periodically enumerate networks (e.g., via openstack network list) and maintain a catalog of mappings (Taffoni et al., 2023); a minimal collector is sketched after this list.
  • VO Support and Site Capability Discovery: Lack of an API to enumerate which sites accept a given VO is mitigated by parsing and caching site lists from fedcloudclient and tagging VO support (Taffoni et al., 2023).
  • Flavor/Image Metadata Ambiguity: Absence of flavor/image naming conventions is addressed by programmatic extraction and caching of CPU, RAM, and disk properties using OpenStack show calls, publishing this metadata for workflow configuration (Taffoni et al., 2023).
  • Authentication Friction: Inconsistent AAI systems (INDIGO IAM vs. EGI Check-in) and per-site SSH credentialing are unified through federated OIDC trust brokers and services such as motley-cue for user mapping (Taffoni et al., 2023).
  • Certificate and Credential Management: Heavy reliance on X.509 and VOMS creates a usability barrier; proposed remediation involves adoption of OAuth2/OIDC-based identity federation and increased use of token translation (Bertocco et al., 2018, Bertocco et al., 2017).
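
A minimal collector along these lines might look as follows (a sketch with openstacksdk; the cloud names are hypothetical clouds.yaml entries, and a production collector would iterate the fedcloudclient site list and persist its output):

import openstack

def external_networks(cloud_name):
    """Return (name, id) pairs for router-external networks at one site."""
    conn = openstack.connect(cloud=cloud_name)
    return [(net.name, net.id)
            for net in conn.network.networks()
            if net.is_router_external]        # True for "public"/"ext-net"/... networks

# Hypothetical site catalog; a real collector would refresh this periodically
# and publish the mapping for workflow configuration.
catalog = {site: external_networks(site) for site in ["egi-site-a", "egi-site-b"]}
print(catalog)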

Operational best practices include deploying microservice-based integration layers (as in PortalTS and EXTraS), round-robin site selection for load distribution, continuous monitoring, and auto-scaling triggers based on job queue depth and site utilization (D'Agostino et al., 2019).

6. Science Gateways, Use Cases, and Performance Observations

Science gateways such as ESAP (ESCAPE), EXTraS, and CANFAR-OATs-INAF leverage the EGI FedCloud Platform to deliver domain-specific analysis-as-a-service on distributed data:

  • ESAP (ESCAPE): API gateway and portal for brokering EOSC-linked datasets, leveraging Django/Python microservices and fedcloudclient for VM/container instantiation (Taffoni et al., 2023).
  • EXTraS: Microservice-driven portal (PortalTS) for astrophysical transient analysis, using the OCCI CLI for multi-site job distribution, CVMFS for on-demand software layers, and per-job contextualized VMs via cloud-init (D'Agostino et al., 2019); the contextualization pattern is sketched after this list.
  • CANFAR/OATs-INAF: Federated deployment for astronomical data analysis, combining IVOA-compliant services, OpenStack VMs with VOSpace clients, and transcontinental data access for co-located storage and compute (Bertocco et al., 2017, Bertocco et al., 2018).
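
The per-job contextualization pattern can be sketched as follows (Python with openstacksdk; the cloud-config contents, repository name, and resource IDs are illustrative assumptions rather than the actual EXTraS recipe), passing a cloud-init document as user data at boot:

import openstack

# Illustrative cloud-config: install the CVMFS client and mount a software
# repository read-only, so the VM image itself stays small.
CLOUD_CONFIG = """#cloud-config
packages:
  - cvmfs
runcmd:
  - cvmfs_config setup
  - mkdir -p /cvmfs/sw.example.org
  - mount -t cvmfs sw.example.org /cvmfs/sw.example.org
  - /cvmfs/sw.example.org/run_job.sh --job-id 42
"""

conn = openstack.connect(cloud="egi-site-example")   # hypothetical cloud name
server = conn.create_server(
    name="extras-job-42",
    image="IMAGE_ID",                                # placeholder IDs
    flavor="FLAVOR_ID",
    network="NETWORK_ID",
    userdata=CLOUD_CONFIG,                           # SDK base64-encodes for Nova
    wait=True,
)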

Performance metrics (where reported) demonstrate high aggregate throughput (e.g., $15\,795$ h of CPU time), SLA-backed vCPU and storage quotas, and efficient job-scheduling paradigms (e.g., round-robin, fair-share, preemptible instances) (D'Agostino et al., 2019, Collaboration et al., 2017).

7. Roadmap, Recommendations, and Future Directions

Adopted and proposed enhancements:

  • Standardize site metadata and naming conventions (e.g., “public-net” network naming, flavors.json catalogs).
  • Federate INDIGO IAM and EGI Check-in at the broker level; map group/role claims via JWTs for seamless AAI.
  • Develop VO → site support discovery endpoints in fedcloudclient for programmatic capability matching.
  • Deploy site metadata collector microservices for continuous inventory of site resources, capabilities, and VO support.
  • Instrument workflow for detailed provisioning/operation metrics, supporting future SLA guarantees; establish end-to-end monitoring with dashboards (e.g., Grafana/Prometheus).
  • Integrate container and notebook platforms (JupyterHub, Kubernetes, Docker) for interactive workloads.
  • Pursue transition to token-based credentialing (OAuth2/OIDC) for user-friendly, scalable identity and access control (Taffoni et al., 2023, Bertocco et al., 2017, Collaboration et al., 2017).

These directions are fundamental to advancing EGI FedCloud as a robust, federated PaaS for high-throughput, data-intensive scientific research, facilitating transparent, policy-driven, and programmable compute/data workflows across an expanding set of European and international sites.
