AVEAS: Visual Analytics on EOSC
- AVEAS is an integrated research infrastructure project that provides visual analytics workflows for astrophysical big data using cloud-native technologies.
- It leverages the European Open Science Cloud to deliver reproducible analyses spanning data ingestion, imaging, machine learning-driven detection, and publication.
- The platform utilizes containerized microservices and FAIR data practices to enable scalable, secure, and interoperable astrophysical data processing and visualization.
AVEAS (Astrophysics Visual Analytics on EOSC) is an integrated research infrastructure project designed to deliver scalable, cloud-native, and FAIR-compliant visual analytics workflows for astrophysical big data. Positioned at the confluence of large-scale data management, machine learning–driven discovery, and Open Science practices, AVEAS leverages the European Open Science Cloud (EOSC) to enable astronomers to conduct end-to-end analyses on distributed survey data, transforming raw observations into science-ready mosaics, catalogs, and publication-quality results (Sciacca et al., 2020).
1. Objectives, Scope, and Target Users
AVEAS explicitly aims to place visual analytics at the core of the astrophysical data lifecycle, in strict adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Its central motivation is to provide astronomers with reproducible, cloud-hosted workflows that remove institutional and geographic barriers, supporting every phase from observation planning, data ingestion, imaging, and mosaicing through to knowledge extraction and data publishing. Key user groups include:
- Survey teams: assembling large multi-pointing 2D/3D galactic-plane maps
- Extragalactic astronomers: detecting extended, diffuse emission and structures
- Data scientists: mining terabyte-scale image cubes for structure cataloguing using machine learning
- Science gateway users: orchestrating complex, resource-intensive pipelines via web or REST APIs
By leveraging EOSC’s shared compute, storage, and authentication infrastructure, AVEAS generalizes astrophysics analyses to a scalable, federated framework.
2. System Architecture and EOSC Integration
AVEAS implements a microservice-oriented architecture, encapsulating each core analytic or management function—ingestion, mosaicing, visualization, ML-based structure detection—as Docker containers. These are orchestrated via Kubernetes or OpenStack Magnum in the EGI Federated Cloud.
- Science Gateway: A web portal built on the Liferay/JSF stack, extended with CAGE (Cloud for Astrophysics GatEways) REST API, exposes pipeline composition, data registration, and workflow launch.
- Storage: EOSC federated volumes and object stores support block/object-level I/O for tiles, intermediates, and products.
- Compute: On-demand virtual appliances (e.g., VisIVO SD VA), batch virtual clusters, and GPU-backed nodes for ML.
- Identity Management: EGI Check-In federates eduGAIN and social providers, issuing tokens for storage, compute, and gateway access.
This abstraction masks the complexity of cloud federation, enabling seamless pipeline deployment and data movement.
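As a concrete illustration of this orchestration layer, the following is a minimal sketch of submitting a containerized mosaicing task as a Kubernetes Job using the official Python client; the image name, namespace, job name, and command are hypothetical placeholders rather than actual AVEAS artifacts.

```python
# Minimal sketch: submitting a containerized mosaicing task as a Kubernetes Job.
# Image, namespace, and command are illustrative placeholders, not AVEAS artifacts;
# assumes a kubeconfig for the target EGI/EOSC cluster is available.
from kubernetes import client, config

def submit_mosaic_job(tile_list_url: str) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    container = client.V1Container(
        name="mosaic-worker",
        image="registry.example.org/aveas/mosaic:latest",  # hypothetical image
        command=["python", "run_mosaic.py", "--tiles", tile_list_url],
    )
    spec = client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="aveas-mosaic-job"),
        spec=spec,
    )
    client.BatchV1Api().create_namespaced_job(namespace="aveas", body=job)

if __name__ == "__main__":
    submit_mosaic_job("https://storage.example.org/tiles/run-001.json")
```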
3. Data Lifecycle and FAIR Compliance
AVEAS operationalizes the data lifecycle as a sequence of ingest, curation, processing, and preservation, each augmented with FAIR stewardship:
- Ingestion: Users register FITS/cube datasets and populate a Knowledge Base (an RDF triple store, accessible via SPARQL; see the query sketch below) with discipline-standardized metadata.
- Curation/Indexing: Metadata (ObsCore, VO-Resource) are standardized and indexed (e.g., via Apache Solr, Elasticsearch), enabling fast search and sub-selection.
- Processing: Compute containers execute user-directed tasks, accessing data from EOSC storage, writing back artifacts, and tracking lineage.
- Preservation/Publication: Results are deposited to FAIR Data Points, assigned PIDs/DOIs, and annotated for provenance.
FAIRness is quantitatively assessed, for example through the metadata coverage ratio $C = |M \cap R| / |R|$, where $|M \cap R|$ is the number of indexed metadata fields matching the standard and $R$ is the set of fields required by the community standard.
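To illustrate how the SPARQL-accessible Knowledge Base described above might be queried programmatically, the sketch below uses SPARQLWrapper; the endpoint URL and the DCAT/Dublin Core vocabulary are assumptions, not the actual AVEAS metadata schema.

```python
# Minimal sketch: listing recently registered datasets from the Knowledge Base's
# SPARQL endpoint. Endpoint URL and vocabulary are hypothetical assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://kb.example.org/sparql"  # hypothetical endpoint

QUERY = """
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset ?title ?issued
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title  ?title ;
           dct:issued ?issued .
}
ORDER BY DESC(?issued)
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["title"]["value"], row["issued"]["value"])
```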
4. Imaging and Multidimensional Map Construction
AVEAS supports advanced wide-field imaging pipelines, adapting established algorithms for scalable, distributed execution:
- Montage-style reprojection and coaddition for 2D images and 3D spectral cubes, employing linear coordinate transformations of the form $\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{b}$ between input and output pixel planes.
- Unimap destriping and background matching synchronize adjacent tiles, correcting baseline discontinuities.
- Noise- and PSF-weighted mosaicing, coadding overlapping tiles as $I(\mathbf{x}) = \sum_i w_i(\mathbf{x})\, I_i(\mathbf{x}) \,/\, \sum_i w_i(\mathbf{x})$ with weights $w_i(\mathbf{x}) = 1/\sigma_i^2(\mathbf{x})$, where $\sigma_i(\mathbf{x})$ is the local per-pixel noise (typically from robust RMS estimation).
These approaches ensure optimal S/N, flux conservation, and tunable handling of overlapping/heterogeneous datasets.
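The following is a minimal sketch of how reprojection and inverse-variance coaddition of two overlapping tiles could be expressed with Astropy and the reproject package; the file names are placeholders, and the pipeline actually deployed by AVEAS (Montage/Unimap based) may differ.

```python
# Minimal sketch: reproject two tiles onto a common WCS and coadd them with
# inverse-variance weights, following the weighting formula above.
# File names are illustrative; requires astropy, numpy, and reproject.
import numpy as np
from astropy.io import fits
from astropy.wcs import WCS
from reproject import reproject_interp

hdu_a = fits.open("tile_a.fits")[0]   # hypothetical input tiles
hdu_b = fits.open("tile_b.fits")[0]
target_wcs = WCS(hdu_a.header)        # use the first tile as the common grid
shape_out = hdu_a.data.shape

num = np.zeros(shape_out)  # running sum of w_i * I_i
den = np.zeros(shape_out)  # running sum of w_i

for hdu in (hdu_a, hdu_b):
    data, footprint = reproject_interp(hdu, target_wcs, shape_out=shape_out)
    # Robust per-tile RMS via the median absolute deviation.
    sigma = 1.4826 * np.nanmedian(np.abs(data - np.nanmedian(data)))
    w = footprint / sigma**2           # inverse-variance weight, zero off-footprint
    num += np.nan_to_num(data) * w
    den += w

mosaic = np.where(den > 0, num / den, np.nan)
fits.writeto("mosaic.fits", mosaic, header=target_wcs.to_header(), overwrite=True)
```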
5. Machine Learning for Structure Detection
Data-driven structure extraction is central to AVEAS. Workflows comprise:
- Semantic segmentation via CNNs: robust pixel-level identification of diffuse filament- and bubble-like morphologies, trained with the supervised binary cross-entropy loss $\mathcal{L} = -\tfrac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$ over $N$ labelled pixels.
- Clustering and feature extraction: DBSCAN, k-means, and learned descriptor spaces for separating compact source candidates.
- Hybrid pipelines: Classical tools (e.g., CAESAR, CuTEx) for initial source candidate detection, followed by ML-based classification for purity/recall optimization.
Performance is rigorously tracked via standard detection metrics such as completeness (recall), purity (precision), and F1 score. Exploiting the EOSC GPU infrastructure allows CNNs to be trained on terabyte-class datasets, reducing epoch times from hours to tens of minutes on multi-GPU nodes; a minimal training sketch follows.
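The sketch below shows a small fully convolutional segmentation model compiled with the binary cross-entropy loss and precision/recall metrics in TensorFlow/Keras; the architecture and input shape are illustrative and are not the network actually deployed by AVEAS.

```python
# Minimal sketch: a tiny fully convolutional segmenter trained with binary
# cross-entropy, as in the loss above. Architecture and tile size are illustrative.
import tensorflow as tf

def build_segmenter(tile_size: int = 128) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(tile_size, tile_size, 1))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # One sigmoid output per pixel: probability of belonging to a structure.
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    return model

model = build_segmenter()
model.summary()
# model.fit(train_tiles, train_masks, epochs=..., batch_size=...) on GPU nodes
```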
6. Implementation Technologies and Scientific Workflows
AVEAS’s technical stack consists of:
- Containerization: Docker images, private registries, Kubernetes orchestration.
- Science Gateway: Liferay portal, JSF, with CAGE REST API for programmatic interaction (GitHub: github.com/acaland/simple-cloud-gateway).
- Astrophysical standards: SAMP, TAP, ObsCore/VO-DF metadata formats.
- Processing libraries: Montage, Unimap, Astropy, SciPy, TensorFlow/Keras, OpenCV.
- Knowledge Base: RDF triple-store, SPARQL endpoint supporting data lineage and provenance queries.
The user workflow includes federated authentication, dataset registration, pipeline configuration/launch, progress monitoring, visualization through VisIVO, and publication of enriched datasets.
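For desktop interoperability, results can in principle be pushed to SAMP-aware tools such as VisIVO or TOPCAT; the sketch below uses Astropy's SAMP client, with a placeholder VOTable URL and the assumption that a SAMP hub is already running locally.

```python
# Minimal sketch: broadcasting a result catalog to SAMP-aware desktop tools.
# The VOTable URL is a placeholder; a SAMP hub (e.g., started by TOPCAT) must
# already be running on the local machine.
from astropy.samp import SAMPIntegratedClient

client = SAMPIntegratedClient()
client.connect()
try:
    client.notify_all({
        "samp.mtype": "table.load.votable",
        "samp.params": {
            "url": "https://storage.example.org/products/catalog.xml",
            "name": "AVEAS source catalog",
        },
    })
finally:
    client.disconnect()
```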
Workflow Steps
| Stage | Action | Tools/APIs |
|---|---|---|
| Authentication | Federated login via EGI Check-In | EGI |
| Data Registration | Upload FITS/cube, annotate metadata | Science Gateway GUI |
| Pipeline Config | Compose workflow via GUI or CAGE REST API; select tasks | Liferay/CAGE |
| Execution | Distributed container orchestration | Kubernetes |
| Visualization | View mosaics/catalogs in VisIVO desktop/web | VisIVO |
| Publication | Export with metadata, publish as FAIR dataset (PID/DOI) | FAIR Data Point |
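Programmatic interaction with the gateway, as in the Pipeline Config and Execution rows above, might look like the following sketch; the endpoint paths, payload fields, and token handling are hypothetical, and the actual CAGE REST API may differ.

```python
# Minimal sketch of programmatic pipeline submission to a gateway REST API.
# Base URL, endpoints, payload fields, and token are hypothetical placeholders.
import requests

GATEWAY = "https://gateway.example.org/api"      # hypothetical base URL
TOKEN = "eyJ..."                                 # OIDC access token from EGI Check-In

payload = {
    "pipeline": "mosaic-and-detect",             # illustrative pipeline name
    "inputs": {"dataset_id": "survey-tile-042"}, # illustrative dataset identifier
    "resources": {"cpus": 16, "gpus": 1},
}

resp = requests.post(
    f"{GATEWAY}/workflows",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["id"]

# Poll the orchestrator for execution status.
status = requests.get(
    f"{GATEWAY}/workflows/{job_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
).json()
print(job_id, status.get("state"))
```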
7. Performance, Scalability, and Future Directions
Benchmarks demonstrate:
- Mosaicing throughput scales with the number of allocated worker nodes.
- ML training: on 4-GPU nodes, epoch times are reduced several-fold compared to single-GPU runs.
- Storage and indexing: 400 MB/s per block storage volume, 10,000 metadata records/sec in Elasticsearch.
Planned enhancements involve migration to a fully Kubernetes-native Science Gateway deployment, integration with JupyterLab for interactive analytics and D3.js/Vega-Lite visualization, automated ML-driven anomaly detection, and expanded FAIR automation (e.g., extracting and publishing metadata directly from FITS headers). Adoption of columnar formats (e.g., Apache Parquet) is proposed to further reduce I/O bottlenecks for large catalogs.
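As a rough sketch of the proposed Parquet adoption, a FITS catalog could be converted to a columnar file as follows; the file and column names are placeholders, and the snippet assumes Astropy, pandas, and pyarrow are installed.

```python
# Minimal sketch: converting a FITS catalog to Apache Parquet for faster,
# column-oriented access. File and column names are hypothetical placeholders.
from astropy.table import Table

catalog = Table.read("source_catalog.fits")        # hypothetical FITS catalog
df = catalog.to_pandas()
df.to_parquet("source_catalog.parquet", index=False)

# Downstream readers can then load only the columns they need, e.g.:
# pd.read_parquet("source_catalog.parquet", columns=["ra", "dec", "flux"])
```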
A plausible implication is that these future directions will further close the gap between data acquisition and public, reproducible astrophysical inference, reinforcing the Open Science and community-driven research paradigms (Sciacca et al., 2020).