AVEAS: Visual Analytics on EOSC
- AVEAS is an integrated research infrastructure project that provides visual analytics workflows for astrophysical big data using cloud-native technologies.
- It leverages the European Open Science Cloud to deliver reproducible analyses spanning data ingestion, imaging, machine learning-driven detection, and publication.
- The platform utilizes containerized microservices and FAIR data practices to enable scalable, secure, and interoperable astrophysical data processing and visualization.
AVEAS (Astrophysics Visual Analytics on EOSC) is an integrated research infrastructure project designed to deliver scalable, cloud-native, and FAIR-compliant visual analytics workflows for astrophysical big data. Positioned at the confluence of large-scale data management, machine learning–driven discovery, and Open Science practices, AVEAS leverages the European Open Science Cloud (EOSC) to enable astronomers to conduct end-to-end analyses on distributed survey data, transforming raw observations into science-ready mosaics, catalogs, and publication-quality results (Sciacca et al., 2020).
1. Objectives, Scope, and Target Users
AVEAS explicitly aims to place visual analytics at the core of the astrophysical data lifecycle, in strict adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Its central motivation is to provide astronomers with reproducible, cloud-hosted workflows that remove institutional and geographic barriers, supporting every phase from observation planning, data ingestion, imaging, and mosaicing through to knowledge extraction and data publishing. Key user groups include:
- Survey teams: assembling large multi-pointing 2D/3D galactic-plane maps
- Extragalactic astronomers: detecting extended, diffuse emission and structures
- Data scientists: mining terabyte-scale image cubes for structure cataloguing using machine learning
- Science gateway users: orchestrating complex, resource-intensive pipelines via web or REST APIs
By leveraging EOSC’s shared compute, storage, and authentication infrastructure, AVEAS generalizes astrophysics analyses to a scalable, federated framework.
2. System Architecture and EOSC Integration
AVEAS implements a microservice-oriented architecture, encapsulating each core analytic or management function—ingestion, mosaicing, visualization, ML-based structure detection—as Docker containers. These are orchestrated via Kubernetes or OpenStack Magnum in the EGI Federated Cloud.
- Science Gateway: A web portal built on the Liferay/JSF stack, extended with CAGE (Cloud for Astrophysics GatEways) REST API, exposes pipeline composition, data registration, and workflow launch.
- Storage: EOSC federated volumes and object stores support block/object-level I/O for tiles, intermediates, and products.
- Compute: On-demand virtual appliances (e.g., VisIVO SD VA), batch virtual clusters, and GPU-backed nodes for ML.
- Identity Management: EGI Check-In federates eduGAIN and social providers, issuing tokens for storage, compute, and gateway access.
This abstraction masks the complexity of cloud federation, enabling seamless pipeline deployment and data movement.
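As a concrete illustration of this orchestration layer, the following is a minimal sketch of submitting a containerized mosaicing task as a Kubernetes Job using the official Python client; the image name, namespace, job name, and command are hypothetical placeholders rather than actual AVEAS artifacts.

```python
# Minimal sketch: submitting a containerized mosaicing task as a Kubernetes Job.
# Image, namespace, and command are illustrative placeholders, not AVEAS artifacts;
# assumes a kubeconfig for the target EGI/EOSC cluster is available.
from kubernetes import client, config

def submit_mosaic_job(tile_list_url: str) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    container = client.V1Container(
        name="mosaic-worker",
        image="registry.example.org/aveas/mosaic:latest",  # hypothetical image
        command=["python", "run_mosaic.py", "--tiles", tile_list_url],
    )
    spec = client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="aveas-mosaic-job"),
        spec=spec,
    )
    client.BatchV1Api().create_namespaced_job(namespace="aveas", body=job)

if __name__ == "__main__":
    submit_mosaic_job("https://storage.example.org/tiles/run-001.json")
```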
3. Data Lifecycle and FAIR Compliance
AVEAS operationalizes the data lifecycle as a sequence of ingest, curation, processing, and preservation, each augmented with FAIR stewardship:
- Ingestion: Users register FITS/cube datasets and populate a Knowledge Base (an RDF triple store, accessible via SPARQL; see the query sketch below) with discipline-standardized metadata.
- Curation/Indexing: Metadata (ObsCore, VO-Resource) are standardized and indexed (e.g., via Apache Solr, Elasticsearch), enabling fast search and sub-selection.
- Processing: Compute containers execute user-directed tasks, accessing data from EOSC storage, writing back artifacts, and tracking lineage.
- Preservation/Publication: Results are deposited to FAIR Data Points, assigned PIDs/DOIs, and annotated for provenance.
FAIRness is quantitatively assessed, for example through the metadata coverage ratio $C = |M \cap R| / |R|$, where $|M \cap R|$ is the number of indexed metadata fields matching the standard and $R$ is the set of fields required by the community standard.
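To illustrate how the SPARQL-accessible Knowledge Base described above might be queried programmatically, the sketch below uses SPARQLWrapper; the endpoint URL and the DCAT/Dublin Core vocabulary are assumptions, not the actual AVEAS metadata schema.

```python
# Minimal sketch: listing recently registered datasets from the Knowledge Base's
# SPARQL endpoint. Endpoint URL and vocabulary are hypothetical assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://kb.example.org/sparql"  # hypothetical endpoint

QUERY = """
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset ?title ?issued
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title  ?title ;
           dct:issued ?issued .
}
ORDER BY DESC(?issued)
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["title"]["value"], row["issued"]["value"])
```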
4. Imaging and Multidimensional Map Construction
AVEAS supports advanced wide-field imaging pipelines, adapting established algorithms for scalable, distributed execution:
- Montage-style reprojection and coaddition for 2D images and 3D spectral cubes, employing linear coordinate transformations of the form $\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{b}$ between input and output pixel planes.
- Unimap destriping and background matching synchronize adjacent tiles, correcting baseline discontinuities.
- Noise- and PSF-weighted mosaicing, coadding overlapping tiles as $I(\mathbf{x}) = \sum_i w_i(\mathbf{x})\, I_i(\mathbf{x}) \,/\, \sum_i w_i(\mathbf{x})$ with weights $w_i(\mathbf{x}) = 1/\sigma_i^2(\mathbf{x})$, where $\sigma_i(\mathbf{x})$ is the local per-pixel noise (typically from robust RMS estimation).
These approaches ensure optimal S/N, flux conservation, and tunable handling of overlapping/heterogeneous datasets.
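The following is a minimal sketch of how reprojection and inverse-variance coaddition of two overlapping tiles could be expressed with Astropy and the reproject package; the file names are placeholders, and the pipeline actually deployed by AVEAS (Montage/Unimap based) may differ.

```python
# Minimal sketch: reproject two tiles onto a common WCS and coadd them with
# inverse-variance weights, following the weighting formula above.
# File names are illustrative; requires astropy, numpy, and reproject.
import numpy as np
from astropy.io import fits
from astropy.wcs import WCS
from reproject import reproject_interp

hdu_a = fits.open("tile_a.fits")[0]   # hypothetical input tiles
hdu_b = fits.open("tile_b.fits")[0]
target_wcs = WCS(hdu_a.header)        # use the first tile as the common grid
shape_out = hdu_a.data.shape

num = np.zeros(shape_out)  # running sum of w_i * I_i
den = np.zeros(shape_out)  # running sum of w_i

for hdu in (hdu_a, hdu_b):
    data, footprint = reproject_interp(hdu, target_wcs, shape_out=shape_out)
    # Robust per-tile RMS via the median absolute deviation.
    sigma = 1.4826 * np.nanmedian(np.abs(data - np.nanmedian(data)))
    w = footprint / sigma**2           # inverse-variance weight, zero off-footprint
    num += np.nan_to_num(data) * w
    den += w

mosaic = np.where(den > 0, num / den, np.nan)
fits.writeto("mosaic.fits", mosaic, header=target_wcs.to_header(), overwrite=True)
```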
5. Machine Learning for Structure Detection
Data-driven structure extraction is central to AVEAS. Workflows comprise:
- Semantic segmentation via CNNs: robust pixel-level identification of diffuse filament- and bubble-like morphologies, trained with the supervised binary cross-entropy loss $\mathcal{L} = -\tfrac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$ over $N$ labelled pixels.
- Clustering and feature extraction: DBSCAN, k-means, and learned descriptor spaces for separating compact source candidates.
- Hybrid pipelines: Classical tools (e.g., CAESAR, CuTEx) for initial source candidate detection, followed by ML-based classification for purity/recall optimization.
Performance is rigorously tracked via standard detection metrics such as completeness (recall), purity (precision), and F1 score. Exploiting the EOSC GPU infrastructure allows CNNs to be trained on terabyte-class datasets, reducing epoch times from hours to tens of minutes on multi-GPU nodes; a minimal training sketch follows.
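The sketch below shows a small fully convolutional segmentation model compiled with the binary cross-entropy loss and precision/recall metrics in TensorFlow/Keras; the architecture and input shape are illustrative and are not the network actually deployed by AVEAS.

```python
# Minimal sketch: a tiny fully convolutional segmenter trained with binary
# cross-entropy, as in the loss above. Architecture and tile size are illustrative.
import tensorflow as tf

def build_segmenter(tile_size: int = 128) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(tile_size, tile_size, 1))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # One sigmoid output per pixel: probability of belonging to a structure.
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    return model

model = build_segmenter()
model.summary()
# model.fit(train_tiles, train_masks, epochs=..., batch_size=...) on GPU nodes
```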
6. Implementation Technologies and Scientific Workflows
AVEAS’s technical stack consists of:
- Containerization: Docker images, private registries, Kubernetes orchestration.
- Science Gateway: Liferay portal, JSF, with CAGE REST API for programmatic interaction (GitHub: github.com/acaland/simple-cloud-gateway).
- Astrophysical standards: SAMP, TAP, ObsCore/VO-DF metadata formats.
- Processing libraries: Montage, Unimap, Astropy, SciPy, TensorFlow/Keras, OpenCV.
- Knowledge Base: RDF triple-store, SPARQL endpoint supporting data lineage and provenance queries.
The user workflow includes federated authentication, dataset registration, pipeline configuration/launch, progress monitoring, visualization through VisIVO, and publication of enriched datasets.
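For desktop interoperability, results can in principle be pushed to SAMP-aware tools such as VisIVO or TOPCAT; the sketch below uses Astropy's SAMP client, with a placeholder VOTable URL and the assumption that a SAMP hub is already running locally.

```python
# Minimal sketch: broadcasting a result catalog to SAMP-aware desktop tools.
# The VOTable URL is a placeholder; a SAMP hub (e.g., started by TOPCAT) must
# already be running on the local machine.
from astropy.samp import SAMPIntegratedClient

client = SAMPIntegratedClient()
client.connect()
try:
    client.notify_all({
        "samp.mtype": "table.load.votable",
        "samp.params": {
            "url": "https://storage.example.org/products/catalog.xml",
            "name": "AVEAS source catalog",
        },
    })
finally:
    client.disconnect()
```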
Workflow Steps
| Stage | Action | Tools/APIs |
|---|---|---|
| Authentication | Federated login via EGI Check-In | EGI |
| Data Registration | Upload FITS/cube, annotate metadata | Science Gateway GUI |
| Pipeline Config | Compose workflow via GUI or CAGE REST API; select tasks | Liferay/CAGE |
| Execution | Distributed container orchestration | Kubernetes |
| Visualization | View mosaics/catalogs in VisIVO desktop/web | VisIVO |
| Publication | Export with metadata, publish as FAIR dataset (PID/DOI) | FAIR Data Point |
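Programmatic interaction with the gateway, as in the Pipeline Config and Execution rows above, might look like the following sketch; the endpoint paths, payload fields, and token handling are hypothetical, and the actual CAGE REST API may differ.

```python
# Minimal sketch of programmatic pipeline submission to a gateway REST API.
# Base URL, endpoints, payload fields, and token are hypothetical placeholders.
import requests

GATEWAY = "https://gateway.example.org/api"      # hypothetical base URL
TOKEN = "eyJ..."                                 # OIDC access token from EGI Check-In

payload = {
    "pipeline": "mosaic-and-detect",             # illustrative pipeline name
    "inputs": {"dataset_id": "survey-tile-042"}, # illustrative dataset identifier
    "resources": {"cpus": 16, "gpus": 1},
}

resp = requests.post(
    f"{GATEWAY}/workflows",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["id"]

# Poll the orchestrator for execution status.
status = requests.get(
    f"{GATEWAY}/workflows/{job_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
).json()
print(job_id, status.get("state"))
```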
7. Performance, Scalability, and Future Directions
Benchmarks demonstrate:
- Mosaicing throughput scales with the number of allocated worker nodes.
- ML training: on 4-GPU nodes, epoch times are reduced several-fold compared to single-GPU runs.
- Storage and indexing: 400 MB/s per block storage volume, 10,000 metadata records/sec in Elasticsearch.
Planned enhancements involve migration to a fully Kubernetes-native Science Gateway deployment, integration with JupyterLab for interactive analytics and D3.js/Vega-Lite visualization, automated ML-driven anomaly detection, and expanded FAIR automation (e.g., extracting and publishing metadata directly from FITS headers). Adoption of columnar formats (e.g., Apache Parquet) is proposed to further reduce I/O bottlenecks for large catalogs.
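As a rough sketch of the proposed Parquet adoption, a FITS catalog could be converted to a columnar file as follows; the file and column names are placeholders, and the snippet assumes Astropy, pandas, and pyarrow are installed.

```python
# Minimal sketch: converting a FITS catalog to Apache Parquet for faster,
# column-oriented access. File and column names are hypothetical placeholders.
from astropy.table import Table

catalog = Table.read("source_catalog.fits")        # hypothetical FITS catalog
df = catalog.to_pandas()
df.to_parquet("source_catalog.parquet", index=False)

# Downstream readers can then load only the columns they need, e.g.:
# pd.read_parquet("source_catalog.parquet", columns=["ra", "dec", "flux"])
```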
A plausible implication is that these future directions will further close the gap between data acquisition and public, reproducible astrophysical inference, reinforcing the Open Science and community-driven research paradigms (Sciacca et al., 2020).