GALAX: Astronomy, Biomedical AI & Telemetry
- GALAX is a collection of integrated frameworks spanning astronomy, biomedical AI, and spacecraft telemetry, combining resources like the Siena Galaxy Atlas, REGALADE, and GalaxAI.
- The astronomical components provide multi-wavelength imaging and unified, highly complete galaxy catalogs that support transient event host identification and cosmic scaling relations.
- The biomedical AI framework employs graph-augmented language models with reinforcement learning to extract explainable subgraph rationales, advancing precision medicine applications.
GALAX refers to multiple advanced frameworks and resources in astronomy, data science, and biomedical AI. The most prominent denotations include (i) the Siena Galaxy Atlas (SGA), a comprehensive multi-wavelength optical and IR imaging atlas of nearby galaxies; (ii) REGALADE, a pipeline and nearly all-sky homogeneous galaxy catalog designed for time-domain and multi-messenger astrophysics; (iii) GALAX, a graph-augmented LLM for explainable subgraph reasoning in precision medicine; and (iv) GalaxAI, an interpretable machine-learning toolkit for spacecraft telemetry analysis. This article covers each in turn, with emphasis on methodology, scientific scope, and role within its ecosystem.
1. GALAX in Extragalactic Surveys: Siena Galaxy Atlas and REGALADE
1.1 Siena Galaxy Atlas (SGA-2020)
The Siena Galaxy Atlas 2020 (SGA-2020) is a uniform, multi-wavelength imaging survey comprising 383,620 nearby galaxies across approximately 19,721 deg² of extragalactic sky. It integrates deep optical imaging from the DESI Legacy Imaging Surveys with infrared coverage in four bands (3.4–22 μm) from unWISE coadds. The SGA is >95% complete for galaxies with isophotal semimajor axis R(26) ≳ 25″ and r < 18, measured at the 26 mag arcsec⁻² isophote in the r band. Key deliverables include:
- Precise coordinates, multi-wavelength mosaics, astrometric (Gaia DR2) and photometric (Pan-STARRS1) calibration.
- Azimuthally averaged surface brightness profiles: Measured via fixed-geometry elliptical isophotes, supporting parametric fits with the Sérsic law.
- Model images and photometry: From forward modeling using The Tractor, producing point-source (PSF), exponential (EXP), de Vaucouleurs (DEV), Sérsic (SER), and round-exponential (REX) models.
- Ancillary metadata: Including HyperLeda types, RC3 diameters, axis ratios, group assignments, and morphological classifications.
SGA-2020 supports applications in star formation history analyses, velocity field reconstruction (Tully–Fisher and Fundamental Plane relations), and as a reference database for electromagnetic counterpart identification in gravitational-wave and neutrino event localization (Moustakas et al., 2023).
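The Sérsic law named among the deliverables above can be sketched in a few lines. This is a minimal illustration, not SGA pipeline code: the function names are hypothetical, and b_n is taken from the leading terms of the Ciotti & Bertin (1999) series, which is an assumption about how one might approximate it rather than a statement of what the SGA uses.

```python
import math

def sersic_bn(n: float) -> float:
    """Approximate b_n, chosen so that R_e encloses half the total light.

    Leading terms of the Ciotti & Bertin (1999) asymptotic series;
    adequate for Sersic indices n of roughly 0.5 and above.
    """
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n**2)

def sersic_intensity(r: float, r_e: float, i_e: float, n: float) -> float:
    """Linear surface brightness I(R) = I_e * exp(-b_n [(R/R_e)^(1/n) - 1])."""
    return i_e * math.exp(-sersic_bn(n) * ((r / r_e) ** (1.0 / n) - 1.0))

def sersic_mu(r: float, r_e: float, mu_e: float, n: float) -> float:
    """Same profile expressed in mag arcsec^-2 (larger value = fainter)."""
    scale = 2.5 * sersic_bn(n) / math.log(10.0)
    return mu_e + scale * ((r / r_e) ** (1.0 / n) - 1.0)

# Illustrative values only: an n = 4 (de Vaucouleurs-like) profile
# sampled at a few radii in units of arcsec.
profile = [sersic_mu(r, r_e=10.0, mu_e=22.0, n=4.0) for r in (5.0, 10.0, 20.0)]
```

Setting n = 1 recovers an exponential disk and n = 4 a de Vaucouleurs profile, which is why a single Sérsic fit can interpolate between the EXP and DEV model types.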
1.2 REGALADE: A Unified All-Sky Galaxy Catalog
REGALADE is an end-to-end framework producing an all-sky, volume-limited galaxy catalog of 79,875,539 entries out to 2,000 Mpc. It merges curated galaxy catalogs (SGA, CosmicFlows, HECATE, GLADE(+), and others) and deep imaging surveys (Legacy Surveys, Pan-STARRS, SDSS, DELVE) using a ranked, “elliptical matching” criterion on sky position and size parameters. Critical components include:
- Distance estimation: Hierarchical via redshift-independent, spectroscopic, and photometric sources, utilizing a 1-point trimmed mean for robustness.
- Stellar mass estimation: Based on multi-band profile-fit photometry and mass–light–color relations with uncertainties ∼0.14 dex.
- Stellar and artifact removal: Systematic flagging and exclusion of stellar contaminants through Gaia cross-matching, compactness, and photometric criteria.
- Purity and completeness: >90% completeness for out to 360 Mpc, with stringent control over stellar contamination (6.7M Gaia-identified stars removed).
REGALADE delivers significant improvements in host identification for gravitational wave sources, transients, and X-ray sources compared to previous compilations such as GLADE, enabling robust multi-messenger follow-up (Tranin et al., 18 Aug 2025).
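One plausible reading of an "elliptical matching" criterion on sky position and size is a point-in-ellipse test: a detection is associated with a catalogued galaxy if it falls inside the ellipse defined by that galaxy's semi-major axis, axis ratio, and position angle. The sketch below illustrates that idea only. The function name, argument names, and the flat-sky approximation are assumptions, and the ranking step of the actual pipeline is not reproduced here.

```python
import math

def inside_ellipse(ra, dec, ra0, dec0, semi_major_arcsec, axis_ratio, pa_deg):
    """Return True if (ra, dec) lies inside a galaxy's ellipse.

    ra, dec, ra0, dec0 and pa_deg are in degrees; the semi-major axis is in
    arcsec. pa_deg is the major-axis position angle, measured from north
    through east. Uses a flat-sky approximation, valid for small offsets.
    """
    # Tangent-plane offsets in arcsec (east and north positive).
    d_ra = (ra - ra0 + 180.0) % 360.0 - 180.0  # handle the 0/360 deg wrap
    east = d_ra * math.cos(math.radians(dec0)) * 3600.0
    north = (dec - dec0) * 3600.0

    # Rotate into the ellipse frame: u along the major axis, v along the minor.
    pa = math.radians(pa_deg)
    u = east * math.sin(pa) + north * math.cos(pa)
    v = east * math.cos(pa) - north * math.sin(pa)

    semi_minor_arcsec = semi_major_arcsec * axis_ratio
    return (u / semi_major_arcsec) ** 2 + (v / semi_minor_arcsec) ** 2 <= 1.0

# Hypothetical galaxy: 60 arcsec semi-major axis, 3:1 axis ratio, major axis east-west.
is_match = inside_ellipse(150.005, 2.0, 150.0, 2.0, 60.0, 1.0 / 3.0, 90.0)
```

Compared with a fixed-radius cone match, an ellipse test avoids both missing detections along the major axis of a large edge-on galaxy and picking up unrelated sources along its minor axis.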
2. GALAX in Biomedical AI: Explainable Subgraph Reasoning Framework
GALAX (Graph Augmented LLM with eXplainability) is an AI framework designed for interpretable, reinforcement-guided subgraph reasoning in precision medicine. It addresses limitations of omics-only, text-centric, and graph-only methods by:
- Integrating quantitative multi-omic features, PPI/regulatory network topology, and literature-scale node text metadata.
- Employing a pretrained Graph Neural Network (GNN) as a Graph Process Reward Model (GPRM), enabling step-wise, process-level supervision without explicit intermediate reasoning annotations.
- Coupling with LLMs for policy generation and answer refinement, interleaved with reinforcement learning-based subgraph construction.
Mathematical description: The input is a graph G = (V, E) in which each node v ∈ V carries multi-omic features x_v and a text embedding t_v. The RL policy incrementally builds subgraphs, guided by GPRM rewards that combine three ingredients: the GNN's score for the target class, rollouts that simulate future expansions, and a penalty term for constraint violations.
Benchmarked on Target-QA (363 DepMap cell lines with multi-omic and CRISPR data), GALAX attains a precision of 0.5472, outperforming statistical and prior LLM+GNN baselines. Its rationales are the RL-inferred subgraphs themselves, which recapitulate disease pathways as validated by pathway enrichment (Zhang et al., 25 Sep 2025).
3. Data Products, Pipelines, and Methodologies
3.1 SGA and REGALADE Data Products and Pipelines
SGA-2020 data products include per-galaxy mosaics in optical and infrared, surface brightness (SB) profiles, model images, photometric fits, and group catalogs. Isophotal diameters are measured at multiple thresholds (22–26 mag arcsec⁻²). The Tractor forward-models photometry while accounting for the local PSF.
REGALADE pipeline stages:
| Stage | Core Methods | Key Output |
|---|---|---|
| Catalog ingestion | Catalog crossmatching, elliptical matches | Unified detections |
| Distance synthesis | Trimmed mean, prioritized redshift-independent values | Robust distances |
| Photometry & mass | Profile-fit photometry, color–mass relations | Stellar mass |
| Stellar decontamination | Gaia crossmatch, proper motion & color cuts | Purified catalog |
| Completeness/purity check | Luminosity & mass function integration | Performance curves |
This systematic merging and cleaning enables statistical completeness out to cosmological distances for transient and GW event hosts (Tranin et al., 18 Aug 2025).
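The distance-synthesis stage in the table can be illustrated with a small sketch. It assumes that "1-point trimmed mean" means dropping the single lowest and single highest estimate before averaging, and that the hierarchy simply takes the first source class with any data. Both are interpretations rather than details confirmed by the source, and all names here are hypothetical.

```python
from statistics import mean

# Assumed priority order, most trusted first.
PRIORITY = ("redshift_independent", "spectroscopic", "photometric")

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values.

    Falls back to the plain mean when there are too few values to trim.
    """
    ordered = sorted(values)
    if len(ordered) > 2 * k:
        ordered = ordered[k:len(ordered) - k]
    return mean(ordered)

def synthesize_distance(estimates):
    """Pick the highest-priority source class with data and combine it.

    `estimates` maps a source class name to a list of distances in Mpc.
    Returns (source_class, distance), or (None, None) if nothing is available.
    """
    for kind in PRIORITY:
        values = estimates.get(kind, [])
        if values:
            return kind, trimmed_mean(values)
    return None, None

# Illustrative numbers only: one outlying estimate is discarded by the trim.
kind, distance = synthesize_distance(
    {"redshift_independent": [98.0, 100.0, 102.0, 500.0], "photometric": [130.0]}
)
```

Trimming one value from each end protects the combined distance against a single catastrophic outlier while still using most of the available measurements.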
3.2 GALAX (Biomedical AI) Workflow
The GALAX biomedical pipeline consists of LLM pretraining on biomedical graphs, GNN graph foundation model pretraining (edge masking & cancer/noncancer classification), LLM-based initial candidate extraction, RL-driven subgraph expansion scored by the GPRM, and LLM answer refinement. RL policy learning incorporates greedy acceptance driven by positive reward delta (Zhang et al., 25 Sep 2025).
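The greedy acceptance rule can be sketched on a toy graph. In this illustration the `score` callable stands in for the GPRM; the real reward model is a pretrained GNN, and neither the scoring function nor the node names and weights below come from the source.

```python
def greedy_expand(adjacency, seeds, score, max_nodes=10):
    """Grow a subgraph by repeatedly adding the neighbor with the best reward gain.

    adjacency: dict mapping node -> iterable of neighboring nodes
    seeds:     initial candidate nodes (e.g. proposed by an LLM)
    score:     callable taking a set of nodes and returning a float reward
    A candidate is accepted only if it strictly increases the reward.
    """
    subgraph = set(seeds)
    current = score(subgraph)
    while len(subgraph) < max_nodes:
        # Frontier: nodes adjacent to the subgraph but not yet in it.
        frontier = {nb for node in subgraph for nb in adjacency.get(node, ())} - subgraph
        best_node, best_delta = None, 0.0
        for candidate in sorted(frontier):  # sorted for deterministic tie-breaking
            delta = score(subgraph | {candidate}) - current
            if delta > best_delta:
                best_node, best_delta = candidate, delta
        if best_node is None:  # no positive reward delta: stop expanding
            break
        subgraph.add(best_node)
        current = score(subgraph)
    return subgraph

# Toy stand-in reward: summed node relevance minus a per-node size penalty.
relevance = {"n1": 1.0, "n2": 0.8, "n3": 0.6, "n4": 0.05}
toy_score = lambda nodes: sum(relevance[n] for n in nodes) - 0.1 * len(nodes)
toy_graph = {"n1": ["n2", "n4"], "n2": ["n1", "n3"], "n3": ["n2"], "n4": ["n1"]}

selected = greedy_expand(toy_graph, {"n1"}, toy_score)
```

Because every accepted step must raise the reward, the procedure yields a dense, step-wise signal, and the size penalty in the toy score plays the role of discouraging uninformative additions.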
4. Applications and Performance
4.1 Astrophysical Surveys and Time-Domain Astronomy
SGA-2020 and REGALADE underpin key applications:
- Peculiar velocity measurements via the Tully–Fisher (spirals) and Fundamental Plane (ellipticals) scaling relations.
- Transient and GW event host association: REGALADE assigns reliable hosts to 90.4% of TNS transients vs 56.7% in GLADE, and recovers twice as many BlackGEM hosts.
- X-ray source host cross-matching: REGALADE increases ULX and HLX associations compared to legacy catalogs.
- Foreground masking: For cosmological surveys requiring accurate galaxy/stellar separation (Moustakas et al., 2023, Tranin et al., 18 Aug 2025).
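For reference, the two scaling relations used for peculiar velocities are conventionally written as follows. These are the generic textbook forms; the exponents and coefficients depend on band and calibration sample, and the symbols below are not taken from the source.

```latex
% Tully–Fisher relation (spirals): luminosity vs. rotation velocity
L \propto v_{\mathrm{rot}}^{\alpha}, \qquad \alpha \approx 3\text{--}4

% Fundamental Plane (ellipticals): effective radius, central velocity
% dispersion, and mean surface brightness within R_e
\log R_e = a \log \sigma_0 + b \log \langle I \rangle_e + \gamma

% Low-redshift peculiar velocity from a redshift-independent distance d
v_{\mathrm{pec}} \simeq cz - H_0 \, d
```

Both relations turn a distance-independent observable (rotation velocity or velocity dispersion) into an estimate of intrinsic size or luminosity, hence a distance, which is then compared with the redshift to isolate the peculiar motion.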
4.2 Precision Medicine
GALAX (biomedical) enables:
- Reliable, mechanistically-grounded target discovery: Outperforms strong LLM+omics baselines on Target-QA (Precision = 0.5472).
- Contextual, interpretable subgraph rationales: Final subgraphs highlight canonical oncogenes and pathways (e.g., EGFR signaling in LUAD).
- Dense process-level reward, avoiding reward hacking and sparse feedback in RL.
5. Related Toolkits: GalaxAI for Spacecraft Telemetry
GalaxAI is a modular, extensible machine-learning toolbox for spacecraft telemetry, supporting:
- Multivariate time series analyses, regression/classification, and structured output prediction.
- Feature-centric interpretability: Permutation importance, GENIE3, and symbolic regression metrics.
- Visualization: Dashboards (Plotly.js), scatter/doughnut/pie charts, ROC analysis.
Validated on Mars Express thermal power and INTEGRAL Van Allen belt-crossing prediction tasks, GalaxAI emphasizes robust, explainable modeling of high-throughput, heterogeneous telemetry for mission operations (Kostovska et al., 2021).
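Of the interpretability methods listed, permutation importance is the simplest to illustrate. The sketch below is a generic, dependency-free version of the technique and does not reflect GalaxAI's actual API; the function names and the toy telemetry data are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    predict: callable mapping one feature row (list) to a prediction
    X:       list of feature rows (lists of floats)
    y:       list of target values
    Returns one importance value per feature; larger means more important.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy data: the target depends on the first channel only.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]
model = lambda row: 3.0 * row[0]

scores = permutation_importance(model, X, y)
```

Because the method only needs a prediction function, it applies equally to regression and classification models, which suits a toolbox that has to stay model-agnostic.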
6. Data Access, Community Resources, and Legacy Value
SGA-2020 data products are accessible through the SGA Portal (https://sga.legacysurvey.org), NOIRLab Astro Data Lab, and the Legacy Surveys Viewer. REGALADE’s full dataset, visual classifications, and code are slated for community release.
GALAX (biomedical AI) is supported by Target-QA, combining multi-omic, CRISPR, and graph data for benchmarking.
GalaxAI is available as a Python toolkit with Electron/React GUI, supporting large-scale pipeline deployment with modular integration of new spacecraft or methods.
Together, these resources provide legacy datasets, interoperable tools, and methodologically rigorous frameworks for the astronomical, biomedical, and engineering communities.