Unclear causes of outlier catalog failures in the GNN-MNN estimator

Identify the factors that cause a graph neural network (GNN) coupled to a moment neural network (MNN), trained on L-Galaxies galaxy catalogs, to perform poorly on a subset of test catalogs that show large residuals and high reduced chi-squared values. Characterize the data or modeling features responsible for this outlier behavior.

Background

The authors evaluate a GNN-MNN estimator of the matter density parameter Ωm that is trained on L-Galaxies and tested on several semi-analytic models (SAMs) and hydrodynamical simulations. They apply a reduced chi-squared threshold (χ² > 10) to discard predictions deemed unreliable, and they note that removing these outliers improves the apparent performance.
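The filtering step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the MNN outputs a posterior mean μ and a standard deviation σ of Ωm for each test catalog, that the per-catalog χ² is the squared standardized residual ((Ωm,true − μ)/σ)², and that the set-level reduced χ² is the mean of these values. The names `Prediction`, `chi2_per_catalog`, `split_outliers`, and the toy numbers are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Prediction:
    """One test catalog (hypothetical container): true Omega_m and MNN outputs."""
    omega_true: float  # ground-truth Omega_m of the simulation
    mu: float          # predicted posterior mean of Omega_m (assumed MNN output)
    sigma: float       # predicted posterior standard deviation (assumed MNN output)

    def __post_init__(self) -> None:
        if self.sigma <= 0:
            raise ValueError("sigma must be positive")


def chi2_per_catalog(p: Prediction) -> float:
    """Squared standardized residual for one catalog (assumed definition)."""
    return ((p.omega_true - p.mu) / p.sigma) ** 2


def reduced_chi2(preds: Sequence[Prediction]) -> float:
    """Mean of the per-catalog chi^2 values over a set of catalogs."""
    return sum(chi2_per_catalog(p) for p in preds) / len(preds)


def rmse(preds: Sequence[Prediction]) -> float:
    """Root-mean-square error of the predicted means."""
    return math.sqrt(sum((p.omega_true - p.mu) ** 2 for p in preds) / len(preds))


def split_outliers(
    preds: Sequence[Prediction], threshold: float = 10.0
) -> Tuple[List[Prediction], List[Prediction]]:
    """Return (kept, flagged), flagging catalogs whose chi^2 exceeds threshold."""
    kept = [p for p in preds if chi2_per_catalog(p) <= threshold]
    flagged = [p for p in preds if chi2_per_catalog(p) > threshold]
    return kept, flagged


if __name__ == "__main__":
    # Toy values for illustration only; not results from the paper.
    preds = [
        Prediction(0.30, 0.31, 0.02),
        Prediction(0.25, 0.26, 0.03),
        Prediction(0.35, 0.20, 0.02),  # large standardized residual
    ]
    kept, flagged = split_outliers(preds)
    print(f"kept={len(kept)} flagged={len(flagged)}")
    print(f"RMSE all={rmse(preds):.4f} kept={rmse(kept):.4f}")
    print(f"reduced chi2 all={reduced_chi2(preds):.2f} kept={reduced_chi2(kept):.2f}")
```

When you compare the metrics for `preds` and `kept`, the toy numbers show the same qualitative effect. Dropping high-χ² catalogs lowers the error metrics by construction. That is why the improvement is only "apparent": the cut says nothing about why those catalogs fail.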

Although this filtering improves the metrics, the authors explicitly acknowledge that they do not yet understand why the model performs poorly on these catalogs, and that this requires further investigation. The open question is how to diagnose and mitigate such failure modes in field-level machine-learning inference from galaxy catalogs.
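A possible first diagnostic step is to compare catalog-level properties between flagged and retained catalogs. The sketch below is a hypothetical univariate screen, not a method from the paper. For each property, it computes the difference of group means divided by the standard deviation of the combined sample, then ranks the properties by the magnitude of that score. The property names (`n_galaxies`, `mean_log_mstar`) and all values are made up for illustration. A large score only marks a correlate of failure, not a cause.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def standardized_difference(flagged: Sequence[float], kept: Sequence[float]) -> float:
    """(mean of flagged - mean of kept) / std of the combined sample."""
    spread = pstdev(list(flagged) + list(kept))
    if spread == 0:
        return 0.0
    return (mean(flagged) - mean(kept)) / spread


def rank_properties(
    props: Dict[str, Sequence[float]], is_flagged: Sequence[bool]
) -> List[Tuple[str, float]]:
    """Rank catalog-level properties by |standardized difference| between groups."""
    scores: Dict[str, float] = {}
    for name, values in props.items():
        flagged = [v for v, f in zip(values, is_flagged) if f]
        kept = [v for v, f in zip(values, is_flagged) if not f]
        if not flagged or not kept:
            continue  # cannot compare if one group is empty
        scores[name] = standardized_difference(flagged, kept)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical per-catalog properties; names and values are made up.
    props = {
        "n_galaxies": [900, 950, 300, 920],
        "mean_log_mstar": [9.80, 9.90, 9.85, 9.82],
    }
    is_flagged = [False, False, True, False]  # e.g. from a chi^2 > 10 cut
    for name, score in rank_properties(props, is_flagged):
        print(f"{name}: {score:+.2f}")
```

A screen like this has clear limits. It ignores interactions between properties, and with only a few flagged catalogs the scores are noisy. It is best read as a way to shortlist candidate drivers for closer inspection.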

References

Even though this selection improves performance, the reason why the model performs poorly on these catalogs remains unclear and will require further investigation in future work.

Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models (2512.10222 - Santi et al., 11 Dec 2025) in Section 4.1 (Field-Level Inference and Performance Metrics)