pop-cosmos: Scaleable inference of galaxy properties and redshifts with a data-driven population model (2406.19437v2)
Abstract: We present an efficient Bayesian method for estimating individual photometric redshifts and galaxy properties under a pre-trained population model (pop-cosmos) that was calibrated using purely photometric data. This model specifies a prior distribution over 16 stellar population synthesis (SPS) parameters using a score-based diffusion model, and includes a data model with detailed treatment of nebular emission. We use a GPU-accelerated affine invariant ensemble sampler to achieve fast posterior sampling under this model for 292,300 individual galaxies in the COSMOS2020 catalog, leveraging a neural network emulator (Speculator) to speed up the SPS calculations. We apply both the pop-cosmos population model and a baseline prior inspired by Prospector-$\alpha$, and compare these results to published COSMOS2020 redshift estimates from the widely-used EAZY and LePhare codes. For the $\sim 12,000$ galaxies with spectroscopic redshifts, we find that pop-cosmos yields redshift estimates that have minimal bias ($\sim10^{-4}$), high accuracy ($\sigma_\text{MAD}=7\times10^{-3}$), and a low outlier rate ($1.6\%$). We show that the pop-cosmos population model generalizes well to galaxies fainter than its $r<25$ mag training set. The sample we have analyzed is $\gtrsim3\times$ larger than has previously been possible via posterior sampling with a full SPS model, with average throughput of 15 GPU-sec per galaxy under the pop-cosmos prior, and 0.6 GPU-sec per galaxy under the Prospector prior. This paves the way for principled modeling of the huge catalogs expected from upcoming Stage IV galaxy surveys.
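The three redshift quality figures quoted in the abstract (bias, $\sigma_\text{MAD}$, and outlier rate) can be illustrated with a minimal sketch. The definitions used here are assumptions based on common photometric-redshift conventions, not taken from the abstract itself: residuals are normalized as $\Delta z/(1+z_\text{spec})$, the bias is their median, $\sigma_\text{MAD}$ is 1.4826 times the median absolute deviation, and an outlier is any galaxy with $|\Delta z|/(1+z_\text{spec})>0.15$. The function name `photoz_metrics` and the `outlier_threshold` parameter are hypothetical.

```python
from statistics import median


def photoz_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Return (bias, sigma_mad, outlier_fraction) for paired redshift lists.

    Assumed conventions (not stated in the abstract):
      dz        = (z_phot - z_spec) / (1 + z_spec)
      bias      = median(dz)
      sigma_mad = 1.4826 * median(|dz - median(dz)|)
      outlier   = |dz| > outlier_threshold
    """
    if len(z_phot) != len(z_spec) or len(z_spec) == 0:
        raise ValueError("z_phot and z_spec must be non-empty and equal in length")

    # Normalized residual for each galaxy
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]

    # Median residual as a robust estimate of systematic offset
    bias = median(dz)

    # The factor 1.4826 scales the MAD to match the standard deviation
    # of a Gaussian distribution
    sigma_mad = 1.4826 * median(abs(d - bias) for d in dz)

    # Fraction of galaxies whose normalized residual exceeds the threshold
    outlier_fraction = sum(abs(d) > outlier_threshold for d in dz) / len(dz)

    return bias, sigma_mad, outlier_fraction
```

Applied to point estimates (for example posterior medians) for the spectroscopic subsample, a function like this would return three numbers comparable in kind to those quoted above, provided the same conventions are used.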