CPROPS-based GMC Catalogs

Updated 4 February 2026

CPROPS-based catalogs are systematically constructed inventories of giant molecular clouds derived from CO spectral-line data using refined segmentation techniques.
They integrate advanced homogenization, noise matching, and dendrogram deblending to ensure unbiased, reproducible measurement of cloud properties.
Rigorous completeness tests with mock cloud injections validate these catalogs, revealing scale-dependent GMC properties and environmental variations.

A CPROPS-based catalog is a systematically constructed inventory of giant molecular clouds (GMCs) identified in spectral-line data cubes (typically CO observations) using advanced extensions of the original CPROPS (Cloud PROPerties) algorithm. The method has been extensively refined and implemented in the PYCPROPS codebase, notably for producing the PHANGS-ALMA GMC catalogs at a uniform 90 pc resolution. These catalogs standardize cloud identification and measurement, rigorously characterize sample completeness, and deliver per-cloud physical parameters and metadata in reproducible formats (Rosolowsky et al., 2021).

1. Data Homogenization and Preparation

CPROPS-based catalogs require the homogenization of heterogeneous interferometric data cubes. For PHANGS-ALMA, the input CO (2–1) cubes combine data from multiple arrays (ALMA 12m, 7m, and Total Power), with native beam sizes of ∼26 to 90 pc and noise levels of 36 to 208 mK per 2.5 km s⁻¹ channel. Each cube is convolved to a common, round Gaussian beam corresponding to 90 pc at the galaxy's distance; the convolution kernel is computed in Fourier space after deconvolving the native beam.
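For Gaussian beams, widths add in quadrature under convolution, so the kernel needed to reach a target beam is the quadrature difference between target and native widths. The following is a minimal sketch of that arithmetic for circular beams only; the function names and the example distance and beam size are illustrative assumptions, not PYCPROPS code, and elliptical native beams need the full Fourier-space treatment described above.

```python
import math

PC_PER_MPC = 1.0e6
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def target_beam_arcsec(resolution_pc, distance_mpc):
    """Angular FWHM (arcsec) subtending resolution_pc at distance_mpc (small-angle limit)."""
    return resolution_pc / (distance_mpc * PC_PER_MPC) * ARCSEC_PER_RAD


def kernel_fwhm_arcsec(native_fwhm_arcsec, target_fwhm_arcsec):
    """FWHM of the circular Gaussian kernel that takes a circular native beam to the target."""
    if target_fwhm_arcsec < native_fwhm_arcsec:
        raise ValueError("target beam is smaller than the native beam")
    return math.sqrt(target_fwhm_arcsec**2 - native_fwhm_arcsec**2)


# Hypothetical galaxy at 15 Mpc observed with a 1.0 arcsec native beam.
target = target_beam_arcsec(90.0, 15.0)
kernel = kernel_fwhm_arcsec(1.0, target)
print(f"target beam: {target:.3f} arcsec, kernel: {kernel:.3f} arcsec")
```

A galaxy whose native beam already exceeds 90 pc cannot be brought to the common resolution, which is why the helper raises an error in that case.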

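Noise matching estimates the local RMS noise $\sigma_0(x,y,v)$ and injects filtered Gaussian noise so that every channel reaches a uniform target sensitivity of $\sigma_T = 75$ mK. Writing $N(x,y,v)$ for the filtered Gaussian noise field, which must have unit variance for the result to carry units of brightness temperature, the injected noise is:

$N^* (x,y,v) = N(x,y,v)\,\sqrt{\,\sigma_T^2 - \sigma_0^2(x,y,v)\,}$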

This ensures that all processed data cubes share the same physical resolution and sensitivity, which avoids bias in the subsequent cloud segmentation and measurement.
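The noise-matching step can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions: the added noise is white rather than filtered to the beam, the helper name `match_noise` is hypothetical, and voxels already noisier than the target are simply left unchanged.

```python
import numpy as np


def match_noise(cube, sigma0, sigma_target, rng):
    """Add Gaussian noise so that each voxel reaches sigma_target.

    cube and sigma0 are arrays of the same shape in kelvin; sigma0 is the local RMS.
    Voxels already noisier than the target receive no extra noise (an assumption).
    """
    deficit = np.clip(sigma_target**2 - sigma0**2, 0.0, None)
    unit_noise = rng.standard_normal(cube.shape)  # unit variance, spatially uncorrelated
    return cube + unit_noise * np.sqrt(deficit)


rng = np.random.default_rng(0)
sigma0 = np.full((40, 64, 64), 0.045)   # hypothetical 45 mK native noise
cube = rng.normal(0.0, sigma0)          # emission-free test cube
matched = match_noise(cube, sigma0, 0.075, rng)
print(f"noise before: {cube.std():.4f} K, after: {matched.std():.4f} K")
```

Because independent variances add, the output RMS lands on the 75 mK target wherever the native noise was below it.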

2. Cloud Segmentation and Deblending

Segmentation proceeds via multi-threshold masking and dendrogram analysis:

Signal mask construction begins by defining high-significance (≥4 σ_T in ≥2 adjacent channels) and low-significance (≥2 σ_T in ≥2 adjacent channels) regions. A final mask $\mathcal{M}(x,y,v)$ incorporates all low-significance regions contiguous with a high-significance pixel, excluding isolated low peaks.
Local maxima are detected using dendrograms (via the "astrodendro" library). Each leaf corresponds to a local intensity peak $T_\text{max}$, with a "merge level" $T_\text{merge}$ at which it would join a neighboring peak.
Peak selection applies four criteria:
1. Contrast: $T_\text{max} - T_\text{merge} > \delta = 2 \sigma_T \approx 0.15$ K.
2. Minimum unique volume: requires at least 0.25 beam-equivalent voxels above $T_\text{merge}$.
3. Uniqueness: all significant peaks are preserved; the earlier merging logic based on SIGDISCONT is omitted.
4. Minimum separation: none imposed ($d_\text{min} = v_\text{min} = 0$), allowing retention of beam-sized structures.
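The two-threshold signal mask described at the start of this section can be sketched with `scipy.ndimage`. This is a minimal illustration, not the PYCPROPS implementation: the helper names are hypothetical, the velocity axis is assumed to be axis 0, and regions are joined by face connectivity. The dendrogram-based peak selection is not shown.

```python
import numpy as np
from scipy import ndimage


def runs_of_two(above):
    """Keep voxels that exceed the threshold in at least two adjacent channels (axis 0)."""
    pair = above[:-1] & above[1:]
    keep = np.zeros_like(above)
    keep[:-1] |= pair
    keep[1:] |= pair
    return keep


def signal_mask(cube, sigma, hi=4.0, lo=2.0):
    """Low-significance regions that contain at least one high-significance voxel."""
    high = runs_of_two(cube >= hi * sigma)
    low = runs_of_two(cube >= lo * sigma)
    labels, _ = ndimage.label(low)       # face-connected regions in (v, y, x)
    keep_ids = np.unique(labels[high])
    keep_ids = keep_ids[keep_ids > 0]
    return np.isin(labels, keep_ids)


sigma_t = 0.075
cube = np.zeros((6, 8, 8))
cube[1:4, 2, 2] = 5 * sigma_t      # bright core spanning three channels
cube[1:4, 2, 3] = 2.5 * sigma_t    # faint envelope attached to the core
cube[1:3, 6, 6] = 2.5 * sigma_t    # isolated faint island, expected to be rejected
mask = signal_mask(cube, sigma_t)
print(mask[:, 2, 2].sum(), mask[:, 2, 3].sum(), mask[:, 6, 6].sum())
```

In the toy cube the faint envelope survives because it touches the bright core, while the isolated faint island is dropped.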

Deblending assigns all masked voxels to the surviving seeds using a compactness-weighted seeded watershed algorithm (compactness parameter = 1000) implemented in scikit-image, yielding compact boundaries in position–position–velocity space. Each voxel in $\mathcal{M}(x,y,v)$ is assigned to exactly one cloud, forming the cloud mask $\mathcal{L}(x,y,v)$.
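A seeded, compact watershed of this kind can be tried on a toy cube with `skimage.segmentation.watershed`. The sketch below assumes two blended Gaussian blobs with hand-placed seeds; the `blob` helper, the cube dimensions, and the 0.15 mask threshold are illustrative choices, and only the compactness value of 1000 comes from the text above.

```python
import numpy as np
from skimage.segmentation import watershed


def blob(shape, center, amp, width=2.5):
    """Isotropic Gaussian blob in a (v, y, x) cube."""
    v, y, x = np.indices(shape)
    r2 = (v - center[0]) ** 2 + (y - center[1]) ** 2 + (x - center[2]) ** 2
    return amp * np.exp(-r2 / (2.0 * width**2))


shape = (8, 16, 32)
cube = blob(shape, (4, 8, 11), 1.0) + blob(shape, (4, 8, 20), 0.8)
mask = cube > 0.15                     # stand-in for the signal mask

# Seeds: one labelled voxel per surviving local maximum.
markers = np.zeros(shape, dtype=np.int32)
markers[4, 8, 11] = 1
markers[4, 8, 20] = 2

# Flood the inverted cube so that bright peaks act as basins.
labels = watershed(-cube, markers=markers, mask=mask, compactness=1000)
sizes = {k: int((labels == k).sum()) for k in (1, 2)}
print(sizes)
```

A large compactness value makes the assignment depend mostly on distance to the seed rather than on the intensity landscape, which is what produces regular, compact cloud boundaries.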

3. Completeness Characterization

To assess completeness, more than 1000 mock Gaussian clouds are injected into emission-free regions of each data cube, with mass ($M$), surface density ($\Sigma_\text{mol}$), and virial parameter ($\alpha_\text{vir}$) drawn from log-uniform priors. The corresponding radius and line width follow from:

$R = \sqrt{\,M/(2 \pi \Sigma_\text{mol})\,}$

$\sigma_v = \left(\alpha_\text{vir} G/5\right)^{1/2} \left( \pi M \Sigma_\text{mol}/2 \right)^{1/4}$
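As a sketch of how the mock-cloud parameters might be drawn, the snippet below samples the three quantities log-uniformly and applies the two relations above. The prior ranges and helper names are placeholders chosen for illustration, not the values used for the catalogs; $G$ is expressed in pc (km s⁻¹)² $M_\odot^{-1}$.

```python
import numpy as np

G = 4.30091e-3  # gravitational constant in pc (km/s)^2 / Msun


def mock_cloud_size_linewidth(mass, sigma_mol, alpha_vir):
    """Radius (pc) and velocity dispersion (km/s) from the two relations above."""
    radius = np.sqrt(mass / (2.0 * np.pi * sigma_mol))
    sigma_v = np.sqrt(alpha_vir * G / 5.0) * (np.pi * mass * sigma_mol / 2.0) ** 0.25
    return radius, sigma_v


def draw_log_uniform(rng, low, high, size):
    """Samples distributed uniformly in log10 between low and high."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size)


rng = np.random.default_rng(1)
n = 1000
# Prior ranges below are placeholders, not the ranges used for the catalogs.
mass = draw_log_uniform(rng, 1e5, 1e7, n)             # Msun
sigma_mol = draw_log_uniform(rng, 10.0, 1000.0, n)    # Msun / pc^2
alpha_vir = draw_log_uniform(rng, 0.5, 10.0, n)
radius, sigma_v = mock_cloud_size_linewidth(mass, sigma_mol, alpha_vir)

r_ref, s_ref = mock_cloud_size_linewidth(1e6, 150.0, 2.0)
print(f"reference cloud: R = {r_ref:.1f} pc, sigma_v = {s_ref:.2f} km/s")
```

The reference cloud at $M = 10^6\,M_\odot$, $\Sigma_\text{mol} = 150\,M_\odot\,\text{pc}^{-2}$, and $\alpha_\text{vir} = 2$ matches the pivot values used in the completeness fit.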

The entire segmentation and measurement pipeline is re-run, and recovery is tracked as a function of these injected properties. The detection probability is fitted as a logistic function:

$P(M, \Sigma, \alpha) = \left[1 + \exp(-c_0 - c_1 \log_{10}(M/10^6) - c_2 \log_{10}(\Sigma/150) - c_3 \log_{10}(\alpha/2)) \right]^{-1}$

At fixed $\Sigma=150\,M_\odot\,\text{pc}^{-2}$ and $\alpha=2$, the 50% completeness mass is:

$M_\text{comp} = 10^{-c_0/c_1+6}\,M_\odot \approx 4.7\times10^5\,M_\odot$
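The expression for $M_\text{comp}$ follows from setting $P = 1/2$, which requires the exponent to vanish: at the reference $\Sigma$ and $\alpha$ the last two terms are zero, leaving $c_0 + c_1 \log_{10}(M/10^6) = 0$. A short sketch, with made-up coefficients that are not the fitted values, confirms this numerically.

```python
import math


def detection_probability(mass, sigma_mol, alpha_vir, c):
    """Logistic completeness model with coefficients c = (c0, c1, c2, c3)."""
    z = (c[0]
         + c[1] * math.log10(mass / 1e6)
         + c[2] * math.log10(sigma_mol / 150.0)
         + c[3] * math.log10(alpha_vir / 2.0))
    return 1.0 / (1.0 + math.exp(-z))


def completeness_mass(c):
    """Mass (Msun) at which P = 0.5 for Sigma = 150 Msun/pc^2 and alpha_vir = 2."""
    return 10.0 ** (6.0 - c[0] / c[1])


coeffs = (1.0, 3.0, 1.5, -1.0)   # made-up coefficients for illustration only
m50 = completeness_mass(coeffs)
p50 = detection_probability(m50, 150.0, 2.0, coeffs)
print(f"M_comp = {m50:.3g} Msun, P(M_comp) = {p50:.2f}")
```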

4. Cloud Property Measurement and Catalog Content

Individual cloud properties are extracted from the voxelized mask. Each cloud's pixels are used to calculate:

CO luminosity (zeroth moment):

$L_\text{CO} = A_\text{pix} \sum_{i\in\mathcal{C}} T_i \Delta v$

Intensity-weighted mean velocity: $\overline{v} = (\sum T_i v_i)/(\sum T_i)$
Velocity dispersion: deconvolved for channel width,

$\sigma_v = \sqrt{ \sigma_{v,\text{obs}}^2 - \sigma_\text{chan}^2 }$

Spatial second moments: diagonalized to produce major/minor axes. Beam deconvolution yields the deconvolved sizes, and the 2D effective radius is $R = \eta\sqrt{\sigma_{\text{maj},d} \sigma_{\text{min},d}}$ with $\eta = \sqrt{2\ln2}\approx1.18$. A three-dimensional correction is applied for finite disk thickness.
Extrapolation to zero threshold: moments are measured above sliding intensity thresholds, then extrapolated (linear for $R$, $\sigma_v$; quadratic for $L_\text{CO}$) to account for sensitivity bias.
Derived physical properties:
- $M_\text{CO} = \alpha_\text{CO} L_\text{CO}$, with a metallicity-dependent $\alpha_\text{CO}$ prescription ($\alpha_\text{CO}^{2-1} = \alpha_\text{CO}^{1-0}(Z)/R_{21}$, $R_{21}=0.65$, $\alpha_\text{CO}^{1-0}(Z) \propto Z^{-1.6}$, normalized at solar metallicity to $4.35\,M_\odot\,(\text{K}\,\text{km}\,\text{s}^{-1}\,\text{pc}^2)^{-1}$)
- Virial mass: $M_\text{vir} = 5\sigma_v^2 R_{3D}/G$
- Virial parameter: $\alpha_\text{vir} = M_\text{vir}/M_\text{CO}$
- Surface density: $\Sigma_\text{mol} = M_\text{CO}/(2\pi R^2)$
- Internal pressure: $P_\text{int} = 3M_\text{CO}\sigma_v^2 / (8\pi R_{3D}^3)$
- Free-fall time: $t_\text{ff} = \sqrt{\pi^2 R_{3D}^3/(4GM_\text{CO})}$
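The moment and derived-property formulas in this section can be transcribed directly into NumPy. The sketch below is illustrative only: the function names and the three-voxel toy cloud are assumptions, the channel-response dispersion `sigma_chan` is supplied by the caller because its value is not specified here, and neither beam deconvolution nor threshold extrapolation is included. With $G$ in pc (km s⁻¹)² $M_\odot^{-1}$, the pressure comes out in $M_\odot$ (km s⁻¹)² pc⁻³ and the free-fall time in pc (km s⁻¹)⁻¹, roughly 0.98 Myr per unit.

```python
import numpy as np

G = 4.30091e-3  # pc (km/s)^2 / Msun


def cloud_moments(T, v, dv, a_pix, sigma_chan):
    """Luminosity, mean velocity and channel-deconvolved dispersion for one cloud.

    T, v : brightness (K) and velocity (km/s) of each voxel assigned to the cloud
    dv : channel width (km/s); a_pix : pixel area (pc^2)
    sigma_chan : channel-response dispersion (km/s), supplied by the caller
    """
    T = np.asarray(T, dtype=float)
    v = np.asarray(v, dtype=float)
    l_co = a_pix * T.sum() * dv
    v_mean = (T * v).sum() / T.sum()
    sigma_obs = np.sqrt((T * (v - v_mean) ** 2).sum() / T.sum())
    sigma_v = np.sqrt(max(sigma_obs**2 - sigma_chan**2, 0.0))
    return l_co, v_mean, sigma_v


def derived_properties(l_co, sigma_v, r_2d, r_3d, alpha_co):
    """Masses and dynamical quantities, transcribed from the list above."""
    m_co = alpha_co * l_co
    m_vir = 5.0 * sigma_v**2 * r_3d / G
    return {
        "M_CO": m_co,
        "M_vir": m_vir,
        "alpha_vir": m_vir / m_co,
        "Sigma_mol": m_co / (2.0 * np.pi * r_2d**2),
        "P_int": 3.0 * m_co * sigma_v**2 / (8.0 * np.pi * r_3d**3),
        "t_ff": np.sqrt(np.pi**2 * r_3d**3 / (4.0 * G * m_co)),
    }


alpha_co_21 = 4.35 / 0.65   # solar-metallicity CO(2-1) factor from the prescription above
l_co, v_mean, sigma_v = cloud_moments(
    T=[1.0, 2.0, 1.0], v=[-2.5, 0.0, 2.5], dv=2.5, a_pix=100.0, sigma_chan=1.0)
props = derived_properties(l_co, sigma_v, r_2d=30.0, r_3d=30.0, alpha_co=alpha_co_21)
for name, value in props.items():
    print(f"{name}: {value:.3g}")
```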

Catalogs list per-cloud quantities including galaxy name, cloud ID, RA/Dec, $V_\text{LSR}$, size, orientation, line width, luminosity, mass, surface density, pressure, and free-fall time.

5. Catalog Delivery, Format, and Metadata

Catalogs are released for each galaxy as FITS binary tables and CSV files at 90 pc resolution. File names follow a standardized format (e.g., "PHANGS_ALMA_<NGCxxxx>_90pc_GMC_catalog.fits"). Headers encode the beam size, channel width, noise, and details of the adopted $\alpha_\text{CO}$ prescription. The units and derivation of each column are documented. A master index table provides per-galaxy statistics: number of clouds, completeness limit ($M_\text{comp}$), resolution, and noise.

File Format	Key Features	Typical Contents
FITS, CSV	90 pc resolution; standardized naming; header metadata	Cloud IDs, RA/Dec, physical parameters; galaxy metadata; detection limits

6. Initial Scientific Results and Astrophysical Insights

The ten-galaxy PHANGS-ALMA 90 pc catalog contains 5758 clouds (4986 with peak S/N ≥ 6), with per-galaxy counts ranging from ∼48 to ∼1432. Key observed relations include:

Size–linewidth scaling: median $\sigma_v \propto R_{3D}^{0.5}$, normalization ≈ 0.7 km s⁻¹ at 1 pc.
Mass–radius: nearly uniform $\Sigma_\text{mol} \sim 10^2\,M_\odot\,\text{pc}^{-2}$, with ≈0.3 dex scatter.
Virial balance: median $\alpha_\text{vir} \approx 1.5$ (±0.3 dex), $M_\text{vir} \simeq M_\text{CO}$.

Mass distributions are modeled with both pure power-law and truncated Schechter forms, $p(M) \propto M^\beta e^{-M/M_c}$, with completeness corrections applied. Typical values are $\beta \approx -2.2$ and $M_c$ of a few $\times 10^6\,M_\odot$, with truncation statistically preferred in ∼4/10 galaxies.
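To see what the truncation does, the unnormalized Schechter form can be compared with a pure power law of the same slope. The values of $\beta$ and $M_c$ below are illustrative picks within the quoted range, and the function names are hypothetical; no completeness correction is applied.

```python
import numpy as np


def power_law(mass, beta):
    """Unnormalized pure power-law mass distribution."""
    return mass**beta


def truncated_schechter(mass, beta, m_c):
    """Unnormalized p(M) proportional to M^beta * exp(-M / M_c)."""
    return mass**beta * np.exp(-mass / m_c)


beta, m_c = -2.2, 3.0e6          # illustrative values in the quoted range
masses = np.array([5e5, 1e6, 3e6, 1e7])
suppression = truncated_schechter(masses, beta, m_c) / power_law(masses, beta)
for m, s in zip(masses, suppression):
    print(f"M = {m:.1e} Msun: truncated / power law = {s:.3f}")
```

The ratio is simply $e^{-M/M_c}$, so the two forms are nearly indistinguishable well below $M_c$ and diverge only for the rare, most massive clouds, which is why truncation is hard to establish in galaxies with few clouds.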

Systematic environmental variations are evident:

Bar vs. disc: bar clouds have higher $\alpha_\text{vir}$ (2.6 vs. 1.4), higher $\Sigma_\text{mol}$ (130 vs. 96 $M_\odot$ pc⁻²), and higher $M_c$ ($1.25\times10^7$ vs. $4.7\times10^6\,M_\odot$).
Arm vs. interarm: arm clouds are slightly denser (87 vs. 75 $M_\odot$ pc⁻²), with lower $\alpha_\text{vir}$ (1.2 vs. 1.4) and higher $M_c$ ($5.8\times10^6$ vs. $2.3\times10^6\,M_\odot$).
Galaxy centers show elevated $\sigma_v$ and $\alpha_\text{vir}$, which may reflect diffuse gas, streaming, or lower conversion factors.

These results suggest GMC populations are approximately virialized, with surface densities near $10^2\,M_\odot\,\text{pc}^{-2}$ . Environmental effects produce systematic but moderate variations in cloud mass spectra, surface density, and dynamical state (Rosolowsky et al., 2021).

7. Software Implementation and Reproducibility

All of the improved algorithms, including noise and resolution homogenization, segmentation and deblending (dendrograms and seeded watershed), completeness modeling, and measurement/extrapolation corrections, are available in the public PYCPROPS Python package (github.com/phangsteam/pycprops). This enables standardized, reproducible catalog construction for GMC populations in diverse galactic environments and facilitates direct comparative cloud studies.

CPROPS-based catalogs, as demonstrated by PHANGS-ALMA, represent state-of-the-art practice for GMC segmentation, measurement, and comparative analysis in extragalactic molecular gas science (Rosolowsky et al., 2021).


References (1)

Rosolowsky et al. (2021). Giant Molecular Cloud Catalogues for PHANGS-ALMA: Methods and Initial Results.
