Near Galaxy Injection in Transient Surveys
- Near Galaxy Injection is a method that injects synthetic transients near galaxies to reflect the true spatial distribution of extragalactic events.
- The approach enhances CNN-based real/bogus classifiers by simulating realistic galactic backgrounds, though it may increase false positives for variable stars.
- NGI, when combined with Random Injection, balances detection sensitivity and specificity, offering a practical solution for diverse transient surveys.
Near Galaxy Injection (NGI) refers to a point source injection strategy within astronomical survey pipelines, particularly in the context of developing and evaluating machine learning-based real/bogus (RB) classifiers. In NGI, simulated point sources are injected specifically in the vicinity of galaxies within optical images, reflecting an astrophysically motivated prior for the spatial distribution of extragalactic transients, such as supernovae, as opposed to more generic injection approaches. NGI is typically contrasted with Random Injection (RI), where point sources are placed without regard to the underlying astrophysical scene. The adoption of NGI enables more realistic assessment and enhancement of pipeline performance for transient detection tasks in contemporary surveys, such as those utilizing the Korea Microlensing Telescope Network (KMTNet) (Lee et al., 19 Oct 2025).
1. Motivation and Definition
In time-domain astronomy, RB classification is central to identifying genuine transient events amidst imaging artifacts. Training effective classifiers, especially convolutional neural networks (CNNs), is challenging due to the scarcity of labeled real events and a substantial class imbalance. Point source injection, in which simulated transients are digitally added to archival survey images, provides a mechanism to synthetically boost training sets. NGI specifically entails injecting these sources near the projected positions of detected galaxies, thereby approximating the physical distribution of cosmic explosions linked to host galaxies (e.g., supernovae) (Lee et al., 19 Oct 2025).
The main rationale for NGI is to better reflect the spatial clustering of extragalactic transients, which are typically found in or close to their host galaxies, while avoiding the over-representation of isolated sources or foreground artifacts.
2. Methodological Framework
NGI is operationalized by selecting locations for synthetic point source injection that are spatially correlated with identified galaxies in the dataset. The implementation involves:
- Performing source extraction or utilizing existing catalogs (e.g., SExtractor outputs) to identify resolved galaxies in each survey image.
- Assigning injection coordinates within a specified distance (e.g., angular separation or isophotal radius) from these galaxies, either deterministically or probabilistically (e.g., following a 2D Gaussian centered on the galaxy).
- Excluding injection sites in regions of severe source confusion or close to extremely bright stars, so that injected sources do not produce non-representative artifacts.
- Generating realistic point spread functions (PSFs) for injected sources, matched to the survey seeing conditions for physical verisimilitude.
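The final step in the list above, adding a PSF-shaped source to an image, can be illustrated with a minimal sketch. It assumes a circular Gaussian as a stand-in for the survey PSF, since the PSF model used in the source work is not specified here; the function name `inject_point_source`, its parameters, and the toy background are all hypothetical.

```python
import numpy as np

def inject_point_source(image, x0, y0, flux, fwhm_pix, half_size=10):
    """Return a copy of `image` with a Gaussian point source added at (x0, y0).

    Assumption: a symmetric Gaussian approximates the PSF. A production
    pipeline would more likely use an empirical, per-image PSF model.
    """
    # Convert the seeing FWHM (in pixels) to a Gaussian sigma.
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ny, nx = image.shape
    # Bounding box of the stamp, clipped to the image edges.
    x_lo, x_hi = max(0, int(x0) - half_size), min(nx, int(x0) + half_size + 1)
    y_lo, y_hi = max(0, int(y0) - half_size), min(ny, int(y0) + half_size + 1)
    yy, xx = np.mgrid[y_lo:y_hi, x_lo:x_hi]
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    # Normalized 2D Gaussian scaled so that its integral equals `flux`.
    stamp = flux * np.exp(-r2 / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    out = image.astype(float).copy()
    out[y_lo:y_hi, x_lo:x_hi] += stamp
    return out

# Toy usage: a noisy flat background with one injected source.
rng = np.random.default_rng(0)
background = rng.normal(100.0, 5.0, size=(64, 64))
injected = inject_point_source(background, x0=30.3, y0=28.7,
                               flux=5000.0, fwhm_pix=3.0)
```

Sub-pixel centers are supported because the Gaussian is evaluated at each pixel's offset from `(x0, y0)` rather than pasted from a fixed template. Photon noise on the injected source is omitted here for brevity.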
A corresponding Random Injection (RI) process is identical except that source locations are assigned throughout the entire image frame independent of any underlying galaxy positions.
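The difference between the two site-selection rules can be sketched as follows. This is an illustrative sketch only: it assumes NGI offsets are drawn from a 2D Gaussian whose width scales with a per-galaxy radius, and that both strategies reject sites within a fixed distance of bright stars. The function names, the `sigma_scale` and `star_sep` parameters, and the toy catalog are hypothetical.

```python
import math
import random

def too_close(x, y, points, min_sep):
    """True if (x, y) lies within `min_sep` pixels of any point in `points`."""
    return any(math.hypot(x - px, y - py) < min_sep for px, py in points)

def sample_ngi_position(galaxies, bright_stars, shape, rng,
                        sigma_scale=1.0, star_sep=30.0, max_tries=100):
    """Draw one NGI site. `galaxies` is a list of (x, y, radius_pix)."""
    ny, nx = shape
    for _ in range(max_tries):
        gx, gy, radius = rng.choice(galaxies)
        # 2D Gaussian offset centered on the chosen galaxy.
        x = rng.gauss(gx, sigma_scale * radius)
        y = rng.gauss(gy, sigma_scale * radius)
        inside = 0 <= x < nx and 0 <= y < ny
        if inside and not too_close(x, y, bright_stars, star_sep):
            return x, y
    return None  # no acceptable site found within max_tries

def sample_ri_position(bright_stars, shape, rng, star_sep=30.0, max_tries=100):
    """Draw one RI site uniformly over the frame, ignoring galaxy positions."""
    ny, nx = shape
    for _ in range(max_tries):
        x, y = rng.random() * nx, rng.random() * ny
        if not too_close(x, y, bright_stars, star_sep):
            return x, y
    return None

# Toy catalog (made-up values, in pixels).
rng = random.Random(42)
galaxies = [(100.0, 120.0, 8.0), (400.0, 300.0, 15.0)]
bright_stars = [(250.0, 250.0)]
shape = (512, 512)
ngi_xy = [sample_ngi_position(galaxies, bright_stars, shape, rng) for _ in range(200)]
ri_xy = [sample_ri_position(bright_stars, shape, rng) for _ in range(200)]
```

Rejection sampling keeps the bright-star mask identical for both strategies, so the only difference between the two samplers is the proposal distribution for the coordinates.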
3. Effects on Real/Bogus Classification
Experimental comparisons on KMTNet datasets demonstrate that NGI impacts downstream classifier performance characteristics:
- RI produces synthetic samples representative of both galactic and extragalactic environments, leading to high sensitivity in asteroid detection and artifact rejection but poor recovery of transients near galaxies (Lee et al., 19 Oct 2025).
- NGI, by concentrating examples near galaxy light profiles, enables the RB network to better detect transients superimposed on complex galactic backgrounds, which is essential for supernova discovery.
- However, NGI-optimized classifiers exhibit increased rates of false positives, especially for variable stars that may appear near galaxies, resulting in diminished specificity for stellar transients.
Lee et al. (19 Oct 2025) present simulation-to-reality evaluations on real, imbalanced datasets from gravitational wave follow-up campaigns (GW190814, S230518h), confirming that the choice of point source injection strategy strongly influences the trade-off between detection rate and false positive rate.
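To make the two quantities in this trade-off concrete, the sketch below computes a detection rate (true positive rate) and a false positive rate from classifier scores at a fixed threshold. The function name, the 0.5 threshold, and the scores and labels are invented for illustration and are not results from the cited work.

```python
def detection_and_false_positive_rate(scores, labels, threshold=0.5):
    """Return (detection_rate, false_positive_rate) at a score threshold.

    `labels` uses 1 for real and 0 for bogus; a candidate is predicted
    real when its score is at or above `threshold`.
    """
    tp = fp = tn = fn = 0
    for score, is_real in zip(scores, labels):
        predicted_real = score >= threshold
        if is_real and predicted_real:
            tp += 1
        elif is_real:
            fn += 1
        elif predicted_real:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else float("nan")
    false_positive_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return detection_rate, false_positive_rate

# Made-up RB scores for 3 real and 5 bogus candidates.
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.6, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
det_rate, fp_rate = detection_and_false_positive_rate(scores, labels)
```

On heavily imbalanced data, where bogus candidates vastly outnumber real ones, even a small false positive rate can translate into many false alarms per real detection, which is why both numbers are reported together.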
4. Trade-offs and Combined Approaches
No single injection strategy suffices across all astrophysical use cases:
| Injection Strategy | Strengths | Weaknesses |
|---|---|---|
| RI | Asteroid detection, artifact filtering | Poor for galaxy-associated transients |
| NGI | Transients near galaxies | High false positive rate (variable stars) |
| Combined | Balance between detection and false positives | Intermediate complexity |
The combined approach, wherein both RI and NGI are used to generate composite training sets, more effectively balances sensitivity and specificity, optimizing classifier utility in heterogeneous survey environments (Lee et al., 19 Oct 2025).
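One way such a composite training set could be assembled is sketched below. The mixing ratio used in the cited work is not stated here, so `ngi_fraction` is a hypothetical tuning knob, and `build_combined_set` and the string placeholders for image cutouts are illustrative only.

```python
import random

def build_combined_set(ri_samples, ngi_samples, n_total, ngi_fraction=0.5, seed=0):
    """Draw a shuffled training set mixing RI and NGI injected samples.

    Returns a list of (sample, strategy) pairs so that the origin of each
    sample can still be tracked when evaluating the classifier.
    """
    rng = random.Random(seed)
    n_ngi = round(n_total * ngi_fraction)
    n_ri = n_total - n_ngi
    if n_ngi > len(ngi_samples) or n_ri > len(ri_samples):
        raise ValueError("not enough samples for the requested mix")
    # Sample without replacement from each pool, then interleave randomly.
    combined = [(s, "NGI") for s in rng.sample(ngi_samples, n_ngi)]
    combined += [(s, "RI") for s in rng.sample(ri_samples, n_ri)]
    rng.shuffle(combined)
    return combined

# Placeholders standing in for injected image cutouts.
ri_pool = [f"ri_cutout_{i}" for i in range(20)]
ngi_pool = [f"ngi_cutout_{i}" for i in range(20)]
training_set = build_combined_set(ri_pool, ngi_pool, n_total=10, ngi_fraction=0.3)
```

Keeping the strategy tag alongside each sample makes it straightforward to report detection and false positive rates separately for galaxy-associated and isolated injections.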
5. Implications for Survey Design and Follow-up
Point source injection strategies, and NGI specifically, are critical for robust RB classifier development, affecting both survey completeness and follow-up resource allocation:
- Surveys reliant on transient detection in crowded, extragalactic fields (e.g., cosmology-oriented projects, multi-messenger campaigns) benefit from prioritizing NGI to recover physically relevant transient populations.
- For surveys with pronounced foreground or asteroid contamination, such as wide-field, high-cadence survey modes, RI remains indispensable for maintaining artifact rejection efficacy.
- The adoption of combined injection regimes is operationally favored for pipelines seeking general-purpose performance across diverse event classes.
A plausible implication is that the continual refinement of injection site selection (e.g., incorporating host galaxy properties, photometric redshifts, or environmental metrics) could further tailor classifier responses for next-generation transient surveys.
6. Limitations and Future Directions
Current experimental results come primarily from KMTNet data and optical counterpart searches for gravitational wave events; generalizability to other facilities or deeper imaging campaigns remains to be fully characterized (Lee et al., 19 Oct 2025). Additionally, the high false positive rates of NGI-trained classifiers for variable stars highlight the persistent challenge of disentangling extragalactic transient signatures from foreground stellar variability when relying solely on proximity-based injection rules.
Future research may incorporate more sophisticated, physically aware simulation paradigms, including multi-wavelength and time-series injection strategies, and joint optimization with domain adaptation techniques to reduce the simulation-to-reality gap. Tuning of the injection radius, host probability weighting, and context-aware masking is an anticipated source of performance gains as survey volumes expand and classifier architectures become more complex.
7. Role in the Evolution of Astronomical Transient Discovery
The use of NGI and related point source injection strategies reflects a transition toward data-centric, simulation-informed methodology in time-domain astronomy. As machine learning RB classifiers become foundational survey infrastructure, explicitly encoding astrophysical priors in synthetic training data, as NGI does, is expected to further align survey yields with scientific priorities in the detection and classification of cosmic transients.