- The paper introduces a machine learning-based anomaly detection pipeline that efficiently ranks transient alerts in the Fink broker.
- It utilizes a 26-dimensional feature space and an Isolation Forest algorithm to identify rare events such as AM CVn systems and supernova precursors.
- Real-time expert feedback and active learning enable continuous retraining, enhancing detection accuracy in high-volume astronomical surveys.
Automated Anomaly Detection in the Fink Broker: Architecture, Results, and Domain Integration
Introduction and Context
Modern time-domain surveys, notably the Zwicky Transient Facility (ZTF), generate nightly alert streams exceeding hundreds of thousands of candidates. The scientific yield of such surveys is dominated by rare or highly anomalous events, often buried in an immense background of common variable stars, supernovae, and instrumental artefacts. This study presents and analyzes the architecture, operation, and first-year results of the Fink broker's machine learning-based anomaly detection (AD) pipeline as described in "Anomaly detection in Fink. I. Discovery, follow-up, and classification of unusual sources" (2603.29511). The Fink AD system is designed not only for high-throughput anomaly scoring but also for effective connection to human expertise via real-time notification, expert feedback, and targeted follow-up, thereby bridging the technological gap between raw alert streams and astrophysical discovery.
Pipeline Architecture and Notification Strategy
The Fink pipeline ingests ZTF alerts, transforms light curves into a 26-dimensional feature space using the SNAD light-curve package, and applies an Isolation Forest (IF) algorithm for anomaly ranking. Notably, the pipeline operates independently on each photometric passband, later combining individual-band anomaly scores. The model undergoes frequent retraining to adapt to evolving survey characteristics and is updated using active learning driven by expert feedback.
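The scoring step described above can be sketched in plain Python. This is only an illustration, not the Fink implementation: the features are synthetic Gaussian numbers rather than SNAD light-curve features, the names `fit_forest`, `anomaly_score`, and `fake_features` are hypothetical, and taking the maximum of the per-band scores is an assumption, since the text does not say how band scores are combined.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < cut]
    right = [r for r in rows if r[dim] >= cut]
    return ("node", dim, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaf contents."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, cut, left, right = tree
    return path_length(left if x[dim] < cut else right, x, depth + 1)

def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of rows."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size

def anomaly_score(forest, x):
    """Score in (0, 1); in this sketch higher means easier to isolate."""
    trees, size = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(size))

# Synthetic stand-in for the 26-dimensional feature space (not real data).
N_FEATURES = 26
data_rng = random.Random(42)

def fake_features(n):
    return [[data_rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(n)]

# One feature table per passband; object index 200 is a planted outlier.
bands = {}
for band in ("g", "r"):
    rows = fake_features(200)
    rows.append([8.0] * N_FEATURES)
    bands[band] = rows

# Score each band independently, then combine (max is an assumption).
per_band = {}
for band, rows in bands.items():
    forest = fit_forest(rows, seed=1)
    per_band[band] = [anomaly_score(forest, x) for x in rows]

combined = [max(per_band["g"][i], per_band["r"][i]) for i in range(201)]
ranking = sorted(range(201), key=lambda i: combined[i], reverse=True)
```

In practice a library implementation such as scikit-learn's `IsolationForest` would replace the hand-written trees; the from-scratch version is used here only to keep the sketch dependency-free.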
Critically, the operational core includes an anomaly notification module (Fig. 1), which delivers the top-10 anomaly candidates nightly to a distributed set of experts via real-time channels such as Slack, Telegram, and a REST API, and presents cutouts, photometric summary, and context. This approach expedites the human-in-the-loop assessment cycle and fast-tracks scientifically promising objects for follow-up.
Figure 1: Schematic illustration of the anomaly notification process. Each night, the pipeline orders all alerts by their anomaly score. The Anomaly notification module selects the top-10 most anomalous objects and delivers them to the expert through Slack and Telegram messengers, as well as via the Fink API.
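The nightly selection step in Fig. 1 amounts to a top-k query over scored alerts. A minimal sketch follows, assuming that a higher score means more anomalous and using hypothetical field names (`object_id`, `anomaly_score`) and made-up identifiers; the actual Fink message format and API schema are not given in the text.

```python
import heapq
import json

def top_anomalies(alerts, k=10):
    """Return the k alerts with the highest anomaly score, most anomalous first."""
    return heapq.nlargest(k, alerts, key=lambda a: a["anomaly_score"])

def format_message(rank, alert):
    """Render one candidate as a short text line for a chat notification."""
    return f"#{rank} {alert['object_id']} score={alert['anomaly_score']:.3f}"

# Made-up alerts with distinct scores, standing in for one night of data.
alerts = [
    {"object_id": f"demo-{i:03d}", "anomaly_score": (i * 37 % 101) / 101}
    for i in range(50)
]

top10 = top_anomalies(alerts)
messages = [format_message(rank, a) for rank, a in enumerate(top10, start=1)]

# The same selection serialized as JSON, as an API endpoint might return it.
payload = json.dumps({"candidates": top10})
```

Using `heapq.nlargest` avoids sorting the full alert list when only ten candidates are needed.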
The immediate expert-facing interface is illustrated in Fig. 2, which provides integrated image, light curve, and meta-parameter displays in popular messaging platforms.
Figure 2: Example of a Telegram (left) and Slack (right) notification showing one of the top-10 anomaly candidates with science image cutout, light curve in difference magnitudes, and alert parameters.
Discovery Highlights: Astrophysical and Transient Classes
Rare AM CVn System (Fink J062452.88+020818.3)
Early in deployment, the pipeline identified a helium-dominated cataclysmic variable (AM CVn) showing a double superoutburst and superhumps, indicative of a WZ Sge-type object. Photometric and spectroscopic campaigns revealed a superhump period of P_sh = 0.032(3) d (Fig. 3) and evolution of neutral helium emission lines (Fig. 4), with additional high-resolution line profile analysis (Fig. 5).
Figure 3: Composite light curve for Fink J062452.88+020818.3. The dashed lines denote moments when spectra were obtained. The inset plot shows the 31 January 2023 light curve segment with prominent superhumps, used to estimate the superhump period P_sh.
Figure 4: Evolution of nightly averaged spectra of Fink J062452.88+020818.3 from the 2.5-m CMO SAI MSU telescope. Most of the marked lines, except explicitly labeled, are neutral helium lines.
Figure 5: Profiles of some characteristic lines in the Fink J062452.88+020818.3 SALT spectrum acquired on 31 January 2023. The radial velocity is shown in the barycentric rest frame. The profiles are shifted vertically by 0.3 for convenience.
This represents the third reliably classified WZ Sge-type AM CVn and demonstrates the ability of an ML-driven pipeline to surface rare classes with characteristic light-curve morphologies for further study.
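The superhump period quoted above uses concise uncertainty notation, where 0.032(3) d means 0.032 ± 0.003 d. The short sketch below unpacks that notation and converts the value to minutes, which gives roughly 46 ± 4 min; the helper name `parse_concise` is hypothetical and handles only the simple decimal form used here.

```python
import re

MIN_PER_DAY = 24 * 60

def parse_concise(text):
    """Parse 'a.bcd(e)' into (value, uncertainty); the digits in parentheses
    apply to the last decimal places of the value."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)\)", text)
    if match is None:
        raise ValueError(f"unsupported format: {text!r}")
    whole, frac, err = match.groups()
    value = float(f"{whole}.{frac}")
    sigma = int(err) * 10 ** (-len(frac))
    return value, sigma

period_d, sigma_d = parse_concise("0.032(3)")
period_min = period_d * MIN_PER_DAY   # period in minutes
sigma_min = sigma_d * MIN_PER_DAY     # uncertainty in minutes
```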
Supernova with Precursor (SN 2023mtp)
The detection of SN 2023mtp exemplifies the pipeline's ability to flag events with anomalous photometric structure, in this case a significant precursor emission approximately 2.5 months prior to the main SN outburst. The pipeline provided rapid notification, enabling a multi-instrument spectroscopic sequence and light-curve modeling with STELLA (Fig. 6) as well as spectral diagnostics (Fig. 7).
Figure 6: ZTF light curves of SN 2023mtp (circles denote good-quality measurements, triangles correspond to bad-quality measurements) together with the best-fit STELLA models from the grid of Moriya et al. (2023).
Figure 7: Spectrum of SN 2023mtp obtained with the 2.5-m CMO telescope on 1 September 2023.
The event exhibited conflicting spectroscopic signatures (Type IIb/IIP/IIn) with no unique template fit, and its precursor was both more luminous and temporally structured than observed in canonical SN IIn precursors. Attempts to fit the complex photometric and spectroscopic evolution with state-of-the-art explosion models did not yield satisfactory results, emphasizing the value of anomaly-driven identification for physically distinct transients.
UX Ori-Type Star and Disk Variability
The system Fink J222324.32+744222.0 was surfaced as an anomaly on the basis of its abrupt photometric increase and prolonged plateau (Fig. 8), later identified as a G2IV-type object with pronounced spectral evolution. SED analysis revealed an anticorrelation between optical and IR bands (Fig. 9), characteristic of variable circumstellar extinction and shadowing events in UX Ori variables.
Figure 8: Multicolor light curve of Fink J222324.32+744222.0 based on ATLAS, WISE, and ZTF data. Black vertical dashed lines mark the epochs used for SED construction. The grey shaded band indicates the time interval during which the spectra were obtained with the 2.5-m CMO telescope.
Figure 9: Top: Spectral energy distributions for Fink J222324.32+744222.0 at two epochs showing redistribution of energy between optical and infrared.
Time-resolved spectroscopy (Fig. 10) confirmed variable Hα and [S II] emission, consistent with disk wind or flare activity in a dust-enshrouded, pre-main-sequence star.
Figure 10: Spectra of Fink J222324.32+744222.0 obtained on different epochs with the 2.5-m CMO telescope. The black line shows the spectrum of the comparison star HD 120787.
Dwarf Novae and Flaring M Dwarfs
Nine new dwarf novae were flagged, along with the flaring M dwarf Fink J042203.10+362318.7 (Fig. 11), which exhibits canonical short-duration flares. Color–color diagrams (Fig. 12) confirm locus and extinction behavior consistent with M-type and WZ Sge-like variables in Pan-STARRS2 color space.
Figure 11: ZTF DR23 light curves of flaring M-dwarf Fink J042203.10+362318.7.
Figure 12: Color–color diagram showing dwarf novae and a flaring M-dwarf, demonstrating their distinct location and extinction-corrected positions.
Supernovae Population and Hostless Events
The module enabled the identification of 33 supernova candidates, 30 of them previously unreported. Seven candidates have absolute magnitudes brighter than −21 mag (the SLSN threshold), and multiple events are hostless or detected in low-surface-brightness galaxies, supporting the utility of anomaly-based discovery for rare and extreme transients.
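The luminosity cut mentioned above can be illustrated with the standard distance modulus, M = m − 5 log10(d / 10 pc). The sketch below is a simplification: it ignores extinction and K-corrections, the function names are hypothetical, and the input values are illustrative rather than taken from the paper.

```python
import math

SLSN_THRESHOLD = -21.0  # absolute magnitude; brighter means numerically smaller

def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus only: no extinction or K-correction applied."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)

def is_slsn_candidate(apparent_mag, distance_pc):
    """True if the source is brighter than the superluminous threshold."""
    return absolute_magnitude(apparent_mag, distance_pc) < SLSN_THRESHOLD

# Illustrative numbers: a source at 1 Gpc observed at m = 18.5 and m = 19.5.
bright = is_slsn_candidate(18.5, 1.0e9)
faint = is_slsn_candidate(19.5, 1.0e9)
```

Because magnitudes run backwards, "exceeding" the threshold means a value below −21, which is why the comparison uses `<`.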
Limitations, Contaminants, and Human-in-the-Loop Learning
A detailed analysis of contaminants shows that many of the pipeline's anomaly candidates are artefacts, moving objects, or "flat" supernovae for which the survey template already includes the transient. Adaptive selection cuts and human review mitigate these, but expert input remains essential.
To this end, the Fink citizen science model incorporates expert reactions via Telegram bots and employs the Active Anomaly Discovery (AAD) algorithm [Das et al. 2017] for retraining (Fig. 13). The shift of eclipsing binaries to lower anomaly rank under AAD (Fig. 14) shows how community input can steer the anomaly ranking.

Figure 13: Distribution of alerts with expert feedback in the citizen science model by astronomical object type.
Figure 14: Anomaly rank for all alerts (eclipsing binaries) marked by experts as nominal, demonstrating improved ranking after AAD-based retraining.
Ambiguity in what constitutes an anomaly is shown to be sample- and expert-dependent, suggesting a transition to personalized AD models as the logical next step.
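The feedback loop described in this section can be illustrated with a toy reweighting scheme. This is explicitly not the AAD algorithm of Das et al.; it is a much simpler linear stand-in in which each object has a vector of component scores, the final score is a weighted sum, and an expert label nudges the weights. All object names, numbers, and the learning rate are made up for illustration.

```python
def score(weights, components):
    """Weighted sum of per-component anomaly scores."""
    return sum(w * z for w, z in zip(weights, components))

def apply_feedback(weights, components, label, lr=0.5):
    """Nudge weights toward (label=+1, anomaly) or away from (label=-1, nominal)
    the labelled object's components, then clip at zero and renormalize."""
    updated = [max(0.0, w + lr * label * z) for w, z in zip(weights, components)]
    total = sum(updated)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(updated)] * len(updated)
    return [w / total for w in updated]

def rank(weights, objects):
    """Object names ordered from most to least anomalous."""
    return sorted(objects, key=lambda name: score(weights, objects[name]), reverse=True)

# Toy component scores (hypothetical values, three components per object).
objects = {
    "eclipsing_binary": [0.95, 0.6, 0.5],
    "candidate_a": [0.3, 0.8, 0.7],
    "candidate_b": [0.5, 0.5, 0.5],
    "quiet_star": [0.1, 0.1, 0.1],
}

weights = [1.0 / 3.0] * 3
ranking_before = rank(weights, objects)

# Expert reactions: the binary is nominal, candidate_a is a real anomaly.
feedback = [("eclipsing_binary", -1), ("candidate_a", +1)]
for name, label in feedback:
    weights = apply_feedback(weights, objects[name], label)

ranking_after = rank(weights, objects)
```

The point of the sketch is the qualitative behavior: an object labelled nominal drops in rank after retraining, which mirrors the shift of eclipsing binaries reported for the AAD-based model.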
Implications and Future Directions
This work demonstrates that ML-based anomaly detection, when tightly integrated with real-time, expert-centered workflows, yields a discovery-centric pipeline that translates statistical outliers into verified astrophysical results. The pipeline's ability to recover rare evolutionary types and extreme transients, directly trigger follow-up, and correct public database mislabelings is particularly valuable as survey scale and heterogeneity increase.
The current architecture is compatible with the Real-Time Broker infrastructure for LSST; its modular, updateable nature aligns with the need for continual retraining with new data and evolving definitions of "anomaly." The work further highlights the role of active learning and citizen science in making large-scale anomaly detection more robust, and motivates the development of personalized anomaly surfaces.
Conclusion
The Fink AD pipeline, driven by feature-based IF ranking and real-time domain expert engagement, has demonstrated the conversion of raw machine-learned anomaly scores into novel scientific discovery. Results confirm both performance and robustness across diverse variable/transient populations, complex photometric morphologies, and extreme events. The system constitutes an effective bridge between high-volume alert streams and actionable astrophysical insight, and sets a precedent for next-generation time-domain survey infrastructures (2603.29511).