Anomaly detection in Fink. I. Discovery, follow-up, and classification of unusual sources

Published 31 Mar 2026 in astro-ph.HE and astro-ph.IM | (2603.29511v1)

Abstract: Modern wide-field time-domain surveys produce alert streams whose scientific potential is often concentrated in rare and unusual events. Efficient discovery therefore requires automated pipelines to be combined with rapid expert validation and follow-up. We present the first-year performance of the anomaly-detection (AD) pipeline operating within the Fink broker on the Zwicky Transient Facility alert stream, and assess its ability to identify scientifically valid outliers and enable discovery of rare phenomena. The pipeline transforms ZTF light curves into a compact set of features and ranks alerts using an Isolation Forest model trained on archival ZTF data. Each night, the 10 most anomalous candidates are distributed to experts via Slack/Telegram and exposed through an API. We also implement an expert-feedback loop using a public Telegram bot and retrain the model using the Active Anomaly Discovery algorithm. During the first year of operations (starting from 25 January 2023), the AD pipeline identified multiple high-interest sources and triggered dedicated photometric and spectroscopic follow-up. We report the discovery and multi-instrument (11-m SALT telescope, 2.5-m CMO telescope, 0.6-m ASA RC600, 0.25-m FRAM-ORM) follow-up of the rare AM CVn system Fink J062452.88+020818.3 of the WZ Sge type, UX Ori-type star Fink J222324.32+744222.0 and the unusual transient with precursor SN 2023mtp. In addition, the module triggered 33 supernovae, including 30 previously unreported ones, with candidates for superluminous and hostless events. Furthermore, nine new dwarf novae were discovered. These results show that broker-level anomaly detection, coupled with rapid dissemination, expert assessment, and follow-up observations, provides an effective bridge between large-scale survey streams and domain expertise, turning anomaly scores into astrophysical insights and concrete discoveries.

Summary

  • The paper introduces a machine learning-based anomaly detection pipeline that efficiently ranks transient alerts in the Fink broker.
  • It utilizes a 26-dimensional feature space and an Isolation Forest algorithm to identify rare events such as AM CVn systems and supernova precursors.
  • Real-time expert feedback and active learning enable continuous retraining, enhancing detection accuracy in high-volume astronomical surveys.

Automated Anomaly Detection in the Fink Broker: Architecture, Results, and Domain Integration

Introduction and Context

Modern time-domain surveys, notably the Zwicky Transient Facility (ZTF), generate nightly alert streams numbering in the hundreds of thousands of candidates. The scientific yield of such surveys is dominated by rare or highly anomalous events, often buried in an immense background of common variable stars, supernovae, and instrumental artefacts. This study presents and analyzes the architecture, operation, and first-year results of the Fink broker's machine learning-based anomaly detection (AD) pipeline, as described in "Anomaly detection in Fink. I. Discovery, follow-up, and classification of unusual sources" (2603.29511). The Fink AD system is designed not only for high-throughput anomaly scoring but also for connecting those scores to human expertise through real-time notification, expert feedback, and targeted follow-up, thereby bridging the gap between raw alert streams and astrophysical discovery.

Pipeline Architecture and Notification Strategy

The Fink pipeline ingests ZTF alerts, transforms light curves into a 26-dimensional feature space using the SNAD light-curve package, and applies an Isolation Forest (IF) algorithm for anomaly ranking. Notably, the pipeline operates independently on each photometric passband, later combining individual-band anomaly scores. The model undergoes frequent retraining to adapt to evolving survey characteristics and is updated using active learning driven by expert feedback.
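
The ranking step can be illustrated with a minimal Isolation Forest written from scratch. This is only a sketch under stated assumptions: the three Gaussian features stand in for the 26 SNAD light-curve features, the function names (`fit_forest`, `anomaly_score`) and hyperparameters are hypothetical, and the convention that a higher score means a more anomalous point is chosen here for readability, not taken from the Fink code. A production pipeline would normally rely on a library implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Depth at which `row` lands, plus the expected extra depth for an unsplit leaf."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["dim"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)

def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(row, trees, sample_size):
    """Score in (0, 1]; values near 1 mean the point is isolated after few splits."""
    mean_path = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))

# Synthetic stand-in for light-curve features: 200 ordinary points plus one outlier.
rng = random.Random(42)
rows = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]
rows.append((8.0, 8.0, 8.0))
trees, sample_size = fit_forest(rows)
scores = [anomaly_score(r, trees, sample_size) for r in rows]
ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
print("most anomalous row indices:", ranked[:3])
```

The outlier appended last is expected to appear near the top of `ranked`, because random splits separate it from the bulk after only a few cuts. How the per-passband scores are combined is not specified in this summary, so the sketch scores a single feature vector per object.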

Critically, the operational core includes an anomaly notification module (Fig. 1), which delivers the top-10 anomaly candidates each night to a distributed set of experts via Slack, Telegram, and a REST API. Each notification presents image cutouts, a photometric summary, and context. This approach shortens the human-in-the-loop assessment cycle and fast-tracks scientifically promising objects for follow-up.

Figure 1: Schematic illustration of the anomaly notification process. Each night, the pipeline orders all alerts by their anomaly score. The Anomaly notification module selects the top-10 most anomalous objects and delivers them to the expert through Slack and Telegram messengers, as well as via the Fink API.
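
The nightly selection described in Fig. 1 amounts to a top-k query over scored alerts. The sketch below assumes each alert is a dictionary with hypothetical `object_id` and `score` fields and that a higher score means a more anomalous alert; neither the field names nor the message format come from the Fink API.

```python
import heapq
import random

def top_anomalies(alerts, n=10):
    """Return the n most anomalous alerts, most anomalous first."""
    return heapq.nlargest(n, alerts, key=lambda a: a["score"])

def format_notification(alert):
    """Render a one-line text summary for a messenger channel."""
    return f"{alert['object_id']}: anomaly score {alert['score']:.3f}"

# Hypothetical night of 1000 scored alerts with random scores.
rng = random.Random(0)
alerts = [{"object_id": f"obj{i:04d}", "score": rng.random()} for i in range(1000)]
nightly_top = top_anomalies(alerts)
for alert in nightly_top:
    print(format_notification(alert))
```

Using `heapq.nlargest` avoids sorting the full night of alerts when only ten are needed.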

The immediate expert-facing interface is illustrated in Fig. 2, which provides integrated image, light curve, and meta-parameter displays in popular messaging platforms.

Figure 2: Example of a Telegram (left) and Slack (right) notification showing one of the top-10 anomaly candidates with science image cutout, light curve in difference magnitudes, and alert parameters.

Discovery Highlights: Astrophysical and Transient Classes

Rare AM CVn System (Fink J062452.88+020818.3)

Early in deployment, the pipeline identified a helium-dominated cataclysmic variable (AM CVn) showing a double superoutburst and superhumps, indicative of a WZ Sge-type object. Photometric and spectroscopic campaigns revealed a superhump period of $P_{\mathrm{sh}} = 0.032(3)$ d (Fig. 3) and evolution of neutral helium emission lines (Fig. 4), with additional high-resolution line profile analysis (Fig. 5).

Figure 3: Composite light curve for Fink J062452.88+020818.3. The dashed lines denote moments when spectra were obtained. The inset plot shows the 31 January 2023 light curve segment with prominent superhumps, used to estimate the superhump period $P_{\mathrm{sh}}$.
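
A period of this kind can be estimated from a single-night segment with a periodogram. The following is a minimal sketch using a classical periodogram on a uniform grid of trial periods; it is not necessarily the method used in the paper, and the light curve is synthetic (a 0.032 d sinusoid with an assumed 0.1 mag amplitude, 0.01 mag noise, and a baseline magnitude of 17.0).

```python
import math
import random

def periodogram_power(times, mags, period):
    """Classical periodogram power of mean-subtracted magnitudes at one trial period."""
    mean = sum(mags) / len(mags)
    omega = 2.0 * math.pi / period
    c = sum((m - mean) * math.cos(omega * t) for t, m in zip(times, mags))
    s = sum((m - mean) * math.sin(omega * t) for t, m in zip(times, mags))
    return (c * c + s * s) / len(mags)

def best_period(times, mags, p_min, p_max, n_trials):
    """Scan a uniform grid of trial periods and return the one with the highest power."""
    step = (p_max - p_min) / (n_trials - 1)
    trials = [p_min + i * step for i in range(n_trials)]
    return max(trials, key=lambda p: periodogram_power(times, mags, p))

# Synthetic segment: 0.2 d of unevenly sampled data with a 0.032 d modulation.
rng = random.Random(1)
true_period = 0.032
times = sorted(rng.uniform(0.0, 0.2) for _ in range(150))
mags = [17.0 + 0.1 * math.sin(2.0 * math.pi * t / true_period) + rng.gauss(0.0, 0.01)
        for t in times]
estimate = best_period(times, mags, 0.02, 0.05, 301)
print(f"recovered period: {estimate:.4f} d ({estimate * 24 * 60:.1f} min)")
```

In the notation $0.032(3)$ d, the digit in parentheses is the uncertainty on the last quoted digit, so the value reads as $0.032 \pm 0.003$ d, or roughly 46 minutes.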

Figure 4: Evolution of nightly averaged spectra of Fink J062452.88+020818.3 from the 2.5-m CMO SAI MSU telescope. Most of the marked lines, except explicitly labeled, are neutral helium lines.

Figure 5: Profiles of some characteristic lines in the Fink J062452.88+020818.3 SALT spectrum acquired on 31 January 2023. The radial velocity is shown in the barycentric rest frame. The profiles are shifted vertically by 0.3 for convenience.

This represents the third reliably classified WZ Sge-type AM CVn and demonstrates the ability of an ML-driven pipeline to surface rare classes with characteristic light-curve morphologies for further study.

Supernova with Precursor (SN 2023mtp)

The detection of SN 2023mtp exemplifies the pipeline's ability to flag events with anomalous photometric structure, in this case a significant precursor emission approximately 2.5 months prior to the main SN outburst. The pipeline provided rapid notification, enabling a multi-instrument spectroscopic sequence and light-curve modeling with STELLA (Fig. 6) as well as spectral diagnostics (Fig. 7).

Figure 6: ZTF light curves of SN 2023mtp (circles denote good-quality measurements, triangles correspond to bad-quality measurements) together with the best-fit STELLA models from the grid of Moriya et al. (2023).

Figure 7: Spectrum of SN 2023mtp obtained with the 2.5-m CMO telescope on 1 September 2023.

The event exhibited conflicting spectroscopic signatures (Type IIb/IIP/IIn) with no unique template fit, and its precursor was both more luminous and temporally structured than observed in canonical SN IIn precursors. Attempts to fit the complex photometric and spectroscopic evolution with state-of-the-art explosion models did not yield satisfactory results, emphasizing the value of anomaly-driven identification for physically distinct transients.

UX Ori-Type Star and Disk Variability

The system Fink J222324.32+744222.0 was surfaced as an anomaly on the basis of its abrupt photometric increase and prolonged plateau (Fig. 8), later identified as a G2IV-type object with pronounced spectral evolution. SED analysis revealed an anticorrelation between optical and IR bands (Fig. 9), characteristic of variable circumstellar extinction and shadowing events in UX Ori variables.

Figure 8: Multicolor light curve of Fink J222324.32+744222.0 based on ATLAS, WISE, and ZTF data. Black vertical dashed lines mark the epochs used for SED construction. The grey shaded band indicates the time interval during which the spectra were obtained with the 2.5-m CMO telescope.

Figure 9: Top: Spectral energy distributions for Fink J222324.32+744222.0 at two epochs showing redistribution of energy between optical and infrared.

Time-resolved spectroscopy (Fig. 10) confirmed variable H$\alpha$ and [S II], consistent with disk wind or flare activity in a dust-enshrouded, pre-main-sequence star.

Figure 10: Spectra of Fink J222324.32+744222.0 obtained on different epochs with the 2.5-m CMO telescope. The black line shows the spectrum of the comparison star HD 120787.

Dwarf Novae and Flaring M Dwarfs

Nine new dwarf novae were flagged, along with the flaring M dwarf Fink J042203.10+362318.7 (Fig. 11), which exhibits canonical short-duration flares. Color–color diagrams (Fig. 12) confirm locus and extinction behavior consistent with M-type and WZ Sge-like variables in Pan-STARRS2 color space.

Figure 11: ZTF DR23 light curves of flaring M-dwarf Fink J042203.10+362318.7.

Figure 12: Color–color diagram showing dwarf novae and a flaring M-dwarf, demonstrating their distinct location and extinction-corrected positions.

Supernovae Population and Hostless Events

The module enabled the identification of 33 supernova candidates, including 30 previously unreported. Seven candidates have absolute magnitudes brighter than $-21^m$ (the SLSN threshold), and multiple events are hostless or detected in low-surface-brightness galaxies, supporting the utility of anomaly-based discovery for rare and extreme transients.
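
The comparison against the $-21^m$ threshold can be sketched with the standard distance modulus, $M = m - 5\log_{10}(d_L / 10\,\mathrm{pc})$. The snippet below is an illustration only: it ignores K-corrections and extinction, the function names are hypothetical, and the example magnitudes and distances are invented rather than taken from the paper.

```python
import math

SLSN_THRESHOLD = -21.0  # absolute magnitude quoted in the text as the SLSN threshold

def absolute_magnitude(apparent_mag, lum_distance_mpc):
    """Distance modulus with the luminosity distance in Mpc (no K-correction, no extinction)."""
    return apparent_mag - 5.0 * math.log10(lum_distance_mpc) - 25.0

def is_slsn_candidate(apparent_mag, lum_distance_mpc):
    """True when the object is brighter (more negative) than the threshold."""
    return absolute_magnitude(apparent_mag, lum_distance_mpc) < SLSN_THRESHOLD

# Invented examples: the same apparent peak magnitude at two distances.
for distance in (100.0, 1000.0):
    m_abs = absolute_magnitude(18.5, distance)
    print(f"d_L = {distance:6.0f} Mpc -> M = {m_abs:.1f}, SLSN candidate: "
          f"{is_slsn_candidate(18.5, distance)}")
```

Because magnitudes decrease with increasing brightness, "exceeding" the threshold means a value more negative than $-21$.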

Limitations, Contaminants, and Human-in-the-Loop Learning

A detailed analysis of contaminants shows that many flagged anomalies are artefacts, moving objects, or supernovae whose light curves appear flat because the survey templates already include the transient. Adaptive selection cuts and human review mitigate these, but expert input remains essential.

To this end, the Fink citizen science model incorporates expert reactions via Telegram bots and employs the Active Anomaly Discovery (AAD) algorithm [Das et al. 2017] for retraining (Fig. 13). The shift of eclipsing binaries to lower anomaly rank in AAD (Fig. 14) exemplifies the efficacy of leveraging community input to guide the AD landscape.

Figure 13: Distribution of alerts with expert feedback in the citizen science model by astronomical object type.

Figure 14: Anomaly rank for all eclipsing-binary alerts marked by experts as nominal, demonstrating improved ranking after AAD-based retraining.
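
The rank shift shown in Fig. 14 can be quantified by comparing the ranks of expert-labelled objects before and after retraining. The sketch below does not implement AAD itself; it only measures the effect of a retraining step, using invented scores, hypothetical object names, and the assumption that a higher score means a more anomalous object.

```python
import statistics

def anomaly_ranks(scores):
    """Map object id to rank (1 = most anomalous), assuming higher score = more anomalous."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {obj: i + 1 for i, obj in enumerate(ordered)}

def median_rank(scores, flagged):
    """Median anomaly rank of the objects that experts marked as nominal."""
    ranks = anomaly_ranks(scores)
    return statistics.median(ranks[obj] for obj in flagged)

# Invented scores: two eclipsing binaries marked nominal, four other objects.
nominal = {"eb1", "eb2"}
before = {"eb1": 0.90, "eb2": 0.80, "a": 0.70, "b": 0.60, "c": 0.50, "d": 0.40}
after = {"eb1": 0.45, "eb2": 0.30, "a": 0.70, "b": 0.60, "c": 0.50, "d": 0.40}

rank_before = median_rank(before, nominal)
rank_after = median_rank(after, nominal)
print(f"median rank of nominal objects: {rank_before} -> {rank_after}")
```

A larger median rank after retraining means the nominal objects have moved away from the top of the nightly list, which is the behavior the expert feedback is meant to produce.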

Ambiguity in what constitutes an anomaly is shown to be sample- and expert-dependent, suggesting a transition to personalized AD models as the logical next step.

Implications and Future Directions

This work demonstrates that ML-based anomaly detection, when tightly integrated with real-time, expert-centered workflows, yields a discovery-centric pipeline that translates statistical outliers into verified astrophysical results. The pipeline's ability to recover rare evolutionary types and extreme transients, directly trigger follow-up, and correct public database mislabelings is particularly valuable as survey scale and heterogeneity increase.

The current architecture is compatible with the Real-Time Broker infrastructure for LSST; its modular, updateable nature aligns with the need for continual retraining with new data and evolving definitions of "anomaly." The work further highlights the role of active learning and citizen science in making large-scale anomaly detection more robust, and motivates the development of personalized AD models.

Conclusion

The Fink AD pipeline, driven by feature-based IF ranking and real-time domain expert engagement, has demonstrated the conversion of raw machine-learned anomaly scores into novel scientific discovery. Results confirm both performance and robustness with diverse variable/transient populations, complex photometric morphologies, and extreme events. The system constitutes an effective bridge between high-volume alert streams and actionable astrophysical insight, and sets a precedent for next-generation time-domain survey infrastructures (2603.29511).
