Novelty Detection on Radio Astronomy Data using Signatures (2402.14892v2)
Abstract: We introduce SigNova, a new semi-supervised framework for detecting anomalies in streamed data. Our initial examples focus on detecting radio-frequency interference (RFI) in digitized signals in radio astronomy, but SigNova applies to any type of streamed data. The framework comprises three primary components. First, we use the signature transform to extract a canonical collection of summary statistics from observational sequences. This allows us to represent variable-length visibility samples as finite-dimensional feature vectors. Second, each feature vector is assigned a novelty score, calculated as the Mahalanobis distance to its nearest neighbor in an RFI-free training set. By thresholding these scores, we identify observation ranges that deviate from the expected behavior of RFI-free visibility samples without relying on stringent distributional assumptions. Third, we integrate this anomaly detector with Pysegments, a segmentation algorithm, to localize consecutive observations contaminated with RFI, if any. This approach provides an alternative to the classical windowing techniques commonly used for RFI detection. Importantly, the complexity of our algorithm depends on the RFI pattern rather than on the size of the observation window. We demonstrate how SigNova improves the detection of various types of RFI (e.g., broadband and narrowband) in time-frequency visibility data. We validate our framework on data from the Murchison Widefield Array (MWA) telescope, on simulated data, and on the Hydrogen Epoch of Reionization Array (HERA).
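The first two components described in the abstract can be illustrated with a minimal NumPy sketch. It is not the SigNova implementation. It assumes a signature truncated at depth 2, computed by hand for piecewise-linear paths, a Mahalanobis metric built from the pseudo-inverse of the training-feature covariance, and a threshold set to the largest score on a held-out RFI-free calibration set. The synthetic Gaussian "visibilities", the injected tone, and all function names are hypothetical choices made for illustration. The Pysegments localization step is omitted.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path of shape (length, channels)."""
    increments = np.diff(path, axis=0)
    level1 = increments.sum(axis=0)  # total displacement
    # Displacement from the start point at the beginning of each linear piece.
    before = np.cumsum(increments, axis=0) - increments
    # Iterated integrals: cross-piece terms plus the within-piece half term.
    level2 = before.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])

def fit_inverse_covariance(train_features):
    """Pseudo-inverse of the training covariance (assumed metric for this sketch)."""
    return np.linalg.pinv(np.cov(train_features, rowvar=False))

def novelty_score(feature, train_features, inv_cov):
    """Mahalanobis distance from `feature` to its nearest training neighbor."""
    diff = train_features - feature
    squared = np.einsum("ni,ij,nj->n", diff, inv_cov, diff)
    return float(np.sqrt(max(squared.min(), 0.0)))

rng = np.random.default_rng(0)

def clean_stream(length):
    """Hypothetical RFI-free stream: real and imaginary parts as Gaussian noise."""
    return rng.normal(size=(length, 2))

# Variable-length streams all map to feature vectors of the same dimension.
train = np.array([signature_depth2(clean_stream(rng.integers(20, 41)))
                  for _ in range(300)])
inv_cov = fit_inverse_covariance(train)

# Assumed thresholding rule: the largest score seen on held-out clean streams.
calibration = [signature_depth2(clean_stream(rng.integers(20, 41)))
               for _ in range(100)]
threshold = max(novelty_score(f, train, inv_cov) for f in calibration)

# Contaminated stream: a strong rotating tone injected into samples 10..29.
rfi_stream = clean_stream(40)
phase = 0.5 * np.arange(20)
rfi_stream[10:30] += 30.0 * np.column_stack([np.cos(phase), np.sin(phase)])
rfi_score = novelty_score(signature_depth2(rfi_stream), train, inv_cov)

print(f"threshold={threshold:.2f}  rfi_score={rfi_score:.2f}  "
      f"flagged={rfi_score > threshold}")
```

The rotating tone sweeps out a large signed area between the two channels. That area appears in the antisymmetric part of the depth-2 terms, so the contaminated stream lands far from every RFI-free training feature even though its start and end points look unremarkable.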