- The paper introduces DistClassiPy, a novel distance-based classifier that achieves up to 92% F1-score in classifying variable star light curves.
- It employs rigorous dimensionality reduction by selecting 31 key features from an initial 112, enhancing both performance and model interpretability.
- The classifier demonstrates robust performance across binary, multi-class, and one-vs-rest scenarios, offering improved computational efficiency over traditional models.
Light Curve Classification through DistClassiPy: A Newly Introduced Distance-Based Classifier
Introduction to Distance-Based Classification in Astronomy
With the advent of large-scale synoptic surveys, astronomy has been inundated with data, and traditional manual classification is no longer viable. This necessitates machine learning (ML) methods, particularly for classifying and identifying celestial objects from their light curves, which are graphical representations of stellar brightness over time. While tree-based models such as Random Forests and deep learning models are currently prevalent, this paper introduces DistClassiPy, a distance metric classifier aimed at light curve classification. The approach not only matches state-of-the-art performance but also offers advantages in computational efficiency and interpretability.
Constructing DistClassiPy
DistClassiPy leverages the concept of distance metrics, a fundamental notion within ML, for classifying variable stars. A total of 18 distinct distance metrics are employed, allowing for the comparison and classification of objects by evaluating the "distance" between feature vectors in multidimensional space. This distance-based methodology offers an intuitive framework for classifying objects, potentially increasing the interpretability of the results and lowering the computational demands.
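To make the notion of "distance" between feature vectors concrete, the following is a minimal sketch in plain Python of four widely used metrics. They are chosen for illustration only: this summary does not list which 18 metrics the paper uses, and the function names and the `METRICS` registry are hypothetical, not DistClassiPy's API.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cityblock(u, v):
    """Sum of absolute per-feature differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute difference along any single feature."""
    return max(abs(a - b) for a, b in zip(u, v))


def canberra(u, v):
    """Per-feature differences weighted by the magnitudes of the values."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:  # a 0/0 term contributes nothing
            total += abs(a - b) / denom
    return total


# Hypothetical registry so a metric can be picked by name.
METRICS = {
    "euclidean": euclidean,
    "cityblock": cityblock,
    "chebyshev": chebyshev,
    "canberra": canberra,
}


def all_distances(u, v):
    """Evaluate every registered metric on the same pair of vectors."""
    return {name: fn(u, v) for name, fn in METRICS.items()}
```

For `u = [0.0, 3.0]` and `v = [4.0, 0.0]`, the four metrics give 5, 7, 4, and 2 respectively, which shows how the same pair of objects can look closer or farther apart depending on the metric chosen.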
Dataset and Feature Extraction
The paper utilizes light curves from the Zwicky Transient Facility (ZTF), specifically focusing on a catalog of 6,000 variable stars across 10 classes. The raw light curves are processed to extract 112 features per light curve, which are then subjected to rigorous dimensionality reduction techniques, ultimately retaining 31 features to ensure model efficiency and performance.
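This summary does not say which dimensionality reduction techniques were applied. As one possible illustration, and purely as an assumption, the sketch below drops any feature that is strongly correlated with a feature already kept. The threshold of 0.9 and the feature names in the usage note are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter over a dict mapping feature name -> list of values.

    A feature is kept only if its absolute correlation with every
    previously kept feature is below the threshold (an assumed value).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

With three hypothetical columns, `period`, `period_x2` (an exact multiple of `period`), and `amplitude`, the filter would keep `period` and `amplitude` and discard the redundant `period_x2`.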
Classification and Dimensionality Reduction
The core of DistClassiPy's novelty lies in its method of classifying light curves through the application of different distance metrics. This paper explores three main classification scenarios: binary, multi-class, and one-vs-rest classifications, with particular emphasis on a multi-class classification involving four types of variable stars. It is found that reducing dimensionality by selecting the most relevant features for specific distance metrics further enhances classification performance, suggesting the importance of tailored feature selection in achieving optimal results.
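The general idea of classifying by distance can be sketched as a nearest-centroid rule: summarize each class by a representative feature vector, then assign a new object to the class whose representative is closest under the chosen metric. The code below is a simplified stand-in under that assumption, not the DistClassiPy implementation; the class name `NearestCentroidSketch`, the use of per-feature medians, and the class labels in the usage note are all hypothetical.

```python
from statistics import median


def cityblock(u, v):
    """Sum of absolute per-feature differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


class NearestCentroidSketch:
    """Assigns each object to the class with the nearest centroid."""

    def __init__(self, metric=cityblock):
        self.metric = metric  # any callable taking two feature vectors
        self.centroids_ = {}

    def fit(self, X, y):
        # Group the training rows by class label.
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row)
        # Represent each class by the per-feature median of its rows.
        self.centroids_ = {
            label: [median(col) for col in zip(*rows)]
            for label, rows in by_class.items()
        }
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Pick the label whose centroid is closest under the metric.
            best = min(
                self.centroids_,
                key=lambda label: self.metric(row, self.centroids_[label]),
            )
            predictions.append(best)
        return predictions
```

Because the metric is an ordinary argument, swapping it changes the decision rule without touching the rest of the classifier. Trained on three rows near the origin labeled `class_a` and three rows near (10, 10) labeled `class_b`, the sketch assigns a new point at (0.5, 0.5) to `class_a`.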
Results and Implications
DistClassiPy demonstrates competitive performance, with an F1 score of up to 92% in multi-class classification tasks, comparable to that achieved by Random Forest classifiers. It also outperforms traditional methods in computational efficiency and offers a level of flexibility and interpretability not readily available in other models. This is exemplified by the model's capacity to adjust the selection of distance metrics and features based on the dataset and computational resources at hand.
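For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it per class and then takes the unweighted (macro) average across classes. Whether the reported figure uses macro or some other averaging is not stated in this summary, so the averaging choice here is an assumption.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)
```

Macro averaging weights every class equally, so a rare class that is classified poorly lowers the score as much as a common one would.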
Future Directions
While DistClassiPy already presents a robust framework for light curve classification, further research could explore its applicability to transient classification and anomaly detection. Additionally, incorporating more distance metrics, including those used for comparing statistical distributions, could further expand its utility.
Conclusion
DistClassiPy introduces a promising new avenue for the classification of astronomical objects through its use of distance metrics. By combining performance comparable to Random Forest classifiers with enhanced interpretability and computational efficiency, it offers astronomers a valuable tool for light curve classification in the age of big data.