MKCLASS: Automated Stellar Classification Tool

Updated 10 November 2025

MKCLASS is an automated expert system that implements the Morgan–Keenan stellar spectral classification, emulating human decision protocols with reproducible accuracy.
It uses a two-stage methodology that combines weighted chi-squared template matching with spectral line-index analysis, producing a rough classification followed by a refined one.
The Python-based NutMaat reimplementation enables high-throughput batch processing and integration into survey pipelines while matching human expert performance.

The MKCLASS automated classification tool is a rule-based expert system for stellar spectral classification on the Morgan–Keenan (MK) system. Originally written in C, MKCLASS emulates the logic and decision protocols of a human classifier, offering automated, scalable classification for large astronomical spectral surveys. Its influence extends to contemporary Python implementations, most notably NutMaat, which preserves the inferential logic of MKCLASS while using Python's scientific computing ecosystem for efficient, platform-independent analysis and pipeline integration. MKCLASS and its descendants remain scientifically relevant because modern spectroscopic surveys produce growing data volumes that require robust, reproducible MK classification.

1. Architectural Principles and System Design

MKCLASS is structured around an expert system paradigm, and NutMaat is a Pythonic reimplementation that preserves the decision logic of the original. NutMaat adopts an object-oriented design separated into three core layers: data ingestion, preprocessing, and classification. Data intake accommodates spectra in ASCII or FITS formats, or as arrays within pandas DataFrames. The standard MK libraries, such as libr18 and libnor36, are supported, and user-supplied libraries can be used for flexibility.

Preprocessing comprises three steps: radial-velocity correction, in which cross-correlation with template spectra yields $v_{\rm rad}$ and the spectrum is shifted to the rest frame; continuum normalization via low-order polynomial fits over predefined regions; and optional convolution with a Gaussian line-spread function (LSF) for resolution matching. The modular preprocessing pipeline emphasizes automation and reproducibility for large datasets.
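The first two preprocessing steps can be illustrated with a minimal NumPy sketch. It is not NutMaat's implementation: the function names, the velocity grid search, the continuum regions, and the synthetic two-line spectrum are all assumptions made for illustration, and the optional LSF convolution is omitted.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s


def normalize_continuum(wave, flux, regions, order=2):
    """Divide the flux by a low-order polynomial fitted inside the continuum regions."""
    mask = np.zeros(wave.shape, dtype=bool)
    for lo, hi in regions:
        mask |= (wave >= lo) & (wave <= hi)
    # Polynomial.fit rescales the wavelength domain internally, keeping the fit well conditioned
    poly = np.polynomial.Polynomial.fit(wave[mask], flux[mask], deg=order)
    return flux / poly(wave)


def estimate_rv(wave, flux, tmpl_wave, tmpl_flux, v_grid):
    """Return the trial velocity (km/s) that maximizes the normalized cross-correlation."""
    f = flux - flux.mean()
    best_v, best_cc = v_grid[0], -np.inf
    for v in v_grid:
        # Doppler-shift the template to velocity v and resample it on the observed grid
        shifted = np.interp(wave, tmpl_wave * (1.0 + v / C_KMS), tmpl_flux)
        t = shifted - shifted.mean()
        cc = np.sum(f * t) / np.sqrt(np.sum(f**2) * np.sum(t**2))
        if cc > best_cc:
            best_v, best_cc = v, cc
    return best_v


def toy_lines(w):
    """Hypothetical rest-frame spectrum: two Gaussian absorption lines on a flat continuum."""
    return (1.0
            - 0.5 * np.exp(-0.5 * ((w - 4101.7) / 2.0) ** 2)
            - 0.5 * np.exp(-0.5 * ((w - 4340.5) / 2.0) ** 2))


wave = np.linspace(3900.0, 4500.0, 3001)
template = toy_lines(wave)

# Synthetic observation: redshifted by 50 km/s and multiplied by a sloped continuum
v_true = 50.0
observed = toy_lines(wave / (1.0 + v_true / C_KMS)) * (1.0 + 1e-4 * (wave - 4200.0))

# Assumed line-free continuum regions (in Angstrom)
regions = [(3900.0, 4050.0), (4150.0, 4300.0), (4400.0, 4500.0)]
normalized = normalize_continuum(wave, observed, regions)

v_est = estimate_rv(wave, normalized, wave, template, np.arange(-200.0, 201.0, 10.0))
rest_wave = wave / (1.0 + v_est / C_KMS)  # rest-frame wavelength axis
print(v_est)
```

Normalizing before the velocity search keeps the continuum slope from leaking into the cross-correlation. A production pipeline would typically refine the peak beyond the grid spacing, for example with a parabolic fit around the maximum.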

Classification proceeds in two stages. Rough typing uses either weighted $\chi^2$ template matching or spectral-index-based estimation. Detailed typing is subdivided by spectral class (O, B, A, F–G, K–M), with each branch invoking temperature and luminosity subroutines. Peculiarity detection is integrated inline: anomalies in the measured indices are flagged during the classification process itself.

Dependencies within NutMaat include core Python scientific libraries: pandas (I/O, tabular management), numpy (numerics), scipy (optimization, interpolation), astropy (I/O, units), and optional matplotlib (visualization).

2. Core Algorithms and Decision Logic

Classification within MKCLASS and NutMaat is grounded in three principal methods:

Template Matching: The spectrum $F_{\rm obs}(\lambda)$ of an input star is compared to a library standard $F_{\rm std}(\lambda)$ via a weighted sum of squared residuals:

$\chi^2 = \sum_{i} w_i [F_{\rm obs}(\lambda_i) - F_{\rm std}(\lambda_i)]^2\,,$

where weights $w_i$ down-weight noise- or skyline-contaminated regions.
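As a rough illustration of the formula above, the following sketch scores an observed spectrum against a small library and picks the minimum-$\chi^2$ standard. The function names, the three toy templates (a single Gaussian line of varying depth), and the masked "skyline" window are assumptions for illustration only, not the contents of any MK library.

```python
import numpy as np


def weighted_chi2(f_obs, f_std, weights):
    """Weighted sum of squared residuals between observed and standard fluxes."""
    return float(np.sum(weights * (f_obs - f_std) ** 2))


def best_template(f_obs, library, weights):
    """Return the name of the best-matching standard and the chi^2 of every standard."""
    scores = {name: weighted_chi2(f_obs, f_std, weights)
              for name, f_std in library.items()}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(0)
wave = np.linspace(4000.0, 4200.0, 501)
profile = np.exp(-0.5 * ((wave - 4101.7) / 3.0) ** 2)

# Toy library: the same line with different depths (hypothetical values)
library = {
    "A0 V": 1.0 - 0.6 * profile,
    "F5 V": 1.0 - 0.3 * profile,
    "K0 V": 1.0 - 0.1 * profile,
}

# Observed spectrum: the F5 template plus noise and a contaminated window
f_obs = library["F5 V"] + rng.normal(0.0, 0.01, wave.size)
sky = (wave > 4150.0) & (wave < 4155.0)
f_obs[sky] += 5.0

# Zero weight removes the contaminated pixels from the sum
weights = np.where(sky, 0.0, 1.0)

best, scores = best_template(f_obs, library, weights)
print(best, scores)
```

Here the weights are binary for simplicity; inverse-variance weights would be the natural choice when per-pixel uncertainties are available.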

Spectral Line-Indices: For each key spectral feature, an index is computed using flux integrals across feature and continuum sidebands,

$I_{\rm feat} = \frac{\int_{\lambda_a}^{\lambda_b} [1 - F(\lambda)/F_{\rm cont}]\, d\lambda}{\Delta\lambda_{\rm feat}}\,,$

with $F_{\rm cont}$ derived from linear interpolation of medians in adjacent continuum bands.
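A minimal sketch of this index, assuming hypothetical band limits and a synthetic Gaussian line, could look as follows. The band definitions and the function name are illustrative and are not taken from MKCLASS or NutMaat.

```python
import numpy as np


def line_index(wave, flux, feat, blue, red):
    """Mean fractional line depth over the feature band.

    feat, blue, red are (lo, hi) wavelength limits of the feature band and of
    the two continuum sidebands.
    """
    def band(lo, hi):
        return (wave >= lo) & (wave <= hi)

    # Median flux of each sideband, assigned to the sideband's central wavelength
    x_blue, x_red = 0.5 * (blue[0] + blue[1]), 0.5 * (red[0] + red[1])
    y_blue = np.median(flux[band(*blue)])
    y_red = np.median(flux[band(*red)])

    in_feat = band(*feat)
    w, f = wave[in_feat], flux[in_feat]

    # Pseudo-continuum: straight line through the two sideband medians
    f_cont = y_blue + (y_red - y_blue) * (w - x_blue) / (x_red - x_blue)

    # Trapezoidal integration of the fractional depth, divided by the band width
    depth = 1.0 - f / f_cont
    integral = np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w))
    return integral / (feat[1] - feat[0])


# Synthetic test spectrum: one Gaussian line (depth 0.5, sigma 2 A) on a flat continuum
wave = np.linspace(4075.0, 4130.0, 1101)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4101.7) / 2.0) ** 2)

idx = line_index(wave, flux,
                 feat=(4090.0, 4113.0),
                 blue=(4080.0, 4088.0),
                 red=(4115.0, 4123.0))
print(idx)
```

Because the index is the integrated depth divided by the band width, it is dimensionless and can be read as the mean fractional absorption across the feature band.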

Heuristic Decision Rules: Thresholded line ratios (e.g., $\mathrm{Ca\,II\,K/H}\delta$ for A-stars) are used as criteria for spectral subclass or luminosity class assignment. For ambiguous or conflicting criteria, the inference engine iteratively brackets between standards, closely mimicking the decision protocols of human classifiers.
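One bracketing step of this kind can be sketched as an interpolation between the two standards whose line ratio encloses the observed value. This is an illustration only, not the MKCLASS inference engine: the numeric subclass codes (A0 = 20, F0 = 30) and the ratio values assigned to the standards are hypothetical, and the real system iterates over several criteria.

```python
import numpy as np


def bracket_subclass(ratio_obs, std_codes, std_ratios):
    """Interpolate a numeric subclass between the two bracketing standards.

    std_ratios must increase monotonically along std_codes.
    """
    codes = np.asarray(std_codes, dtype=float)
    ratios = np.asarray(std_ratios, dtype=float)

    # Outside the grid of standards: clamp to the nearest end
    if ratio_obs <= ratios[0]:
        return float(codes[0])
    if ratio_obs >= ratios[-1]:
        return float(codes[-1])

    hi = int(np.searchsorted(ratios, ratio_obs))  # first standard at or above the observed ratio
    lo = hi - 1
    frac = (ratio_obs - ratios[lo]) / (ratios[hi] - ratios[lo])
    return float(codes[lo] + frac * (codes[hi] - codes[lo]))


# Hypothetical Ca II K / H-delta ratios for the standards A0, A3, A5, A7, F0
std_codes = [20, 23, 25, 27, 30]
std_ratios = [0.1, 0.3, 0.5, 0.7, 1.0]

code = bracket_subclass(0.4, std_codes, std_ratios)
print(code)  # falls between the A3 and A5 standards
```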

Inline peculiarity detection evaluates overabundances for elements such as Si, Sr, Cr, and Eu. For each element $X$, the normalized deviation is calculated:

$R_X = \frac{I_X - \mu_{X,\rm std}}{\sigma_{X,\rm std}}\,,$

flagging a star as peculiar if $R_X > +2$, and combining excesses via rule tables (e.g., identifying “SrEu” stars if both Sr and Eu indices are high).
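The thresholding step can be sketched as below. The index values, reference means, and dispersions are invented for illustration, and simple concatenation of the flagged elements stands in for the actual rule tables, whose contents are not given here.

```python
ELEMENTS = ("Si", "Sr", "Cr", "Eu")


def peculiarity_tag(indices, std_mean, std_sigma, threshold=2.0):
    """Return the normalized deviations R_X and a tag built from elements with R_X > threshold."""
    deviations = {}
    flagged = []
    for element in ELEMENTS:
        r = (indices[element] - std_mean[element]) / std_sigma[element]
        deviations[element] = r
        if r > threshold:
            flagged.append(element)
    # Assumption: concatenating flagged elements approximates the rule-table output
    return deviations, "".join(flagged)


# Hypothetical measured indices and reference statistics from standards
indices = {"Si": 0.11, "Sr": 0.30, "Cr": 0.10, "Eu": 0.25}
std_mean = {"Si": 0.10, "Sr": 0.15, "Cr": 0.09, "Eu": 0.10}
std_sigma = {"Si": 0.02, "Sr": 0.05, "Cr": 0.02, "Eu": 0.05}

deviations, tag = peculiarity_tag(indices, std_mean, std_sigma)
print(deviations, tag)
```

Only positive deviations trigger a flag, matching the one-sided criterion $R_X > +2$.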

3. Performance Metrics and Benchmarks

Evaluation of MKCLASS and NutMaat centers on subclass and luminosity-class offsets with respect to published types and human annotations. Metrics include the mean and standard deviation of offsets:

Spectral subclass offset:

$\Delta_{\rm spt,i} = \mathrm{type}_{\rm NutMaat,i} - \mathrm{type}_{\rm ref,i}$

with mean $\overline{\Delta}$ and dispersion $\sigma$ calculated across test samples.

The luminosity-class offset uses the analogous definition on an integer coding of the luminosity classes (I = –2, V = +2).
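These statistics can be computed as in the sketch below. The numeric encoding of spectral types (ten subclasses per letter class) and the linear coding of the intermediate luminosity classes II to IV are assumptions; only the endpoints I = –2 and V = +2 are stated above. The standard error of the mean is included because the quoted ± values in the benchmarks are consistent with $\sigma/\sqrt{N}$, which is an inference rather than a documented definition.

```python
import numpy as np

# Assumed encodings (hypothetical): ten subclasses per letter, linear luminosity coding
CLASS_BASE = {"O": 0, "B": 10, "A": 20, "F": 30, "G": 40, "K": 50, "M": 60}
LUM_CODE = {"I": -2, "II": -1, "III": 0, "IV": 1, "V": 2}


def spt_to_number(spt):
    """Convert a temperature type such as 'F5' or 'A2.5' to a numeric subclass code."""
    return CLASS_BASE[spt[0]] + float(spt[1:])


def offset_stats(pred, ref):
    """Mean offset, sample standard deviation, and standard error of the mean."""
    delta = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    mean = delta.mean()
    sigma = delta.std(ddof=1)
    return mean, sigma, sigma / np.sqrt(delta.size)


# Toy comparison of automated types against reference types
pred = [spt_to_number(s) for s in ["A2", "F5", "G2", "K0"]]
ref = [spt_to_number(s) for s in ["A0", "F5", "G5", "K1"]]
mean_spt, sigma_spt, sem_spt = offset_stats(pred, ref)

# The same function applies to luminosity classes after integer coding
lum_pred = [LUM_CODE[c] for c in ["V", "III", "V", "IV"]]
lum_ref = [LUM_CODE[c] for c in ["V", "III", "IV", "IV"]]
mean_lum, sigma_lum, sem_lum = offset_stats(lum_pred, lum_ref)

print(mean_spt, sigma_spt, sem_spt)
print(mean_lum, sigma_lum, sem_lum)
```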

Classification quality is further indexed via S/N-dependent quality flags: "Excel" (S/N > 100), "Vgood" (50 < S/N ≤ 100), "Good" (20 < S/N ≤ 50), and "Fair" (5 < S/N ≤ 20).
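The S/N binning above maps directly onto a small lookup function. The function name is hypothetical, and returning None for S/N ≤ 5 is an assumption, since no flag is defined for that range.

```python
def quality_flag(snr):
    """Map a signal-to-noise ratio to the quality flag of its S/N bin."""
    if snr > 100:
        return "Excel"
    if snr > 50:
        return "Vgood"
    if snr > 20:
        return "Good"
    if snr > 5:
        return "Fair"
    # Assumption: no flag is defined at or below S/N = 5
    return None


for snr in (150, 100, 35, 20, 3):
    print(snr, quality_flag(snr))
```

The upper edge of each bin is inclusive, so S/N = 100 maps to "Vgood" and S/N = 20 maps to "Fair".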

On the CFLIB (1,043 spectra) and MILES (599 spectra) libraries, NutMaat achieved a spectral-type scatter of $\sigma_{\rm spt} = 2.79$ subclasses ($\overline{\Delta}_{\rm spt} = -0.25 \pm 0.07$), closely matching MKCLASS ($\sigma = 2.73$, $\overline{\Delta} = -0.20 \pm 0.07$). The luminosity-class scatter was $\sigma_{\rm lum} = 0.92$ ($\overline{\Delta}_{\rm lum} = +0.12 \pm 0.02$), identical to MKCLASS. Classification confusion is typically limited to adjacent subtypes.

For chemically peculiar stars, tested on 16 $\alpha^2$ CVn variables from LAMOST DR7, NutMaat correctly flagged Sr, Eu, and Si overabundances in ~14/16 cases, occasionally collapsing multi-element tags for simplicity.

Large-scale application was demonstrated on SDSS-IV MaStar (DR17), processing ~10,000 visits in ~4 hr (16 cores), producing catalogs with MaNGA ID, coordinates, S/N, spectral type, quality, and $\chi^2$.

4. Operational Workflow and Practical Integration

The batch-processing orientation of NutMaat facilitates direct integration into survey pipelines. A typical workflow comprises:

Loading spectra as pandas DataFrames with ‘wave’ and ‘flux’ fields.
Initializing the NutMaat classifier with an MK library.
Executing batch classification with configurable parameters (radial-velocity correction, normalization, number of iterations).
Receiving tabular output including spectral type, luminosity class, classification quality, $\chi^2$, and peculiarity flags.
Storing or merging output with survey metadata for catalog creation.

Example code snippet:

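import pandas as pd
from nutmaat import NutMaat

# Load the spectra; the table is expected to carry 'wave' and 'flux' fields
df_spec = pd.read_csv('my_spectra.csv')

# Initialize the classifier with an MK standard library
nm = NutMaat(library='libnor36')

# Classify the whole batch with RV correction, normalization, and three iterations
results_df = nm.classify_batch(
    df_spec,
    rv_correction=True,
    normalize=True,
    iterations=3
)

# Inspect the first rows and store the results table
print(results_df.head())
results_df.to_csv('classification_results.csv', index=False)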

Typical runtime is 7–9 seconds per spectrum (three iterations) on a mid-range laptop, dominated by Python I/O and NumPy vectorized operations.
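The final workflow step, merging the classification output with survey metadata, needs only pandas. The sketch below uses made-up tables: the column names (`mangaid`, `ra`, `dec`, `snr`, `spt`, `quality`, `chi2`) and all values are hypothetical and do not reflect NutMaat's actual output schema.

```python
import pandas as pd

# Hypothetical classification output, one row per classified spectrum
results_df = pd.DataFrame({
    "mangaid": ["star-001", "star-002"],
    "spt": ["A2 V", "K0 III"],
    "quality": ["Vgood", "Good"],
    "chi2": [0.012, 0.034],
})

# Hypothetical survey metadata, including one object without a classification
survey_meta = pd.DataFrame({
    "mangaid": ["star-001", "star-002", "star-003"],
    "ra": [150.10, 150.25, 150.40],
    "dec": [2.20, 2.31, 2.45],
    "snr": [72.0, 31.0, 4.0],
})

# A left join keeps every survey object; validate guards against duplicated IDs
catalog = survey_meta.merge(results_df, on="mangaid", how="left",
                            validate="one_to_one")

# Objects that were never classified show up with missing types
unclassified = catalog[catalog["spt"].isna()]
print(catalog)
print(unclassified["mangaid"].tolist())
```

Using `how="left"` makes missing classifications visible as empty fields instead of silently dropping those objects from the catalog.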

5. Limitations and Prospective Development

Limitations of the system, many inherited from MKCLASS, center on spectral coverage and non-canonical stellar types. Classification logic is optimized for canonical MK types in the 3800–5600 Å range. O-type, Wolf–Rayet, carbon, and white dwarf stars are handled rudimentarily. The reliance on Python introduces computational overhead compared to C-native implementations.

Planned enhancements to NutMaat include:

Wavelength extension into the red/infrared for broader applicability.
Enhanced O-type/exotic standard libraries.
Cython-based kernels for accelerated fitting and convolution.
Zero-overhead storage of MK standards as DataFrames.
Refined peculiarity logic, potentially through fuzzy logic and multi-element correlation mechanisms.

6. Context, Adoption, and Scientific Significance

The design philosophy of MKCLASS and its modern implementation in NutMaat is to bridge traditional human-expert classification with the scalable, reproducible, data-centric workflows demanded by contemporary surveys. These tools provide classification accuracy on par with human experts and legacy MKCLASS systems across diverse benchmark datasets, including robust performance down to S/N ≈ 5, which makes them ready for use in current and future spectroscopic surveys.

This suggests that continued development along these lines will be critical for automated spectral typing in the context of data-intensive astronomical research, fostering reproducibility, transparency, and integration with evolving computational ecosystems. The modular, OS-independent, and batch-oriented design meets pressing demands for high-throughput, automated analysis, setting a reference for future stellar classification software architectures.
