
IdentYS Tools: ASR and YSO Analysis

Updated 19 November 2025
  • IdentYS is a dual-purpose toolkit comprising a speech processing system for joint diarization, identification, and ASR, and an infrared pipeline for automated young stellar object detection.
  • IdentYS-ASR leverages a modular, microservice-based architecture with neural diarization and embedding extraction to produce speaker-attributed transcripts with high accuracy.
  • IdentYS-YSO utilizes multi-band IR photometry and color–color diagram analysis to robustly classify YSO candidates in extensive star-forming regions.

IdentYS refers to two independent research tools, each developed for a specialized domain: (1) a modular, microservice-based toolkit for joint speaker diarization, speaker identification, and automatic speech recognition (ASR) in spoken language processing; and (2) a Python-based pipeline for the identification of young stellar objects (YSOs) in star-forming regions using multi-band infrared (IR) photometry. Both systems are designed for robust, high-throughput statistical analysis in large data regimes, but their methodologies, architectures, and intended scientific applications differ fundamentally. This article describes both tools, IdentYS-ASR and IdentYS-YSO, as presented in (Morrone et al., 9 Sep 2024) and (Nikoghosyan et al., 16 Nov 2025), respectively.

1. System Overviews and Scientific Context

IdentYS-ASR: Joint Speaker Diarization, Identification, and ASR

IdentYS-ASR is a modular toolkit designed to deliver joint speaker diarization (SD), speaker identification (SI), and automatic speech recognition (ASR) with speaker-attributed transcripts for audio and video resources. The system targets operational robustness in diverse application domains—including media monitoring and institutional speech analytics—through flexible, configuration-driven pipelines. The architecture orchestrates multiple models for SD and SI, supports closed- and open-set identification, and integrates contemporary neural diarization algorithms with configurable ASR engines. Its web-based frontend, FlyScribe, enables streamlined user interaction for configuration selection, data upload, transcript correction, and multi-format export (Morrone et al., 9 Sep 2024).

IdentYS-YSO: Automated Identification of Young Stars via IR Photometry

IdentYS-YSO enables the selection of Class I and II YSOs in deeply embedded or remote star-forming regions by exploiting the IR excess signature produced by warm circumstellar dust. The tool automates the acquisition, homogenization, cross-matching, and photometric filtering of large multi-band datasets from surveys such as UKIDSS, VVV, 2MASS, Spitzer, and WISE, deploying established colour–colour diagram boundaries to separate YSOs from field contaminants. The architecture supports batch processing via a command-line interface, emphasizes catalogue completeness, and outputs stage classifications with photometric and astrometric detail for large stellar populations (Nikoghosyan et al., 16 Nov 2025).

2. Core Algorithms, Models, and Classification Criteria

IdentYS-ASR: Embedding Extraction, Diarization, and Identification

  • Speaker Embeddings: Employs both legacy i-vectors and neural x-vectors or ECAPA-TDNN embeddings via the Wespeaker library. Given a speech window $X$, an embedding $e = f(X) \in \mathbb{R}^D$ is computed, with enrollment vectors for each registered speaker.
  • Hybrid EEND-Vector Clustering: Utilizes a neural End-to-End Neural Diarization (EEND) block to derive per-frame speaker posteriors $P_{t,s}$. Confident segments are embedded and clustered by agglomerative hierarchical clustering (AHC) with PLDA-based affinity scores:

$$\text{Score}(e_i, e_j) = e_i^T \Lambda e_j + e_i^T \Gamma e_i + e_j^T \Gamma e_j$$

Clustering proceeds until a threshold $\tau_{\mathrm{SD}}$ is met, yielding time-labeled speaker clusters.

  • Speaker Identification: Segment embeddings $e$ are compared to enrollment vectors $e_k^{\text{ref}}$ using cosine or PLDA scoring:

$$s_k = \cos(e, e_k^{\text{ref}}) \quad\text{or}\quad s_k = \text{Score}(e, e_k^{\text{ref}})$$

Closed-set mode assigns $\hat{k} = \arg\max_k s_k$; open-set identification uses a threshold $\tau_{\mathrm{SI}}$ (e.g. 0.5 for cosine), with sub-threshold cases labeled as 'Unknown$_i$'.
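The clustering and identification steps described above can be sketched in a few lines. This is a minimal illustration, not the toolkit's implementation: it substitutes cosine similarity for the PLDA score, uses average linkage, and identifies each cluster from its mean embedding. All three choices, along with the function names `cluster`, `centroid`, `identify`, and `label_clusters`, are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(embeddings, tau_sd, score=cosine):
    """Average-linkage AHC: merge the best-scoring pair of clusters
    until no pair reaches the stopping threshold tau_sd."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(score(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best, a, b = max(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < tau_sd:
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def centroid(embeddings, members):
    """Mean embedding of a cluster (an assumed way to represent it)."""
    dim = len(embeddings[0])
    return tuple(sum(embeddings[i][d] for i in members) / len(members) for d in range(dim))


def identify(e, enrolled, tau_si=None, score=cosine):
    """Return the best-matching enrolled speaker, or None in open-set
    mode (tau_si given) when the best score falls below tau_si."""
    scores = {name: score(e, ref) for name, ref in enrolled.items()}
    best = max(scores, key=scores.get)
    if tau_si is not None and scores[best] < tau_si:
        return None
    return best


def label_clusters(centroids, enrolled, tau_si=None):
    """Name each cluster; unmatched clusters become Unknown_1, Unknown_2, ..."""
    labels, unknown = [], 0
    for e in centroids:
        name = identify(e, enrolled, tau_si)
        if name is None:
            unknown += 1
            name = f"Unknown_{unknown}"
        labels.append(name)
    return labels


# Toy 2-D embeddings: two segments per enrolled speaker plus one outlier.
segments = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, -0.2)]
enrolled = {"Alice": (1.0, 0.0), "Bob": (0.0, 1.0)}
clusters = cluster(segments, tau_sd=0.8)
centroids = [centroid(segments, c) for c in clusters]
open_set = label_clusters(centroids, enrolled, tau_si=0.5)
closed_set = label_clusters(centroids, enrolled)
```

In closed-set mode the outlier is forced onto the nearest enrolled speaker, whereas the open-set threshold leaves it as an unknown speaker; this is the practical difference between the two modes.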

IdentYS-YSO: Infrared Color–Color Selection, Contaminant Exclusion

  • Diagnostic Diagrams: Five primary color–color spaces are analyzed:

    1. $(J-H)$ vs. $(H-K)$ (NIR), with excess region: $(J-H) > 1.70\,(H-K) + 0.20$
    2. $K - [3.6]$ vs. $[3.6] - [4.5]$ (empirical boundaries from Gutermuth et al.)
    3. $[3.6] - [4.5]$ vs. $[5.8] - [8.0]$ (Class I/II protostar domains)
    4. $[3.6] - [5.8]$ vs. $[8.0] - [24]$ (disk/envelope diagnostics)
    5. $W1 - W2$ vs. $W2 - W3$ (WISE; with Koenig & Leisawitz 2012/2014 boundaries)
  • Contaminant Filtering: Excludes AGB stars, PAH-rich galaxies, and AGNs using color/magnitude criteria:

$$[4.5] > 7.8, \quad [8.0] - [24] < 2.5$$

and, e.g.,

$$[3.6] - [5.8] < 1.5, \quad [4.5] - [8.0] > 1.0$$

  • Candidate Selection: A final YSO candidate must show IR excess in at least two diagrams. Majority voting across module flags (NIR, NMIR, MIR1, MIR2, WISE) assigns a final evolutionary stage (I, I/II, II).
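The candidate selection rule can be illustrated with a short voting sketch. The source states only that at least two diagrams must show excess and that majority voting across the module flags sets the stage. How ties are resolved is not specified, so mapping a tie to the intermediate "I/II" stage is an assumption here, as are the function name `final_stage` and the flag representation (a stage string per module, or `None` for no excess).

```python
from collections import Counter

# Module names as listed in the text.
MODULES = ("NIR", "NMIR", "MIR1", "MIR2", "WISE")


def final_stage(flags, min_diagrams=2):
    """Combine per-module stage flags into one evolutionary stage.

    flags maps a module name to "I", "I/II", "II", or None (no excess).
    Returns None when fewer than min_diagrams modules show IR excess.
    """
    votes = [flags[m] for m in MODULES if flags.get(m)]
    if len(votes) < min_diagrams:
        return None  # not a YSO candidate
    counts = Counter(votes).most_common()
    top = counts[0][1]
    winners = [stage for stage, n in counts if n == top]
    # Assumption: an unresolved tie is reported as the intermediate stage.
    return winners[0] if len(winners) == 1 else "I/II"


clear_majority = final_stage({"NIR": "II", "MIR1": "II", "WISE": "I"})
single_diagram = final_stage({"NIR": "II"})
tied_vote = final_stage({"MIR1": "I", "WISE": "II"})
```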

3. Software Architecture, Configuration, and Workflow

IdentYS-ASR

The architecture is microservice-based, comprising the following key components:

  • FlyScribe Web Frontend: React-based GUI for selecting a configuration profile (e.g., media monitoring), uploading data (WAV, MP4, MOV, etc.), visualizing and correcting results, and exporting them (SRT, JSON).
  • Audioma Orchestrator: YAML/JSON-driven pipeline manager that runs the SAM stages (VAD, SD, SI) and the ASR engines in parallel and fuses their results.
  • Speaker Analysis Microservice (SAM): gRPC API exposes endpoints for VAD, SD, and SI; accepts configuration for different processing modes and speaker models.
  • Fusion Module: Assigns per-word speaker labels by maximizing time-overlap between words and diarization segments.
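The fusion rule (assign each word to the diarization segment it overlaps most in time) can be sketched as follows. The tuple layouts for words and segments and the name `attribute_words` are illustrative assumptions; the actual data structures exchanged between the services are not given in the source.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words, segments):
    """Label each word with the speaker whose segment overlaps it most.

    words:    list of (text, start, end) from the ASR engine
    segments: list of (speaker, start, end) from diarization
    A word that overlaps no segment gets the speaker None.
    """
    labeled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, s_start, s_end in segments:
            ov = overlap(w_start, w_end, s_start, s_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labeled.append((text, best_speaker))
    return labeled


segments = [("Alice", 0.0, 5.2), ("Bob", 5.2, 7.8)]
words = [
    ("Good", 0.0, 0.4),
    ("everyone", 4.9, 5.3),   # straddles the speaker change
    ("Morning", 5.3, 5.8),
    ("(noise)", 9.0, 9.2),    # outside every segment
]
labeled = attribute_words(words, segments)
```

A word that straddles a speaker change goes to the side holding most of its duration, which keeps turn boundaries aligned with word boundaries in the final transcript.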

A single configuration file specifies all module choices, thresholds, enrollment references, and ASR models. Domain adaptation is enabled via configuration switching.

IdentYS-YSO

  • Programming Language: Python 3, with core dependencies on astropy, numpy, pandas, and astroquery (for VizieR catalog access).
  • Workflow:
  1. Area photometry from UKIDSS/VVV, 2MASS, Spitzer, WISE (via VOTable/CSV).
  2. Survey-specific quality cuts, magnitude system harmonization.
  3. Positional cross-matching (within $3\sigma$ combined uncertainty).
  4. Calculation of relevant colors.
  5. Empirical exclusion of known contaminants.
  6. IR excess flagging in five color–color diagrams.
  7. Final candidate output with class flags.
  • User Interface: Command-line with YAML/CLI configuration for coordinates, radius, survey selection, classification scheme (WISE: K12 or K14), and output location.
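The positional cross-matching step of the workflow can be illustrated with a small standard-library sketch. It assumes the "combined uncertainty" is the quadrature sum of the two catalogues' positional errors, which the source does not state explicitly. The dictionary keys and function names are hypothetical, and the pipeline itself may rely on astropy's coordinate matching rather than a hand-written test like this one.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def is_match(src_a, src_b, n_sigma=3.0):
    """True when two sources lie within n_sigma combined uncertainty.

    Each source is a dict with 'ra' and 'dec' in degrees and 'err', the
    positional uncertainty in arcsec. Assumption: errors add in quadrature.
    """
    sep = separation_arcsec(src_a["ra"], src_a["dec"], src_b["ra"], src_b["dec"])
    sigma = math.hypot(src_a["err"], src_b["err"])
    return sep <= n_sigma * sigma


nir = {"ra": 287.0, "dec": 11.0, "err": 0.1}
mir_near = {"ra": 287.0, "dec": 11.0 + 0.3 / 3600, "err": 0.1}  # 0.3" away
mir_far = {"ra": 287.0, "dec": 11.0 + 1.0 / 3600, "err": 0.1}   # 1.0" away
near_ok = is_match(nir, mir_near)  # 0.3" is inside 3 * 0.141" = 0.424"
far_ok = is_match(nir, mir_far)    # 1.0" is outside
```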

4. Output, Performance, and Use Cases

IdentYS-ASR

  • Speaker-Attributed Transcript: Output comprises time-labeled turns:
    [00:00:00.00–00:00:05.20] Alice: “Good morning everyone…”
    [00:00:05.20–00:00:07.80] Bob:   “Morning Alice, how are you?”
  • Formats: Synchronized transcript export in SRT, JSON, or text.
  • Performance: On 2 hours of meeting audio, internal testing yields DER ≈ 6.5% (with overlap), SI accuracy $> 94\%$ (open-set, $N = 50$), and ASR WER ≈ 10%. End-to-end latency on CPU (i7-9800X): $0.18\times$ real time.
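As an illustration of the SRT export listed above, the sketch below serializes speaker-attributed turns into SubRip blocks (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line). Prefixing each subtitle with the speaker name is an assumption about how attribution is carried into SRT, and the function names are hypothetical.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(turns):
    """Serialize (start, end, speaker, text) turns as numbered SRT blocks."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(turns, start=1):
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(blocks)


turns = [
    (0.0, 5.2, "Alice", "Good morning everyone…"),
    (5.2, 7.8, "Bob", "Morning Alice, how are you?"),
]
srt_text = to_srt(turns)
```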

IdentYS-YSO

  • Catalogues: CSV catalogues of YSO candidates with photometry, astrometry, color flags, and final class. Also outputs “field” (non-YSO) sources for post-analysis.
  • Case Study: For GRSMC 045.49+00.04 ($D \approx 8$ kpc, $A_V \approx 11$ mag), $\sim$140,000 starting sources are reduced to roughly 2,000 YSO candidates that meet the two-diagram IR excess criterion (111 Class I, 1,469 Class I/II, 376 Class II; 1,956 in total).
  • Validation: Over 90% of candidates lie to the right of the ZAMS locus in CMDs (PARSEC, $10^7$ yr); spectral index $\alpha$ (3.6–8.0 μm) consistent with class assignments.
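The spectral index used in the validation is conventionally defined as $\alpha = d\log(\lambda F_\lambda)/d\log\lambda$. A minimal sketch of one way to estimate it is a least-squares slope over the four IRAC bands. The source does not say how $\alpha$ was fitted, so the fitting method, the input convention (flux densities $F_\nu$, any consistent unit), and the function name are assumptions.

```python
import math

IRAC_BANDS_UM = (3.6, 4.5, 5.8, 8.0)  # nominal IRAC wavelengths


def spectral_index(wavelengths_um, flux_density):
    """Least-squares slope of log10(lambda * F_lambda) vs log10(lambda).

    flux_density holds F_nu per band. Since lambda * F_lambda = nu * F_nu,
    which is proportional to F_nu / lambda, constant factors do not
    change the slope.
    """
    xs = [math.log10(w) for w in wavelengths_um]
    ys = [math.log10(f / w) for w, f in zip(wavelengths_um, flux_density)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: F_nu proportional to lambda gives a flat SED (alpha = 0);
# F_nu proportional to lambda squared gives a rising SED (alpha = 1).
alpha_flat = spectral_index(IRAC_BANDS_UM, [w for w in IRAC_BANDS_UM])
alpha_rising = spectral_index(IRAC_BANDS_UM, [w ** 2 for w in IRAC_BANDS_UM])
```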

5. Practical Deployment and User Interaction

IdentYS-ASR

  • Domain Switching: All model and threshold settings are selected via configuration profiles for different application domains.
  • Client Experience: Web-based GUI abstracts processing complexity; manual speaker correction and transcript editing are built-in.
  • Integration: All compute-intensive steps (embedding extraction, clustering, ASR) occur server-side, with no local installation required for end users.

IdentYS-YSO

  • Installation: Standard Git and pip-based install, with YAML configuration for automated pipeline setup.
  • Interface: Command-line supports fine-grained run customization (center/coordinates, radius, survey choice, classification boundaries).
  • Batch Processing: Scales to $\sim 10^5$ sources per region, suitable for large survey fields and high-extinction environments.

| Tool Variant | Domain | Core Function | Interface/Output |
|---|---|---|---|
| IdentYS-ASR | Spoken Language | Speaker diarization, ID, ASR | Web GUI, SRT, JSON transcript |
| IdentYS-YSO | Star/IR Astronomy | YSO candidate selection | CLI, YAML, CSV catalogues |

6. Limitations and Future Directions

  • IdentYS-ASR: Evaluation under broader acoustic and language conditions is still pending. Accuracy is contingent on the quality and representativeness of enrolled speaker embeddings, the clustering thresholds, and the selected ASR engine. Adaptability to new domains is mediated by configuration but ultimately depends on the suitability of individual backend models (Morrone et al., 9 Sep 2024).
  • IdentYS-YSO: Currently excludes Class III (diskless) YSOs, which lack significant IR excess. Future extensions may incorporate X-ray or Hα data, Gaia astrometry for kinematic decontamination ($D \lesssim 5$ kpc), and advanced ML techniques for SED fitting. Extreme crowding, source blending, or limited survey depth may reduce candidate purity. ALMA/JWST data could extend capabilities to the Class 0 phase (Nikoghosyan et al., 16 Nov 2025).

7. Significance and Contextualization within Their Fields

Both IdentYS toolkits exemplify domain-specific automation of expert-driven classification tasks using modern data science and signal processing techniques. IdentYS-ASR operationalizes state-of-the-art neural diarization and speaker representation models within a pluggable, user-driven workflow for large-scale spoken media analytics. IdentYS-YSO enables rapid, statistically robust YSO population studies by codifying empirical photometric selection criteria into reproducible algorithms, directly supporting cluster formation theory and the mapping of star-forming environments.

A commonality is their emphasis on modularity, transparent configuration, and pipeline reproducibility, making them amenable to adaptation for new problem instances within their respective data and use domains.
