IdentYS Tools: ASR and YSO Analysis

Updated 19 November 2025

IdentYS is the name of two independent tools: a speech processing system for joint diarization, identification, and ASR, and an infrared pipeline for automated young stellar object detection.
IdentYS-ASR uses a modular, microservice-based architecture with neural diarization and embedding extraction to produce speaker-attributed transcripts.
IdentYS-YSO uses multi-band IR photometry and color–color diagram analysis to classify YSO candidates across large star-forming regions.

IdentYS refers to two independent research tools, each developed for a specialized domain: (1) a modular, microservice-based toolkit for joint speaker diarization, speaker identification, and automatic speech recognition (ASR) in spoken language processing; and (2) a Python-based pipeline for identifying young stellar objects (YSOs) in star-forming regions from multi-band infrared (IR) photometry. Both systems are designed for robust, high-throughput processing of large datasets, but their methodologies, architectures, and scientific applications differ fundamentally. This article describes both tools, IdentYS-ASR and IdentYS-YSO, as presented in (Morrone et al., 2024) and (Nikoghosyan et al., 16 Nov 2025), respectively.

1. System Overviews and Scientific Context

IdentYS-ASR: Joint Speaker Diarization, Identification, and ASR

IdentYS-ASR is a modular toolkit designed to deliver joint speaker diarization (SD), speaker identification (SI), and automatic speech recognition (ASR) with speaker-attributed transcripts for audio and video resources. The system targets operational robustness in diverse application domains—including media monitoring and institutional speech analytics—through flexible, configuration-driven pipelines. The architecture orchestrates multiple models for SD and SI, supports closed- and open-set identification, and integrates contemporary neural diarization algorithms with configurable ASR engines. Its web-based frontend, FlyScribe, enables streamlined user interaction for configuration selection, data upload, transcript correction, and multi-format export (Morrone et al., 2024).

IdentYS-YSO: Automated Identification of Young Stars via IR Photometry

IdentYS-YSO enables the selection of Class I and II YSOs in deeply embedded or remote star-forming regions by exploiting the IR excess signature produced by warm circumstellar dust. The tool automates the acquisition, homogenization, cross-matching, and photometric filtering of large multi-band datasets from surveys such as UKIDSS, VVV, 2MASS, Spitzer, and WISE, applying established color–color diagram boundaries to separate YSOs from field contaminants. The architecture supports batch processing via a command-line interface, emphasizes catalogue completeness, and outputs stage classifications with photometric and astrometric details for large stellar populations (Nikoghosyan et al., 16 Nov 2025).

2. Core Algorithms, Models, and Classification Criteria

IdentYS-ASR: Embedding Extraction, Diarization, and Identification

Speaker Embeddings: Supports legacy i-vectors as well as neural embeddings (x-vectors or ECAPA-TDNN, via the WeSpeaker library). Given a speech window $X$, an embedding $e = f(X) \in \mathbb{R}^D$ is computed; each registered speaker is represented by an enrollment vector.
Hybrid EEND-Vector Clustering: An End-to-End Neural Diarization (EEND) block estimates per-frame speaker posteriors $P_{t,s}$. Segments with confident posteriors are embedded and grouped by agglomerative hierarchical clustering (AHC), using PLDA-based affinity scores:

$\text{Score}(e_i, e_j) = e_i^T \Lambda e_j + e_i^T \Gamma e_i + e_j^T \Gamma e_j$

Clusters are merged until no remaining pair of clusters reaches the threshold $\tau_{\mathrm{SD}}$, yielding time-labeled speaker clusters.

Speaker Identification: Segment embeddings $e$ are compared to enrollment vectors $e_k^{\text{ref}}$ using cosine or PLDA scoring:

$s_k = \cos(e, e_k^{\text{ref}})\quad\text{or}\quad s_k = \text{Score}(e, e_k^{\text{ref}})$

Closed-set mode assigns $\hat{k} = \arg\max_k s_k$; open-set identification additionally applies a threshold $\tau_{\mathrm{SI}}$ (e.g., 0.5 for cosine scoring), labeling sub-threshold cases as $\text{Unknown}_i$.
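The clustering and identification steps above can be sketched together. This is a minimal illustration, not the IdentYS-ASR implementation: it assumes $\Lambda$ and $\Gamma$ come from an already-trained PLDA model, uses average linkage (our choice) and stops merging once no cluster pair reaches $\tau_{\mathrm{SD}}$, represents each resulting cluster by the mean of its segment embeddings (also an assumption), and then scores that mean against the enrollment vectors with cosine similarity. All function names are hypothetical, and the quadratic-time loops favor readability over speed.

```python
import numpy as np


def plda_score(e_i, e_j, lam, gam):
    """Bilinear PLDA-style affinity: e_i^T Lam e_j + e_i^T Gam e_i + e_j^T Gam e_j."""
    return float(e_i @ lam @ e_j + e_i @ gam @ e_i + e_j @ gam @ e_j)


def ahc_threshold(emb, lam, gam, tau_sd):
    """Average-linkage AHC that keeps merging while the best pair scores >= tau_sd.

    emb: (n, D) array of segment embeddings. Returns one integer label per segment.
    """
    n = len(emb)
    # Segment-level affinity matrix, computed once.
    aff = np.array([[plda_score(emb[i], emb[j], lam, gam) for j in range(n)]
                    for i in range(n)])
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean affinity over all cross-cluster segment pairs.
                score = aff[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        if best < tau_sd:  # no remaining pair is similar enough: stop merging
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels.tolist()


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify(cluster_embs, enrolled, tau_si=0.5, open_set=True):
    """Label each cluster embedding with an enrolled speaker or Unknown_i.

    enrolled: dict mapping speaker name -> enrollment vector e_k^ref.
    """
    names = list(enrolled)
    labels, n_unknown = [], 0
    for e in cluster_embs:
        scores = [cosine(e, enrolled[k]) for k in names]
        k_hat = int(np.argmax(scores))  # closed-set decision: best-scoring speaker
        if not open_set or scores[k_hat] >= tau_si:
            labels.append(names[k_hat])
        else:  # open-set rejection: best score falls below tau_SI
            n_unknown += 1
            labels.append(f"Unknown_{n_unknown}")
    return labels


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
refs = {"Alice": np.array([1.0, 0.0]), "Bob": np.array([0.0, 1.0])}
labels = ahc_threshold(emb, np.eye(2), np.zeros((2, 2)), tau_sd=0.5)
centroids = [emb[np.array(labels) == k].mean(axis=0) for k in sorted(set(labels))]
print(labels, identify(centroids, refs))  # [0, 0, 1, 1] ['Alice', 'Bob']
```

With $\Lambda = I$ and $\Gamma = 0$ the PLDA-style score reduces to a dot product, which keeps the toy example easy to check by hand; a cluster whose best cosine score falls below `tau_si` would be reported as `Unknown_1`, `Unknown_2`, and so on.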

IdentYS-YSO: Infrared Color–Color Selection, Contaminant Exclusion

Diagnostic Diagrams: Five primary color–color spaces are analyzed:
1. $(J-H)$ vs. $(H-K)$ (NIR), with the excess region lying to the right of the reddening band: $(J-H) < 1.70\,(H-K) + 0.20$
2. $K - [3.6]$ vs. $[3.6] - [4.5]$ (empirical boundaries from Gutermuth et al.)
3. $[3.6] - [4.5]$ vs. $[5.8] - [8.0]$ (Class I and Class II regions)
4. $[3.6] - [5.8]$ vs. $[8.0] - [24]$ (disk/envelope diagnostics)
5. $W1-W2$ vs. $W2-W3$ (WISE; boundaries from Koenig et al. 2012 and Koenig & Leisawitz 2014, i.e., the K12 and K14 schemes)
Contaminant Filtering: Excludes AGB stars, PAH-rich galaxies, and AGNs using color/magnitude criteria such as

$[4.5]>7.8,\, [8.0]-[24]<2.5$

and

$[3.6]-[5.8]<1.5, \quad [4.5]-[8.0]>1.0$

Candidate Selection: A final YSO candidate must show IR excess in at least two diagrams. Majority voting across module flags (NIR, NMIR, MIR1, MIR2, WISE) assigns a final evolutionary stage (I, I/II, II).
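The two-diagram requirement and the majority vote can be expressed compactly. In this hypothetical encoding each module flag is `"I"`, `"II"`, or `None` (no excess in that diagram), and a tie between Class I and Class II votes is mapped to `"I/II"`. That tie rule is an assumption made for illustration, since the text does not spell out how the intermediate stage is assigned.

```python
from collections import Counter

MODULES = ("NIR", "NMIR", "MIR1", "MIR2", "WISE")


def final_stage(flags, min_diagrams=2):
    """Combine per-diagram flags into a final YSO decision.

    flags: dict module -> "I", "II", or None (no IR excess in that diagram).
    Returns None for non-candidates, otherwise "I", "II", or "I/II".
    """
    votes = [flags[m] for m in MODULES if flags.get(m) is not None]
    if len(votes) < min_diagrams:  # IR excess required in at least two diagrams
        return None
    counts = Counter(votes)
    if counts["I"] > counts["II"]:
        return "I"
    if counts["II"] > counts["I"]:
        return "II"
    return "I/II"  # assumed tie rule: no majority -> intermediate stage


print(final_stage({"NMIR": "I", "MIR1": "I", "WISE": "II"}))  # I
print(final_stage({"MIR1": "I", "WISE": "II"}))               # I/II
print(final_stage({"WISE": "II"}))                            # None
```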

3. Software Architecture, Configuration, and Workflow

IdentYS-ASR

The architecture is microservice-based, comprising the following key components:

FlyScribe Web Frontend: React-based GUI for selecting a configuration profile (e.g., media monitoring), uploading data (WAV, MP4, MOV, etc.), visualizing and correcting results, and exporting them (SRT, JSON).
Audioma Orchestrator: YAML/JSON-driven pipeline manager that runs SAM (VAD, SD, SI) and the ASR engines in parallel and fuses their results.
Speaker Analysis Microservice (SAM): gRPC API exposes endpoints for VAD, SD, and SI; accepts configuration for different processing modes and speaker models.
Fusion Module: Assigns per-word speaker labels by maximizing time-overlap between words and diarization segments.
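Word-level fusion by maximum time overlap can be sketched in a few lines. Words and diarization segments are assumed to arrive as `(label, start, end)` tuples in seconds; this tuple layout and the `None` label for words that overlap no segment are assumptions, not the Fusion Module's actual interface.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length (s) of the intersection of two time intervals, 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, segments):
    """Attach to each ASR word the speaker whose segment overlaps it the most.

    words: list of (word, start, end); segments: list of (speaker, start, end).
    Words overlapping no segment get speaker None.
    """
    labeled = []
    for word, ws, we in words:
        best_spk, best_ov = None, 0.0
        for spk, ss, se in segments:
            ov = overlap(ws, we, ss, se)
            if ov > best_ov:
                best_spk, best_ov = spk, ov
        labeled.append((word, best_spk))
    return labeled


words = [("Good", 0.0, 0.4), ("morning", 0.4, 0.9), ("Morning", 5.1, 5.6)]
segments = [("Alice", 0.0, 5.2), ("Bob", 5.2, 7.8)]
print(assign_speakers(words, segments))
# [('Good', 'Alice'), ('morning', 'Alice'), ('Morning', 'Bob')]
```

Because the rule only compares overlap lengths, a word straddling a speaker change (like "Morning" above) goes to whichever speaker covers most of it, and overlapping diarization segments need no special handling.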

A single configuration file specifies all module choices, thresholds, enrollment references, and ASR models. Domain adaptation is enabled via configuration switching.
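Configuration switching can be illustrated with a default configuration overlaid by a domain profile. Every key, profile name, and file name below is hypothetical (only the 0.5 cosine threshold comes from the description above); the actual IdentYS-ASR schema is not reproduced here. A recursive merge lets a profile override just the settings it cares about.

```python
import copy

# Hypothetical schema: keys, profile names, and file names are placeholders.
DEFAULTS = {
    "diarization": {"method": "eend_vc", "scoring": "plda"},
    "identification": {"mode": "open_set", "scoring": "cosine", "tau_si": 0.5},
    "asr": {"engine": "generic"},
}
PROFILES = {
    "media_monitoring": {"identification": {"enrollment": "anchors.npz"}},
    "meetings": {"identification": {"mode": "closed_set"}},
}


def deep_merge(base, override):
    """Return a copy of base with override applied recursively to nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_config(profile):
    """Resolve the full configuration for one application domain."""
    return deep_merge(DEFAULTS, PROFILES[profile])


cfg = build_config("meetings")
print(cfg["identification"])
# {'mode': 'closed_set', 'scoring': 'cosine', 'tau_si': 0.5}
```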

IdentYS-YSO

Programming Language: Python 3, with core dependencies on astropy, numpy, pandas, and astroquery (for VizieR catalog access).
Workflow:

Retrieval of area photometry from UKIDSS/VVV, 2MASS, Spitzer, and WISE (as VOTable/CSV).
Survey-specific quality cuts and magnitude-system harmonization.
Positional cross-matching (separation within $3\sigma$ of the combined positional uncertainty).
Calculation of relevant colors.
Empirical exclusion of known contaminants.
IR excess flagging in five color–color diagrams.
Final candidate output with class flags.
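The cross-matching step can be sketched with a brute-force nearest-neighbour search. To stay dependency-light this version uses plain NumPy and the haversine formula; for catalogues of $\sim 10^5$ sources, astropy's `SkyCoord.match_to_catalog_sky` (which uses a k-d tree) would be the idiomatic choice. The combined uncertainty is assumed to be the quadrature sum of the two catalogues' 1σ positional errors, and the function names are illustrative.

```python
import numpy as np


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine); inputs in degrees, broadcastable."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))) * 3600.0


def crossmatch_3sigma(ra1, dec1, err1, ra2, dec2, err2, nsigma=3.0):
    """Nearest-neighbour match of catalogue 1 against catalogue 2.

    err1/err2: 1-sigma positional uncertainties (arcsec). Returns (idx, ok):
    idx[i] is the nearest catalogue-2 row and ok[i] flags a match within
    nsigma times the combined (quadrature-sum) uncertainty.
    """
    ra1, dec1, err1 = map(np.asarray, (ra1, dec1, err1))
    ra2, dec2, err2 = map(np.asarray, (ra2, dec2, err2))
    sep = ang_sep_arcsec(ra1[:, None], dec1[:, None], ra2[None, :], dec2[None, :])
    idx = sep.argmin(axis=1)              # brute force: O(N*M) time and memory
    best = sep[np.arange(len(ra1)), idx]
    sigma = np.hypot(err1, err2[idx])     # combined 1-sigma positional error
    return idx, best <= nsigma * sigma


idx, ok = crossmatch_3sigma([10.0, 10.1], [20.0, 20.0], [0.1, 0.1],
                            [10.0, 50.0], [20.0 + 0.2 / 3600, 50.0], [0.1, 0.1])
print(idx.tolist(), ok.tolist())  # [0, 0] [True, False]
```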

User Interface: Command-line with YAML/CLI configuration for coordinates, radius, survey selection, classification scheme (WISE: K12 or K14), and output location.
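The run settings listed above could be declared with Python's standard `argparse` module, as in the sketch below. Every option name, the arcminute unit for the radius, and the default output filename are hypothetical placeholders, not the actual IdentYS-YSO flags; the point is only that coordinates, radius, survey selection, and the K12/K14 choice map naturally onto validated CLI options.

```python
import argparse

SURVEYS = ["UKIDSS", "VVV", "2MASS", "Spitzer", "WISE"]


def build_parser():
    """Hypothetical CLI mirroring the run settings described in the text."""
    p = argparse.ArgumentParser(description="IdentYS-YSO-style run (illustrative sketch)")
    p.add_argument("--ra", type=float, required=True, help="field center RA (deg)")
    p.add_argument("--dec", type=float, required=True, help="field center Dec (deg)")
    p.add_argument("--radius", type=float, required=True, help="search radius (arcmin)")
    p.add_argument("--surveys", nargs="+", choices=SURVEYS, default=SURVEYS,
                   help="photometric surveys to query")
    p.add_argument("--wise-scheme", choices=["K12", "K14"], required=True,
                   help="WISE classification boundaries")
    p.add_argument("--output", default="yso_candidates.csv", help="output CSV path")
    return p


args = build_parser().parse_args(
    ["--ra", "290.0", "--dec", "11.0", "--radius", "10", "--wise-scheme", "K14"])
print(args.wise_scheme, args.surveys)
```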

4. Output, Performance, and Use Cases

IdentYS-ASR

Speaker-Attributed Transcript: Output comprises time-labeled turns:

[00:00:00.00–00:00:05.20] Alice: “Good morning everyone…”
[00:00:05.20–00:00:07.80] Bob: “Morning Alice, how are you?”
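Turns like these can be serialized to SRT, one of the supported export formats, with a short formatter. This sketch assumes turns are `(start_s, end_s, speaker, text)` tuples and prefixes each subtitle line with the speaker name, which is one common convention rather than documented FlyScribe behavior. SRT timecodes use `HH:MM:SS,mmm` with a comma before the milliseconds, and cues are separated by a blank line.

```python
def srt_time(t):
    """Seconds -> SRT timecode HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(turns):
    """turns: list of (start_s, end_s, speaker, text) -> SRT document string."""
    cues = []
    for i, (start, end, speaker, text) in enumerate(turns, start=1):
        cues.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{speaker}: {text}\n")
    return "\n".join(cues)  # each cue already ends in "\n", so this adds a blank line


print(to_srt([(0.0, 5.2, "Alice", "Good morning everyone…"),
              (5.2, 7.8, "Bob", "Morning Alice, how are you?")]))
```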

Formats: Synchronized transcript export in SRT, JSON, or text.
Performance: On 2 hours of meeting audio, internal testing yields a diarization error rate (DER) of ≈ 6.5% (including overlapped speech), open-set SI accuracy above 94% ($N=50$), and an ASR word error rate (WER) of ≈ 10%. End-to-end processing on CPU (i7-9800X) runs at $0.18\times$ real time.
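For reference, WER counts substitutions, deletions, and insertions relative to the number of reference words, $\mathrm{WER} = (S + D + I)/N$, using a word-level edit-distance alignment. The sketch below is a generic implementation, not IdentYS-ASR's scorer, and assumes a non-empty reference. Similarly, a real-time factor of 0.18 means the 2 hours of test audio would take about $0.18 \times 120 = 21.6$ minutes to process.

```python
def wer(reference, hypothesis):
    """Word error rate (S + D + I) / N from a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


print(wer("good morning everyone", "good evening everyone"))  # 0.3333333333333333
```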

IdentYS-YSO

Catalogues: CSV catalogues of YSO candidates with photometry, astrometry, color flags, and final class. Also outputs “field” (non-YSO) sources for post-analysis.
Case Study: For GRSMC 045.49+00.04 ($D\approx8$ kpc, $A_V\approx11$ mag), $\sim$140,000 initial sources are reduced to about 2,000 YSO candidates showing IR excess in at least two diagrams (111 Class I, 1,469 Class I/II, and 376 Class II; 1,956 in total).
Validation: Over 90% of candidates lie to the right of the ZAMS locus in CMDs (PARSEC, $10^7$ yr); spectral index $\alpha$ (3.6–8.0 μm) consistent with class assignments.
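The spectral index used for validation is conventionally $\alpha = d\log(\lambda F_\lambda)/d\log\lambda$. A least-squares estimate over the IRAC bands can be sketched as below, assuming the magnitudes have already been converted to flux densities $F_\nu$; since $\lambda F_\lambda = \nu F_\nu \propto F_\nu/\lambda$, no further calibration is needed. The unweighted fit and the function name are illustrative assumptions, not the paper's documented procedure.

```python
import numpy as np

IRAC_UM = [3.6, 4.5, 5.8, 8.0]  # IRAC band wavelengths (micron)


def spectral_index(wavelengths_um, fnu):
    """Least-squares alpha = d log(lambda F_lambda) / d log(lambda).

    fnu: flux densities F_nu in any consistent unit; lambda F_lambda ∝ F_nu / lambda.
    """
    lam = np.asarray(wavelengths_um, dtype=float)
    lam_flam = np.asarray(fnu, dtype=float) / lam  # proportional to lambda F_lambda
    slope, _intercept = np.polyfit(np.log10(lam), np.log10(lam_flam), 1)
    return float(slope)


# F_nu ∝ lambda^(alpha + 1): a constant F_nu therefore gives alpha = -1.
print(round(spectral_index(IRAC_UM, [1.0, 1.0, 1.0, 1.0]), 6))  # -1.0
```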

5. Practical Deployment and User Interaction

IdentYS-ASR

Domain Switching: All model and threshold settings are selected via configuration profiles for different application domains.
Client Experience: Web-based GUI abstracts processing complexity; manual speaker correction and transcript editing are built-in.
Integration: All compute-intensive steps (embedding extraction, clustering, ASR) occur server-side, with no local installation required for end users.

IdentYS-YSO

Installation: Standard Git and pip-based install, with YAML configuration for automated pipeline setup.
Interface: Command-line supports fine-grained run customization (center/coordinates, radius, survey choice, classification boundaries).
Batch Processing: Scales to $\sim 10^5$ sources per region, suitable for large survey fields and high-extinction environments.

Tool Variant	Domain	Core Function	Interface/Output
IdentYS-ASR	Spoken Language	Speaker diarization, ID, ASR	Web GUI, SRT, JSON transcript
IdentYS-YSO	IR Astronomy (Star Formation)	YSO candidate selection	CLI, YAML, CSV catalogues

6. Limitations and Future Directions

IdentYS-ASR: Evaluation under broader acoustic and language conditions is still pending. Accuracy depends on the quality and representativeness of the enrolled speaker embeddings, the clustering thresholds, and the selected ASR engine. Adaptation to new domains is handled through configuration but ultimately depends on the suitability of the individual backend models (Morrone et al., 2024).
IdentYS-YSO: Class III (diskless) YSOs, which lack significant IR excess, are currently not recovered. Future extensions may incorporate X-ray or Hα data, Gaia astrometry for kinematic decontamination (at $D\lesssim5$ kpc), and advanced ML techniques for SED fitting. Extreme crowding, source blending, or limited survey depth may reduce candidate purity. ALMA/JWST data could extend coverage to the Class 0 phase (Nikoghosyan et al., 16 Nov 2025).

7. Significance and Contextualization within Their Fields

Both IdentYS toolkits exemplify domain-specific automation of expert-driven classification tasks using modern data science and signal processing techniques. IdentYS-ASR operationalizes state-of-the-art neural diarization and speaker representation models within a pluggable, user-driven workflow for large-scale spoken media analytics. IdentYS-YSO enables rapid, statistically robust YSO population studies by codifying empirical photometric selection criteria into reproducible algorithms, directly supporting cluster formation theory and the mapping of star-forming environments.

A commonality is their emphasis on modularity, transparent configuration, and pipeline reproducibility, making them amenable to adaptation for new problem instances within their respective data and use domains.


References

Morrone et al. (2024). A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR.

Nikoghosyan et al. (2025). IdentYS: A Python-Based Tool for Identifying Young Stars in Star-Forming Regions.
