
DCASE 2025: First-Shot Anomalous Sound Detection

Updated 30 June 2025
  • DCASE 2025 Task 2 is an unsupervised anomalous sound detection task in which systems must detect anomalies using only a single section of normal data per machine type, forcing immediate generalization.
  • The baseline combines an autoencoder with a selective Mahalanobis distance, and systems are evaluated under a domain generalization protocol on machine types unseen during development.
  • The task drives innovative, plug-and-play solutions for machine condition monitoring, advancing predictive maintenance in dynamic Industry 4.0 environments.

DCASE 2025 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 Challenge Task 2 targets unsupervised anomalous sound detection (ASD) systems that can be deployed rapidly for machine condition monitoring, particularly in industrial scenarios where new machine types must be monitored with little or no prior data beyond normal operational recordings. Its central feature is the “first-shot” learning setup, which requires methods to generalize immediately to previously unseen machine types without machine-specific tuning, reflecting real-world constraints in Industry 4.0 and predictive maintenance.

1. Task Objective and Motivation

The aim is to benchmark ASD models that robustly and efficiently identify mechanical failures in novel machines under a first-shot learning paradigm. Systems are provided only a single section of normal operational data per newly encountered machine type for training, disallowing any form of machine-specific hyperparameter optimization or adaptation. Anomalous sounds are not available a priori, and development and evaluation occur strictly on different machine types, enforcing generalization.

Such a scenario models realistic constraints: new machinery often arrives with no pre-existing anomalous examples, collecting large or labeled datasets is impractical, and rapid roll-out is required for cost efficiency and safety. By imposing first-shot learning, the challenge excludes bespoke tuning, so ASD systems must transfer robustly across rapidly shifting operational domains.

2. Domain Generalization Protocol

Domain generalization is a foundational assumption and methodology in this task. It addresses the non-stationarity often observed in machine audio recordings due to varying operating environments, microphone positions, and background noises. The challenge framework formalizes ASD as an out-of-distribution generalization problem: all evaluation machine types, conditions, and sections are unseen during model development, and domain labels are concealed at test time. A single global decision threshold must govern all operating points.

This constraint is designed to discourage overfitting to specific domains or prior test distributions and to promote the discovery of universal, domain-invariant representations of “normal” machine sounds. Systems must reliably distinguish “normal” from “anomalous” audio, even when encountering entirely new mechanical signatures or acoustic conditions.

3. Dataset Structure and Problem Setting

The datasets underpinning Task 2 are organized as follows:

  • Development Dataset: Includes seven machine types (fan, gearbox, bearing, slide rail, valve, ToyCar, ToyTrain) with only one section per type for training and evaluation.
  • Additional Training Dataset: Provides nine further machine types (e.g., AutoTrash, HomeCamera, ToyPet), supporting domain generalization research and pre-challenge experimentation.
  • Evaluation Dataset: Consists of these nine new machine types, strictly separated from development data, forming the basis for challenge leaderboard evaluation.

Each machine type and section contains:

  • 990 normal “source domain” training clips,
  • 10 normal “target domain” clips per section,
  • 100 clean or noise-only supplementary clips per section (to reflect practical availability of background noise or idle machine recordings),
  • For development: 100 normal and 100 anomalous clips per domain for evaluation,
  • For evaluation: 200 unlabeled test clips per section,
  • Recordings are 6–10 seconds long, single-channel, sampled at 16 kHz, with both lab- and field-mixed soundscapes.

The task enforces that only one section per machine type is available, directly simulating scenarios in which only a single, operational instance of a machine exists in a facility.
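To make the training budget concrete, the per-section counts quoted above can be written down as a small record. This is only an illustrative sketch: the class and field names are hypothetical and do not come from any official DCASE tooling; the numbers are the ones listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionTrainingData:
    """Per-section training clip counts as quoted in the text (names are hypothetical)."""
    source_normal: int = 990   # normal clips from the source domain
    target_normal: int = 10    # normal clips from the target domain
    supplementary: int = 100   # clean or noise-only supplementary clips

    @property
    def machine_clips(self) -> int:
        # Normal machine recordings available for training (both domains).
        return self.source_normal + self.target_normal

    @property
    def target_fraction(self) -> float:
        # Share of the normal training clips that come from the target domain.
        return self.target_normal / self.machine_clips


section = SectionTrainingData()
print(section.machine_clips)     # 1000
print(section.target_fraction)   # 0.01
```

The 99:1 split between source and target clips illustrates why target-domain statistics are much harder to estimate than source-domain ones.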

4. First-Shot Learning and its Significance

The first-shot approach is formally defined as ASD training or adaptation given only one section of normal data per newly encountered machine type, with no opportunity to view or infer characteristics of test-time domains before deployment. No tuning that involves the evaluation dataset or its statistical characteristics is permitted. This setting is highly restrictive and forces reliance on generalizable representations learned from normal sounds of disjoint machine types.

This protocol differs from zero-shot and few-shot learning in its industrial grounding: it assumes a single, readily collectable data section (as would be acquired upon commissioning), followed by rapid deployment with no test-time adaptation. It also guards against domain or attribute leakage, which matters both for fair model assessment and for industrial trustworthiness.

5. Baseline Algorithms and Core Decision Criteria

The task provides a baseline autoencoder-based algorithm leveraging log-mel-spectrogram features:

  • Simple Autoencoder Mode: Trains an autoencoder on 5-frame concatenated segments (feature dimension $D = 64$). The anomaly score $A_{\theta}(X)$ for an input sequence $X$ is computed as the mean squared error (MSE) between input and reconstructed features:

$$A_{\theta}(X) = \frac{1}{DK} \sum_{k=1}^{K} \| \psi_k - r_{\theta}(\psi_k) \|_{2}^{2}$$

where $\psi_k$ is the $k$-th concatenated segment, $r_{\theta}$ is the autoencoder's reconstruction function, and $K = T - P + 1$, with $T$ the number of frames and $P = 5$ the segment length.

  • Selective Mahalanobis Mode: Computes the anomaly score using the minimum Mahalanobis distance over source or target domain statistics:

$$A_{\theta}(X) = \frac{1}{DK} \sum_{k=1}^{K} \min\{ D_s(\psi_k, r_{\theta}(\psi_k)),\; D_t(\psi_k, r_{\theta}(\psi_k)) \}$$

where $D_s$ and $D_t$ are Mahalanobis distances computed with covariance matrices from source and target domain residuals.

  • Decision Thresholding: A single anomaly detection threshold $\phi$ partitions input $x$ into Anomaly (if $A_{\theta}(x) > \phi$) or Normal (otherwise).
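The MSE score and the threshold rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official baseline code: the function names are hypothetical, `reconstruct` is a stand-in for a trained autoencoder, the input uses toy dimensions rather than real log-mel features, and the threshold value is arbitrary.

```python
import numpy as np


def segment_frames(features: np.ndarray, P: int = 5) -> np.ndarray:
    """Concatenate P consecutive frames into K = T - P + 1 segment vectors."""
    T = features.shape[0]
    K = T - P + 1
    return np.stack([features[k:k + P].reshape(-1) for k in range(K)])


def mse_anomaly_score(psi: np.ndarray, reconstruct) -> float:
    """A(X) = 1/(D*K) * sum_k ||psi_k - r(psi_k)||^2 for psi of shape (K, D)."""
    K, D = psi.shape
    residual = psi - reconstruct(psi)
    return float(np.sum(residual ** 2) / (D * K))


def is_anomalous(score: float, phi: float) -> bool:
    """Single global threshold: Anomaly if the score exceeds phi, Normal otherwise."""
    return score > phi


# Toy input: T = 6 frames with 2 features each (not the real feature sizes).
features = np.arange(12, dtype=float).reshape(6, 2)
psi = segment_frames(features, P=5)           # shape (K, D) = (2, 10)
# Stand-in for a trained autoencoder: reconstructs every segment as zeros.
score = mse_anomaly_score(psi, lambda x: np.zeros_like(x))
print(score)                                   # 39.5
print(is_anomalous(score, phi=10.0))           # True
```

A perfect reconstruction gives a score of zero, so larger scores indicate segments the model has not learned to reproduce from normal data.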

The task does not prescribe network structure beyond what is provided in the baseline. Research directions are expected to pursue robust, lightweight unsupervised detectors and alternatives to standard AEs, provided the first-shot and domain constraints are obeyed.
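The selective Mahalanobis score described above can be sketched as follows. Several details here are assumptions of the sketch rather than statements about the official baseline: the squared form of the Mahalanobis distance is used (so that identity covariances reduce to the MSE score), a small diagonal term is added before inversion for numerical stability, and all function names are hypothetical.

```python
import numpy as np


def fit_inv_cov(residuals: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Inverse covariance of reconstruction residuals, (N, D) -> (D, D), for D >= 2.

    eps * I is added for numerical stability (an assumption of this sketch).
    """
    D = residuals.shape[1]
    cov = np.cov(residuals, rowvar=False) + eps * np.eye(D)
    return np.linalg.inv(cov)


def squared_mahalanobis(residual: np.ndarray, inv_cov: np.ndarray) -> np.ndarray:
    """Row-wise e_k^T Sigma^{-1} e_k for residuals of shape (K, D)."""
    return np.einsum("kd,de,ke->k", residual, inv_cov, residual)


def selective_mahalanobis_score(psi, recon, inv_cov_s, inv_cov_t) -> float:
    """Average over segments of the smaller of the source and target distances."""
    K, D = psi.shape
    e = psi - recon
    d_s = squared_mahalanobis(e, inv_cov_s)
    d_t = squared_mahalanobis(e, inv_cov_t)
    return float(np.sum(np.minimum(d_s, d_t)) / (D * K))


psi = np.array([[1.0, 2.0], [3.0, 4.0]])
recon = np.zeros_like(psi)
identity = np.eye(2)
# With identity inverse covariances in both domains, the score equals the MSE score.
print(selective_mahalanobis_score(psi, recon, identity, identity))         # 7.5
# The domain that yields the smaller distance is selected for each segment.
print(selective_mahalanobis_score(psi, recon, identity, 0.5 * identity))   # 3.75
```

In practice `inv_cov_s` and `inv_cov_t` would come from `fit_inv_cov` applied to residuals of the source and target training clips respectively; taking the minimum means a clip only needs to look normal under one of the two domains.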

6. Evaluation Metrics

Task 2 employs metrics standard to the DCASE ASD research line:

  • AUC (Area Under ROC Curve): Measures discrimination ability between normal and anomalous classes aggregated across machine types, domains, and sections.
  • pAUC (Partial AUC): Focuses discrimination performance at low false positive rates (FPR between 0 and $p = 0.1$).
  • Official Score $\Omega$: The harmonic mean of AUC and pAUC, reporting balanced efficacy across both overall and regionally critical thresholds.

Evaluation requires that anomaly scores and operating points be selected globally, not per machine, thus reflecting real field deployment conditions.
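The three metrics can be illustrated with a short standard-library sketch. It uses a pairwise-comparison form of AUC and restricts pAUC to the highest-scoring fraction $p$ of normal clips; this formulation is an assumption of the sketch, and official scoring tools may compute pAUC differently (for example with a standardization step). The scores below are made-up toy values.

```python
import math
from statistics import harmonic_mean


def auc(normal_scores, anomalous_scores):
    """Fraction of (normal, anomalous) pairs in which the anomalous clip scores higher."""
    wins = sum(a > n for n in normal_scores for a in anomalous_scores)
    return wins / (len(normal_scores) * len(anomalous_scores))


def pauc(normal_scores, anomalous_scores, p=0.1):
    """AUC restricted to the floor(p * N) highest-scoring normal clips (low-FPR region)."""
    n_top = max(1, math.floor(p * len(normal_scores)))  # keep at least one normal clip
    top = sorted(normal_scores, reverse=True)[:n_top]
    return auc(top, anomalous_scores)


# Toy anomaly scores, invented for illustration only.
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.95, 1.5, 0.55, 2.0]

auc_value = auc(normal, anomalous)
pauc_value = pauc(normal, anomalous, p=0.1)
omega = harmonic_mean([auc_value, pauc_value])
print(auc_value)          # 0.85
print(pauc_value)         # 0.5
print(round(omega, 3))    # 0.63
```

Because the harmonic mean is dominated by its smallest term, a system with a weak pAUC cannot compensate with a high overall AUC.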

7. Perspectives and Future Directions

The DCASE 2025 Task 2 organization highlights several research avenues:

  • Supplementary Data Utilization: Submission protocols encourage, but do not require, methods that use auxiliary clean machine or noise-only data, such as idle or background recordings that are easy to collect in industrial settings.
  • Model and Computational Efficiency: Reporting of multiply-accumulate (MAC) counts is recommended, reflecting an emphasis on real-time, edge-AI deployment feasibility.
  • Robustness to Attribute-Free Scenarios: Since not all machines possess detailed operational metadata, algorithms are expected to function in attribute-agnostic mode, relying on audio alone for anomaly inference.
  • Plug-and-Play ASD: The culminating goal is systems that can be deployed “as-is” with no further adaptation, fulfilling the plug-and-play vision for pervasive, reliable machine condition monitoring.
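As a rough illustration of the MAC reporting mentioned above, the count for a stack of fully connected layers is the sum of input-size times output-size products. The layer sizes and frame count below are hypothetical and chosen only for the arithmetic; they do not describe the official baseline, and bias additions and activations are ignored.

```python
def dense_macs(layer_sizes):
    """MACs for one forward pass through fully connected layers (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def macs_per_clip(layer_sizes, T, P=5):
    """MACs for a whole clip: one forward pass per segment, K = T - P + 1 segments."""
    return dense_macs(layer_sizes) * (T - P + 1)


# Hypothetical autoencoder: 64 -> 32 -> 8 -> 32 -> 64.
sizes = [64, 32, 8, 32, 64]
print(dense_macs(sizes))               # 4608
print(macs_per_clip(sizes, T=100))     # 442368
```

Counts like these give a hardware-independent first estimate of whether a detector is plausible for edge deployment.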

This challenge extends and refines foundational ideas from previous DCASE ASD tasks, placing notable emphasis on immediate generalization to unseen domains and machines, mirroring real-world constraints where anomalous data are rare and operational delays are costly.


Summary Table: DCASE 2025 Challenge Task 2—Key Properties

| Aspect | Specification |
|---|---|
| Task regime | First-shot, unsupervised, domain generalization |
| Data per machine | One section, normal only; no anomalous training data |
| Evaluation protocol | Machine-unseen domains, global threshold, no tuning |
| Metrics | AUC, pAUC, official score $\Omega$ |
| Goal | Rapid ASD deployment, robustness, and edge applicability |

This summary reflects the technical and methodological details defined in the description of DCASE 2025 Challenge Task 2: First-shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring.