Zero-Label Cross-System Anomaly Detection
- The paper introduces a zero-label anomaly detection framework that transfers and adapts knowledge from labeled source logs to unlabeled target logs using adversarial domain alignment and meta-learning.
- It employs semantic embeddings, routing architectures, and pseudo-labeling strategies to overcome proprietary log challenges, achieving high F1 scores in cross-system transfers.
- This approach minimizes manual annotation costs while offering a scalable solution for real-world log analytics through unsupervised and self-supervised learning techniques.
Zero-label cross-system log-based anomaly detection refers to the family of methodologies enabling the detection of anomalous events in the logs of a “target” software system using no labeled (ground-truth) anomalies from that system, but instead leveraging labeled logs and/or generalizable knowledge from one or more distinct “source” systems. The fundamental problem arises from domain-specific variability in log syntax and semantics, proprietary operational patterns, and the extremely high cost or infeasibility of manual annotation on new systems. Recent work has established this setting as both a theoretical and practical benchmark for transfer learning, representation meta-learning, and knowledge fusion in machine log analytics.
1. Problem Statement and Motivation
Formally, the task provides (i) a labeled source log dataset $\mathcal{D}_S = \{(x_i^S, y_i^S)\}_{i=1}^{N_S}$ from one or more mature systems, where $y_i^S \in \{0,1\}$ indicates normal vs. anomalous, and (ii) a target-domain log stream $\mathcal{D}_T = \{x_j^T\}_{j=1}^{N_T}$, fully unlabeled. The objective is to construct a function $f$ that predicts anomaly status $\hat{y} = f(x)$ for entries in $\mathcal{D}_T$ without ever consulting target-system supervision. This “zero-label” constraint is strictly more difficult than classical transfer or domain-adaptation settings.
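A minimal sketch of this setting as code; all type and function names are illustrative rather than drawn from any of the cited papers:

```python
# Minimal sketch of the zero-label setting (names are illustrative).
from typing import Callable, List, Tuple

LogSequence = List[str]          # a window of parsed log events

# (i) labeled source data: (sequence, label) with label 0 = normal, 1 = anomalous
SourceDataset = List[Tuple[LogSequence, int]]
# (ii) fully unlabeled target stream
TargetStream = List[LogSequence]

# The goal: a detector fit on source labels (plus unlabeled target logs for
# alignment) that scores target sequences without any target supervision.
Detector = Callable[[LogSequence], int]

def fit_zero_label(source: SourceDataset, target: TargetStream) -> Detector:
    ...  # transfer / adaptation happens here; no target labels are ever read
```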
The central challenge is that many log-based anomalies are system-specific (“proprietary” with respect to formats, codes, or event sequences), while others follow cross-system operational patterns. Existing few-label learning and transfer techniques are vulnerable to the “cold start” problem: lacking any target-labeled anomalies, they fail to adapt to proprietary behaviors and often exhibit low recall (Zhao et al., 26 Jul 2025, Zhao et al., 8 Nov 2025). Unsupervised approaches must avoid overfitting to the idiosyncrasies of the source domain and generalize across possibly disjoint template spaces (Zhao et al., 8 Nov 2025, Zhao et al., 8 Nov 2025).
2. Representation Learning and Domain Alignment
Zero-label generalization mandates learning log representations that are both discriminative of anomalies and agnostic to system-specific noise. A common workflow begins with log preprocessing: parsing raw lines via Drain or a similar parser, substituting system-dependent fields with “<*>” placeholders, and extracting event templates (Wang et al., 18 Sep 2024, Zhao et al., 8 Nov 2025, Zhao et al., 8 Nov 2025). These templates are then embedded in a global or shared vector space via pretrained language models or word embeddings (e.g., BERT, GloVe), which allows semantic comparisons across domains (Zhao et al., 8 Nov 2025, Wang et al., 18 Sep 2024).
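A sketch of this preprocessing pipeline using the drain3 parser and a sentence-transformer encoder; the specific libraries and the `all-MiniLM-L6-v2` checkpoint are illustrative choices, not mandated by the cited papers:

```python
# Drain parsing to event templates, then template embedding in a shared
# semantic space comparable across source and target systems.
from drain3 import TemplateMiner
from sentence_transformers import SentenceTransformer

miner = TemplateMiner()                               # Drain tree, default settings
encoder = SentenceTransformer("all-MiniLM-L6-v2")

raw_lines = [
    "Received block blk_3587 of size 67108864 from /10.250.19.102",
    "PacketResponder 1 for block blk_3587 terminating",
]

templates = []
for line in raw_lines:
    result = miner.add_log_message(line)
    # Variable fields are generalized to "<*>" placeholders as clusters merge
    templates.append(result["template_mined"])

# Embed templates so source and target events live in one comparable space
embeddings = encoder.encode(templates)                # shape: (n_templates, dim)
```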
Adversarial domain alignment is employed to enforce system invariance: a domain discriminator is jointly trained with a feature extractor to maximize the indistinguishability of source and target representations. The feature extractor is adversarially updated to minimize this distinction, typically via gradients from a small MLP domain classifier (Zhao et al., 8 Nov 2025, Zhao et al., 26 Jul 2025). This process is typically combined with a supervised anomaly classification head trained on source data, yielding the joint objective

$$\min_{\theta_f,\,\theta_c}\;\max_{\theta_d}\;\mathcal{L}_{\text{sup}}(\theta_f, \theta_c) - \lambda\,\mathcal{L}_{\text{adv}}(\theta_f, \theta_d),$$

where $\mathcal{L}_{\text{sup}}$ is the supervised cross-entropy on source labels and $\mathcal{L}_{\text{adv}}$ is the adversarial loss of the discriminator distinguishing source from target. Meta-learning over cross-system “tasks” (splits of source and unlabeled target) produces a parameter initialization that supports rapid domain adaptation in one or a few gradient steps (Zhao et al., 8 Nov 2025, Zhao et al., 26 Jul 2025).
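A minimal PyTorch sketch of this min–max objective via the standard gradient-reversal trick; layer sizes, the embedding dimension, and the λ value are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        # Reverse (and scale) gradients flowing back into the feature extractor
        return -ctx.lam * grad, None

feature_extractor = nn.Sequential(nn.Linear(384, 128), nn.ReLU())
anomaly_head = nn.Linear(128, 2)       # trained on source labels only
domain_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

def total_loss(x_src, y_src, x_tgt, lam=1.0):
    ce = nn.CrossEntropyLoss()
    z_src, z_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    l_sup = ce(anomaly_head(z_src), y_src)                         # L_sup
    z_all = torch.cat([z_src, z_tgt])
    d_lbl = torch.cat([torch.zeros(len(z_src)), torch.ones(len(z_tgt))]).long()
    l_adv = ce(domain_head(GradReverse.apply(z_all, lam)), d_lbl)  # L_adv
    # Reversal makes the extractor *maximize* domain confusion while the
    # discriminator minimizes l_adv, realizing the min-max objective above
    return l_sup + l_adv
```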
3. Knowledge Fusion and Routing Architectures
Recent advances center on the explicit fusion of general (cross-system) and proprietary (target-specific) anomaly knowledge. Seminal works such as FusionLog (Zhao et al., 8 Nov 2025) and GeneralLog (Zhao et al., 8 Nov 2025) introduce a “knowledge-level” semantic router: for each target log sequence, event embeddings are computed and sequence-level similarity is measured against source event prototypes. A similarity threshold then partitions sequences into two buckets (see the routing sketch after this list):
- “General logs”: sufficiently similar to source prototypes, predicted by a fast, meta-learned small model.
- “Proprietary logs”: insufficiently similar; handled via more computationally intensive mechanisms.
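A sketch of this routing step; the prototype construction, the mean aggregation over events, and the threshold value are illustrative assumptions:

```python
# Semantic-similarity routing in the style of FusionLog/GeneralLog.
import numpy as np

def route(seq_embeddings: np.ndarray,
          source_prototypes: np.ndarray,
          tau: float = 0.8) -> str:
    """Route one target sequence given its event embeddings.

    seq_embeddings:    (n_events, dim) embeddings of the sequence's events
    source_prototypes: (n_protos, dim) mean embeddings of source event clusters
    """
    # Cosine similarity of each event to its nearest source prototype
    e = seq_embeddings / np.linalg.norm(seq_embeddings, axis=1, keepdims=True)
    p = source_prototypes / np.linalg.norm(source_prototypes, axis=1, keepdims=True)
    nearest = (e @ p.T).max(axis=1)
    # Sequence-level similarity = mean over events (one plausible aggregation)
    return "general" if nearest.mean() >= tau else "proprietary"
```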
FusionLog subsequently applies multi-round collaborative pseudo-labeling to the proprietary bucket, generating pseudo-labels by combining the outputs of a large language model (LLM; e.g., Qwen3) with those of a small model (SM); cases where the two agree and the SM is highly confident form a “clean” set for iterative SM fine-tuning. This distillation sharpens the small model’s ability to recognize proprietary anomalies, reducing reliance on LLM inference (Zhao et al., 8 Nov 2025). GeneralLog invokes an LLM with retrieval-augmented generation (RAG) only for proprietary logs, further reducing inference cost by offloading high-throughput detection to the small model (Zhao et al., 8 Nov 2025).
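A sketch of one such pseudo-labeling round; the `predict_proba`/`llm_label` interfaces and the 0.9 confidence cutoff are hypothetical, standing in for whatever the actual systems expose:

```python
# One round of collaborative pseudo-labeling on the proprietary bucket:
# keep only cases where the small model agrees with the LLM *and* is
# confident; these form the "clean" set for the next fine-tuning round.
def pseudo_label_round(proprietary_seqs, small_model, llm_label, conf_thresh=0.9):
    clean = []
    for seq in proprietary_seqs:
        sm_prob = small_model.predict_proba(seq)     # hypothetical: P(anomalous)
        sm_label = int(sm_prob >= 0.5)
        sm_conf = max(sm_prob, 1.0 - sm_prob)
        if sm_label == llm_label(seq) and sm_conf >= conf_thresh:
            clean.append((seq, sm_label))
    return clean   # feed back into small-model fine-tuning
```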
This architectural principle avoids the pitfalls of uncertainty-based routing—which lacks interpretability in the zero-label setting and may over-defer to LLMs. Instead, semantic similarity operationally aligns knowledge buckets to the actual distributional overlap of event types between source and target.
4. Unsupervised and Self-Supervised Approaches
Several frameworks achieve zero-label generalization through self-supervised or fully unsupervised objectives. ADALog (Pospieszny et al., 15 May 2025) dispenses with log parsing and instead fine-tunes a BERT-based transformer on normal logs only, using a masked language modeling (MLM) objective:

$$\mathcal{L}_{\text{MLM}} = -\sum_{i \in \mathcal{M}} \log p_\theta\!\left(x_i \mid x_{\setminus \mathcal{M}}\right),$$

where $\mathcal{M}$ is the set of masked token positions. At inference, token-level reconstruction probabilities serve as anomaly scores, aggregated to event level via negative log-likelihood. Adaptive thresholding via percentiles of the normal-score distribution replaces hand-tuned thresholds. Because no labels or event templates are required, this methodology avoids domain-specific overfitting while achieving competitive F1 on benchmarks (Pospieszny et al., 15 May 2025).
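A sketch of this scoring scheme with Hugging Face transformers; masking each token one at a time, the base `bert-base-uncased` checkpoint (standing in for a model fine-tuned on normal logs), and the 99th-percentile threshold are illustrative simplifications:

```python
# ADALog-style scoring: mask each token, read out its reconstruction
# probability, and aggregate negative log-likelihoods per event.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def event_score(line: str) -> float:
    ids = tok(line, return_tensors="pt")["input_ids"][0]
    nll = []
    for i in range(1, len(ids) - 1):              # skip [CLS]/[SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
        logp = torch.log_softmax(logits, dim=-1)[ids[i]]
        nll.append(-logp.item())
    return sum(nll) / len(nll)                    # higher = more anomalous

# Adaptive threshold: e.g., 99th percentile of scores on held-out normal logs
# threshold = np.percentile([event_score(l) for l in normal_logs], 99)
```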
Other variants, including Log2graphs (Wang et al., 18 Sep 2024), integrate log content and structural dependencies: per-window log graphs are constructed, embeddings are extracted via dual graph convolutional autoencoders (content and causal-structure GCN modules), and spectral clustering over embedding affinities partitions windows into “normal” and “anomalous” groups in a fully unsupervised fashion, demonstrating strong transfer to unseen target distributions.
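A sketch of the final unsupervised partitioning step in such a pipeline; the GCN autoencoder producing the window embeddings is elided, and the RBF affinity, two-cluster choice, and smaller-cluster-is-anomalous heuristic are illustrative assumptions:

```python
# Spectral clustering over an affinity matrix of window embeddings.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
window_embeddings = rng.normal(size=(200, 64))   # stand-in for GCN outputs

clusters = SpectralClustering(
    n_clusters=2, affinity="rbf", random_state=0
).fit_predict(window_embeddings)

# Heuristic: take the smaller cluster as "anomalous"
anomalous = int(np.bincount(clusters).argmin())
labels = (clusters == anomalous).astype(int)
```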
5. Meta-Learning, Transfer, and Evaluation Protocols
Multiple meta-learning approaches have emerged to address system heterogeneity and the lack of target labels. ZeroLog (Zhao et al., 8 Nov 2025) and FreeLog (Zhao et al., 26 Jul 2025) combine domain-adversarial training with meta-task optimization, simulating adaptation to new domains by constructing support/query tasks that mix source-annotated and unlabeled target samples. In the inner loop, adaptation combines source-classification and domain-adversarial gradients; the meta-objective minimizes the post-adaptation generalization loss on held-out query data. This enables deployment of a fixed encoder/classifier pair directly to the target, decoupling detection from any domain-specific annotation.
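A schematic MAML-style loop realizing this inner/outer structure; the functional `model.loss(batch, params=...)` interface, optimizers, step sizes, and task sampling are hypothetical assumptions, not the cited papers' actual APIs:

```python
import torch

def meta_step(model, tasks, inner_lr=1e-2, meta_opt=None):
    meta_opt.zero_grad()
    for support, query in tasks:                 # each task mixes source + target
        # Inner loop: one adapted copy of the parameters per task
        fast = {n: p.clone() for n, p in model.named_parameters()}
        loss_in = model.loss(support, params=fast)    # L_sup - lambda * L_adv
        grads = torch.autograd.grad(loss_in, list(fast.values()),
                                    create_graph=True)
        fast = {n: p - inner_lr * g
                for (n, p), g in zip(fast.items(), grads)}
        # Outer loop: generalization loss of the adapted parameters;
        # gradients flow back through the clones into the shared initialization
        model.loss(query, params=fast).backward()
    meta_opt.step()
```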
Performance is evaluated under cross-system zero-label transfers (e.g., HDFS→BGL, OpenStack→HDFS). Metrics include precision, recall, F1, and—where applicable—unsupervised clustering indices (silhouette, Davies–Bouldin, Calinski–Harabasz). Strong empirical results are established: FusionLog and GeneralLog exceed 90% F1, surpassing all prior transfer-learning and domain-adversarial baselines by 10–15 percentage points (Zhao et al., 8 Nov 2025, Zhao et al., 8 Nov 2025). ZeroLog and FreeLog reliably achieve F1 > 80% across all evaluated transfers (Zhao et al., 8 Nov 2025, Zhao et al., 26 Jul 2025); ADALog matches or outperforms both supervised and unsupervised baselines without requiring any anomaly labels or log parsing (Pospieszny et al., 15 May 2025).
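For reference, all of the metrics named above are available in scikit-learn; `y_true`/`y_pred` and `X` here are placeholders for the outputs of a cross-system transfer run:

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

def report(y_true, y_pred, X=None, cluster_labels=None):
    out = {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
    if X is not None and cluster_labels is not None:   # unsupervised indices
        out["silhouette"] = silhouette_score(X, cluster_labels)
        out["davies_bouldin"] = davies_bouldin_score(X, cluster_labels)
        out["calinski_harabasz"] = calinski_harabasz_score(X, cluster_labels)
    return out
```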
6. Specialized Knowledge Sources and Instruction Mining
An alternative approach leverages external knowledge sources for bootstrapping anomaly representations. ADLILog (Bogatinovski et al., 2022) mines millions of log instructions (static log messages and developer-annotated severities from 1,000+ public code repositories), then pretrains a transformer encoder to distinguish “normal” (INFO-level) from “abnormal” (ERROR/FATAL/CRITICAL) message templates. The encoder is then transferred unchanged to the target, and only a lightweight head is fine-tuned, treating unlabeled target logs as presumed normal and the mined abnormal instructions as positive outliers. A hyperspherical loss encourages normal embeddings to lie near the origin and abnormal ones far away; inference scores are simple Euclidean distances, with the optimal threshold tuned on a validation set. This yields cross-system F1 up to 0.98 (HDFS), showing that large-scale instruction mining can provide generalizable anomaly patterns when target annotation is unavailable.
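A sketch of such a hyperspherical objective; the margin value and the exact penalty form are illustrative assumptions about how "near the origin" and "far away" are enforced:

```python
# Pull presumed-normal target embeddings toward the origin; push
# abnormal-instruction embeddings outside a margin.
import torch

def hyperspherical_loss(z_normal, z_abnormal, margin=1.0):
    # z_*: (batch, dim) embeddings from the transferred encoder + light head
    d_norm = z_normal.pow(2).sum(dim=1)            # squared distance to origin
    d_abn = z_abnormal.pow(2).sum(dim=1)
    loss_normal = d_norm.mean()                                       # compress normals
    loss_abnormal = torch.relu(margin - d_abn.sqrt()).pow(2).mean()   # repel abnormals
    return loss_normal + loss_abnormal

# At inference, the anomaly score is simply the Euclidean distance ||z||.
```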
7. Limitations, Ablation Findings, and Future Directions
Current zero-label cross-system methods still trail—by several F1 points—the performance of fully supervised in-domain models, especially when operational anomalies in the target system are highly proprietary and absent from source anomaly distributions (Zhao et al., 8 Nov 2025, Zhao et al., 8 Nov 2025). Over- or under-weighting the strength of domain/adversarial gradients degrades either generalization or anomaly separation. Success of knowledge fusion and pseudo-label refinement for proprietary logs is contingent upon sufficient source–target event overlap and effective semantic embedding alignment. Another limitation noted is the computational expense of LLM-based inference, motivating progressive distillation or hybrid expert routing to minimize cost.
Research directions include dynamic, data-driven routing architectures, theoretical generalization bounds for adversarial meta-learning, multi-modal and continual adaptation strategies, and operational integration of human-in-the-loop feedback for anomaly label refinement. Extending these frameworks to fully unseen log formats or systems with extremely sparse unlabeled data remains an open challenge.