GeneralLog: Zero-Label Log Anomaly Detection
- GeneralLog is a collaborative framework for zero-label cross-system log anomaly detection that leverages semantic routing to partition logs effectively.
- It uses a dynamic router to separate logs into general (shared) and proprietary (unique) sets, delegating inference to a meta-learned small model or LLM+RAG module.
- GeneralLog achieves state-of-the-art F1 scores (>90%) on benchmark datasets, outperforming traditional methods in accuracy and cost in cold-start scenarios.
GeneralLog is a knowledge-level collaborative framework for zero-label cross-system log anomaly detection, addressing the challenge of robust anomaly detection in settings where the target system provides no labeled log data. Unlike prior small-model transfer or LLM approaches, GeneralLog introduces a dynamic semantic router that partitions unlabeled target-system logs into "general" (shared with source systems) and "proprietary" (unique to the target) subsets, then delegates inference to a meta-learned neural model or an LLM+Retrieval-Augmented Generation (RAG) module, respectively. This architecture achieves state-of-the-art F1 scores (>90%) under a full cold-start scenario on standard log datasets, providing a cost-effective, accurate solution in environments without labeled anomalies.
1. Problem Formulation and Limitations of Prior Art
The objective is to construct an anomaly detector that performs well on an entirely unlabeled target-system log dataset $\mathcal{D}_T$, using only labeled source-system logs $\mathcal{D}_S$ with anomaly labels $y_s$. This setting, zero-label cross-system transfer, differs substantively from conventional transfer learning and unsupervised anomaly detection.
Existing approaches face the following limitations:
- Small-model transfer (e.g., GRU/LSTM with transfer/meta-learning) captures only invariances shared across systems, failing when the target contains templates with semantics or failure modes unseen in the source (target "proprietary logs").
- LLM-based approaches (e.g., few-shot GPT, Qwen3) can adapt to novel templates but require labeled target examples to support robust prompting and entail prohibitive inference cost.
- Hybrid uncertainty-routing strategies, which send "hard" (high-entropy) examples to LLMs and "easy" examples to the small model, lack true knowledge-level separation and rely on output uncertainty that is ill-calibrated in the zero-label setting.
These deficiencies motivate an explicit, domain-aware separation of log knowledge for robust cross-system anomaly detection.
2. GeneralLog Knowledge-Level Routing and System Architecture
GeneralLog introduces a semantic router that assigns unlabeled target logs to processing tracks reflecting their knowledge provenance:
Modular Structure
- Log Parsing: The Drain algorithm extracts event templates from raw logs.
- Semantic Embedding: Each log sequence is mapped to a dense vector sequence in a shared embedding space.
- Router: A training-free, event-level semantic router computes the similarity between each target event embedding and all source log embeddings. For an event embedding $e_t$ from $\mathcal{D}_T$ and embeddings $e_s$ from $\mathcal{D}_S$:

$$s(e_t) = \max_{e_s \in \mathcal{D}_S} \cos(e_t, e_s)$$

Logs are classified as "general" if $s(e_t) \ge \tau$, "proprietary" otherwise.
- Small Model (General): Handles logs with high semantic overlap; employs meta-learned, system-agnostic representations.
- LLM+RAG (Proprietary): Qwen3 with a RAG knowledge base that uses the top-$k$ general log templates and their labels, as inferred by the small model, as retrievals to contextualize inference.
- Routing Controller: Computes a soft routing coefficient

$$g(x) = \sigma\big(\beta\,(s(x) - \tau)\big)$$

where $\sigma$ is the sigmoid function, $\tau$ is the router threshold, and $\beta$ controls gating steepness.
- Score Fusion: For each log sequence $x$, combine small-model and LLM scores:

$$\text{score}(x) = g(x)\,p_{\text{small}}(x) + \big(1 - g(x)\big)\,p_{\text{LLM}}(x)$$

$\text{score}(x) > \theta$ triggers an anomaly flag.
This explicit partition ensures that general logs—well-represented in the source—are inferred efficiently by the small model, while proprietary logs—unseen in the source—are handled with the semantic flexibility of LLM+RAG inference.
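The routing, gating, and fusion steps above can be sketched concretely as follows. This is a minimal illustration, not the paper's implementation: the function name, the parameter defaults, and the use of max-cosine similarity over raw source embeddings are assumptions.

```python
import numpy as np

def route_and_fuse(target_emb, source_embs, p_small, p_llm, tau=0.8, beta=10.0):
    """Illustrative sketch of GeneralLog-style routing and score fusion.

    target_emb : (d,) embedding of one target log event
    source_embs: (n, d) embeddings of source-system log events
    p_small, p_llm: anomaly scores in [0, 1] from the two tracks
    tau, beta  : router threshold and gating steepness (assumed defaults)
    """
    # Maximum cosine similarity between the target event and any source event.
    t = target_emb / np.linalg.norm(target_emb)
    s = source_embs / np.linalg.norm(source_embs, axis=1, keepdims=True)
    sim = float(np.max(s @ t))

    # Hard routing decision: "general" vs. "proprietary".
    track = "general" if sim >= tau else "proprietary"

    # Soft routing coefficient g = sigmoid(beta * (sim - tau)).
    g = 1.0 / (1.0 + np.exp(-beta * (sim - tau)))

    # Fused anomaly score: g weights the small model, (1 - g) the LLM.
    score = g * p_small + (1.0 - g) * p_llm
    return track, g, score
```

A target event identical to a source event yields sim = 1, routes "general", and the fused score is dominated by the small model; a dissimilar event routes "proprietary" and leans on the LLM score.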
3. Model Components and Learning Objectives
Small Model (General Log Inference)
- Architecture: GRU backbone with self-attention mask.
- Heads: Feature extraction $f_\phi$; anomaly prediction $h_{\text{cls}}$; domain discrimination $h_{\text{dom}}$.
- Meta-learning with Adversarial Unsupervised Domain Adaptation (UDA):
- Define meta-tasks $\mathcal{T}_i$ mixing source and target, with labeled and unlabeled samples.
- Classification loss: $\mathcal{L}_{\text{cls}} = \mathrm{CE}\big(h_{\text{cls}}(f_\phi(x)),\, y\big)$ on labeled source samples.
- Domain adaptation loss: $\mathcal{L}_{\text{dom}} = \mathrm{CE}\big(h_{\text{dom}}(f_\phi(x)),\, d\big)$, trained adversarially so that features become domain-invariant.
- Inner update: $\phi' = \phi - \alpha \nabla_\phi \big(\mathcal{L}_{\text{cls}} + \lambda\, \mathcal{L}_{\text{dom}}\big)$ with weighting $\lambda$.
- Meta-optimization: $\min_\phi \sum_i \mathcal{L}_{\mathcal{T}_i}(\phi'_i)$ over the meta-task query sets.
- Inference: For general logs, the anomaly score is $p_{\text{small}}(x) = h_{\text{cls}}(f_\phi(x))$.
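The inner meta-learning update can be sketched for a toy linear model. This is a hand-derived illustration under stated assumptions, not the paper's architecture: the model is a single linear scorer rather than a GRU, `alpha` and `lam` are assumed names for the step size and domain-loss weight, and the sign flip on the domain gradient stands in for a gradient-reversal layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inner_update(theta, X_lab, y_lab, X_dom, d_dom, alpha=0.1, lam=0.5):
    """One illustrative inner step, theta' = theta - alpha * grad(L_cls + lam * L_dom),
    for a linear model p = sigmoid(X @ theta)."""
    # Gradient of binary cross-entropy on labeled source logs (classification loss).
    p_cls = sigmoid(X_lab @ theta)
    g_cls = X_lab.T @ (p_cls - y_lab) / len(y_lab)

    # Gradient of the domain-discrimination loss on a source/target mixture;
    # the negated sign emulates gradient reversal for adversarial UDA.
    p_dom = sigmoid(X_dom @ theta)
    g_dom = -(X_dom.T @ (p_dom - d_dom) / len(d_dom))

    return theta - alpha * (g_cls + lam * g_dom)
```

With `lam=0` this reduces to a plain gradient step on the classification loss, so the loss on the labeled batch should decrease after one update.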
LLM+RAG (Proprietary Log Inference)
- Model: Qwen3 with Retrieval-Augmented Generation.
- Knowledge Base: Top-$k$ general log templates from $\mathcal{D}_T$ and their labels, as predicted by the small model.
- Prompt: Provides labeled general-log templates and queries for anomaly status on proprietary logs.
- Efficiency: Restricts expensive LLM calls to proprietary logs (typically ≤30–40% of all logs); RAG narrows retrieval for cost-containment.
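The retrieval-and-prompt step for proprietary logs can be sketched as below. The prompt wording, function name, and top-$k$ cosine retrieval are illustrative assumptions; only the overall shape (retrieve labeled general templates, then query the LLM) comes from the description above.

```python
import numpy as np

def build_prompt(query_emb, kb_embs, kb_templates, kb_labels, k=3):
    """Illustrative RAG step: retrieve the top-k most similar general-log
    templates (with their small-model labels) and assemble an LLM prompt."""
    # Cosine similarity between the proprietary-log query and the knowledge base.
    q = query_emb / np.linalg.norm(query_emb)
    kb = kb_embs / np.linalg.norm(kb_embs, axis=1, keepdims=True)
    top = np.argsort(-(kb @ q))[:k]  # indices of the k nearest templates

    # Context block: each retrieved template with its inferred label.
    context = "\n".join(
        f"- {kb_templates[i]} => {'anomalous' if kb_labels[i] else 'normal'}"
        for i in top
    )
    return (
        "Labeled general-log templates:\n" + context +
        "\n\nIs the following proprietary log anomalous? Answer yes or no.\n"
    )
```

In practice the returned string would be sent to Qwen3; here it is returned so the retrieval behavior can be inspected directly.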
A plausible implication is that, because the router operates in embedding space, its partitioning efficacy depends strongly on the quality of pre-trained or meta-learned representations.
4. Training, Routing, and Inference Workflow
The zero-label pipeline is as follows:
- Preprocessing: Parse and embed logs in source ($\mathcal{D}_S$) and target ($\mathcal{D}_T$).
- Meta-learned Small Model: Train on source data with UDA-based meta-learning involving pseudo-tasks mixing labeled and unlabeled samples.
- RAG Knowledge Base Construction: Store general-log embeddings and their small-model predictions for efficient LLM retrieval.
- Dynamic Routing:
- For each log $x$, compute $s(x)$ and $g(x)$.
- If $s(x) \ge \tau$, assign $x$ to the small model for $p_{\text{small}}(x)$; else, query LLM+RAG for $p_{\text{LLM}}(x)$.
- Fuse via $\text{score}(x) = g(x)\,p_{\text{small}}(x) + \big(1-g(x)\big)\,p_{\text{LLM}}(x)$ and apply the anomaly threshold.
- Hyperparameter Selection: Choose $\tau$ by inspecting the empirical distribution of $s(x)$ over $\mathcal{D}_T$ (e.g., thresholding at the 60th–80th percentile); set $\beta$ so that $g$ transitions steeply near $\tau$. No supervision is needed for these selections; distributional heuristics suffice.
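The unsupervised hyperparameter heuristic can be made concrete as follows. The percentile default and the rule tying the gate steepness to the similarity range are assumptions chosen to match the description, not values from the paper.

```python
import numpy as np

def select_router_params(sims, pct=70.0, width_frac=0.05):
    """Illustrative label-free heuristic: set tau at a percentile of the
    target-side similarity distribution, and set beta so the sigmoid gate
    transitions over a small fraction of the observed similarity range."""
    sims = np.asarray(sims, dtype=float)
    tau = np.percentile(sims, pct)               # e.g., within the 60th-80th percentile band
    span = max(sims.max() - sims.min(), 1e-8)    # guard against a degenerate distribution
    beta = 1.0 / (width_frac * span)             # steep transition around tau
    return tau, beta
```

Raising `pct` routes more logs to the LLM track (stricter "general" criterion); shrinking `width_frac` makes the soft gate approach a hard threshold.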
5. Empirical Results and Comparative Performance
Extensive experiments on three benchmark log datasets—HDFS (~11M events, ratio ≈ 9:1), BGL (~43M, 15:1), and OpenStack (Thunderbird, ~1.3M, 20:1)—demonstrate:
- GeneralLog (zero-label) achieves F1-scores exceeding 90% across cross-system settings (HDFS→BGL, BGL→HDFS, OpenStack→HDFS, OpenStack→BGL).
- Outperforms zero-label baselines:
- FreeLog: F1 ranges from 3.3 to 55.9 depending on transfer direction.
- RAGLog (LLM+KB): F1 ~89–91%.
- MetaLog (with 1% target labels): F1 ~89–92%.
- Small-model accuracy on general logs remains high when the router threshold is set strictly; on proprietary logs, its accuracy drops below 60%, confirming effective separation.
- Ablation: Removing RAG from LLM inference degrades F1 by 5–8%; omitting meta-learning UDA lowers F1 by 4–6%.
| Method | HDFS→BGL | BGL→HDFS | OpenStack→HDFS | OpenStack→BGL |
|---|---|---|---|---|
| FreeLog | 55.9 | 3.3 | 8.0 | 48.2 |
| RAGLog (with KB) | 89.1 | 91.1 | 91.1 | 89.1 |
| MetaLog (1% tgt) | 92.2 | 89.6 | 89.6 | 92.2 |
| GeneralLog | 95.5 | 94.9 | 92.0 | 92.3 |
A plausible implication is that knowledge-level routing and RAG mitigate both domain-mismatch and efficiency obstacles that afflict prior methods.
6. Strengths, Limitations, and Future Directions
Strengths:
- Achieves robust zero-label cross-system detection by explicit knowledge separation, outperforming both pure small-model and LLM-only baselines.
- Maintains high cost-effectiveness by reserving expensive LLM inference for a fraction (≤40%) of logs.
- F1 performance above 90% under full cold-start.
Limitations:
- The router threshold $\tau$ is selected heuristically from similarity distributions, lacking a statistical or learned criterion.
- RAG knowledge base labels, if inaccurate, can propagate errors to LLM guidance.
- Logs with proprietary patterns partially overlapping general logs may not receive optimal LLM context, especially if retrieval is too narrow.
Future research directions include:
- Replacing heuristic routing with learned or contrastive classifiers to further automate selection.
- Compressing LLM knowledge into the small model via distillation, further reducing inference cost.
- Extending the approach to incorporate multi-modal proprietary knowledge (documentation, configs).
- Enabling continuous adaptation: updating routing and RAG knowledge in an online fashion as logs from new systems accumulate.
GeneralLog formalizes a knowledge-level approach to routing and model selection in zero-label, cross-domain log anomaly detection, furnishing a comprehensive and empirically validated framework for scalable deployment in heterogeneous real-world software systems (Zhao et al., 8 Nov 2025).