
InferLog: Automated Log Inference

Updated 1 March 2026
  • InferLog is a comprehensive framework that automates log data analysis via LLM-accelerated parsing, structured template inference, and logic-based reasoning.
  • It employs innovative techniques such as Prefix-aware In-context Learning Refinement (PAIR) and meta-learning to achieve up to 69% latency reduction and 4× throughput improvements.
  • The framework extends to building system-level behavioral models and deductive reasoning in RDF/OWL setups, enabling scalable inference and enhanced semantic graph completion.

InferLog denotes a set of methodologies and systems for automated inference over log data, particularly focusing on the extraction of structured templates, efficient parsing via LLM inference optimization, system-level behavioral modeling from logs, and logic-based graph store reasoning. It encompasses both proactive template generation (e.g., from source code), rapid log parsing using LLMs, scalable model construction from distributed logs, and deductive entailment in logic-based triple stores. The term is used as both a product name for specific inference acceleration frameworks and as terminology in the literature for core concepts and abstractions.

1. LLM-Accelerated Log Parsing Optimization

InferLog, as introduced in "InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching" (Wang et al., 11 Jul 2025), is an LLM inference optimization method specifically designed for the high-throughput, low-latency demands of online log parsing in operational software systems. Unlike prior solutions that focus on reducing the number of calls to LLMs for template extraction, InferLog targets the microarchitecture of inference itself, identifying LLM prefill latency under high concurrency as the critical bottleneck.

The system comprises two principal innovations:

  • Prefix-aware In-Context Learning Refinement (PAIR): InferLog maximizes prefix-caching utility by dynamically rewriting and reordering demonstration logs in each ICL prompt so that large prompt prefixes are shared across requests. Demonstrations with matching templates (though distinct instantiations) are syntactically aligned to maximize cache hits in the underlying vLLM engine.
  • Meta-learning–based Configuration Optimization: InferLog adopts attention-augmented MAML (Model-Agnostic Meta-Learning) to learn a surrogate performance predictor that, combined with Sequential Model-Based Optimization (SMBO), enables rapid discovery of optimal LLM inference configurations (batch size, scheduling delays, cache settings) under evolving workload characteristics.
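The reordering component of PAIR can be illustrated with a minimal sketch. The idea is that if demonstrations shared with the previous prompt are placed first, in the cached order, the serving engine can reuse the KV cache for that shared prefix. The function and identifiers below are illustrative assumptions, not the paper's actual API, and the demonstration-rewriting step of PAIR is omitted:

```python
def order_demos_for_prefix_cache(demos, cached_order):
    """Reorder `demos` so the longest run matching `cached_order`
    comes first, maximizing KV prefix-cache reuse across requests."""
    demo_set = set(demos)
    shared = []
    for d in cached_order:          # keep cached order for shared demos
        if d in demo_set:
            shared.append(d)
        else:
            break                   # prefix sharing stops at the first miss
    rest = [d for d in demos if d not in shared]
    return shared + rest

cached = ["tpl_A", "tpl_B", "tpl_C"]   # demo order of the previous prompt
new = ["tpl_B", "tpl_A", "tpl_D"]      # demos retrieved for the new log line
print(order_demos_for_prefix_cache(new, cached))
# ['tpl_A', 'tpl_B', 'tpl_D']
```

Only the tokens up to the first divergence from the cached prompt benefit, which is why PAIR additionally aligns demonstrations with matching templates so that divergence happens as late as possible.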

These strategies yield up to 69% reductions in p95 inference latency and up to 4× throughput improvements, with no loss of parsing accuracy relative to the underlying parsers. PAIR specifically raises prefix-cache hit rates to as much as 80%, a key enabler of high-concurrency log processing. Experiments show similar benefits across multiple LLM backbones (Qwen2.5-14B, LLaMA 3-8B, Mixtral-7B) and when stacked atop parsers such as DivLog, LILAC, and LogBatcher, reflecting its composability and independence from template-extraction strategies.

2. Proactive Log Template Inference from Source and Raw Logs

Within the LLM-SrcLog framework (Sun et al., 4 Dec 2025), the InferLog pipeline refers to the proactive extraction of structured templates by integrating program analysis and LLM-driven reasoning. It addresses key limitations of reactive, log-centric parsing by exploiting both static code and data-driven techniques.

The core InferLog pipeline consists of:

  • Cross-Function Static Code Analysis (SCA): An interprocedural, path-sensitive analyzer reconstructs all feasible runtime string shapes for each logging invocation. Abstract syntax tree parsing, call graph construction, and recursive data-flow analysis across return and call sites enumerate the set of candidate string templates per log call, replacing dynamically computed fragments with the wildcard marker <.*>.
  • LLM-Based White-Box Template Extraction (WTE): The code context and SCA output are fed to an LLM instructed to normalize, disambiguate, and emit log templates in strict JSON. Output templates are filtered by rules and semantic verification.
  • Data-Driven Black-Box Template Extraction (BTE): For logs where code is unavailable or too dynamic, InferLog applies Drain3, which clusters logs into templates using a fixed-depth tree. Formally, a template is a token sequence with wildcards at variable positions: message token sequences [t_1, t_2, …, t_n] are grouped by equality at constant positions.
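The black-box grouping idea can be sketched as follows. This is a deliberate simplification: real Drain3 uses a fixed-depth parse tree and similarity thresholds rather than exact token-count grouping, so treat this as a toy illustration of "wildcard the positions that vary":

```python
from collections import defaultdict

def infer_templates(messages):
    """Toy black-box template extraction: group messages by token count,
    then replace positions whose tokens vary with the wildcard <.*>."""
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        groups[len(tokens)].append(tokens)
    templates = []
    for token_lists in groups.values():
        cols = list(zip(*token_lists))  # one column per token position
        template = [col[0] if len(set(col)) == 1 else "<.*>" for col in cols]
        templates.append(" ".join(template))
    return templates

logs = [
    "Connected to node 10.0.0.1 port 8080",
    "Connected to node 10.0.0.2 port 9090",
]
print(infer_templates(logs))
# ['Connected to node <.*> port <.*>']
```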

This offline-then-online approach achieves F1 improvements of 2–17% over LLM baselines and reduces latency by ~1,000× relative to per-log LLM inference. Experimental validations on Hadoop, Zookeeper, Sunfire-Compute, and production Alibaba logs demonstrate robust coverage and practical utility.

3. Inference of System-Level Models from Distributed Component Logs

A distinct usage of InferLog appears in scalable model inference, as implemented in the SCALER approach (Shin et al., 2019). Here, InferLog refers to the pipeline of building a behavioral gFSM model for an entire component-based system based solely on component-local logs, a partial communication template list, and coarse architectural dependency information.

The SCALER InferLog process follows a two-stage divide-and-conquer methodology:

  1. Per-component Model Inference: Each component's logs are converted (using e.g., MINT) into a local gFSM, consistent with observed sequences.
  2. Dependency Extraction and Merging: Cross-log dependencies (e.g., communication events) are inferred heuristically using timestamp proximity and the partial template list. The system-level gFSM is constructed via recursive composition: per-execution traces are "stitched" by grafting dependent component segments in correspondence with observed leads-to relations.
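The timestamp-proximity heuristic in step 2 can be sketched as below. The window size, event representation, and matching rule are illustrative assumptions; SCALER's actual heuristic also uses the partial communication template list and architectural dependency information:

```python
def extract_dependencies(events, templates, window_ms=50):
    """Heuristic leads-to extraction: an event in one component followed,
    within `window_ms`, by a matching communication-template event in
    another component is treated as a cross-component dependency.
    `events` is a list of (timestamp_ms, component, message) tuples
    sorted by timestamp."""
    deps = []
    for i, (t1, c1, m1) in enumerate(events):
        if m1 not in templates:          # only known communication templates
            continue
        for t2, c2, m2 in events[i + 1:]:
            if t2 - t1 > window_ms:
                break                    # outside the proximity window
            if c2 != c1 and m2 == m1:    # same template, different component
                deps.append((c1, c2, m1))
    return deps

events = [
    (100, "A", "SEND req"),
    (120, "B", "SEND req"),   # B logs the same communication template
    (400, "C", "boot done"),
]
print(extract_dependencies(events, {"SEND req"}))
# [('A', 'B', 'SEND req')]
```

Because each event is only compared against events inside its window, the scan is effectively linear in the number of log entries, consistent with the O(N·m) complexity stated below.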

This approach scales as O(N·m) in dependency extraction (where N is the total number of log entries and m the number of components), and up to 400× speedup over monolithic model induction was demonstrated on datasets of up to 35,000 entries. Recall improvements of ~35 percentage points over the baseline are typical, with specificity only marginally reduced.

4. Deductive RDF/OWL Reasoning via Inference Rules ("InferLog" Engine)

InferLog also designates an N3Logic-based rule engine for RDF(S)/OWL entailment (Tomaszuk, 2016). In this context, "InferLog" formalizes a suite of inference rules specified over Notation3 (N3) triples for deductive closure in semantic web graph stores:

  • Syntax and Rule Structure: N3Logic semantics use universally quantified variables, explicit formulas (curly-bracketed triples), and logical implication (⇒).
  • RDF(S) Inference Rules: Canonical rules (e.g., domain, range, subclass, subproperty, symmetry, transitivity, functional/inverse-functional property collapse) are formalized as Horn patterns, yielding new triples or inconsistencies.
  • OWL-P Profile: A lightweight OWL subset (OWL-P) is specified for tractable reasoning, including only the most common property/class constructs. Inference covers, for instance, symmetric, transitive, and functional-object properties, disjoint classes, and propagation of equivalence.
  • Reasoning Process: Forward chaining over the rules produces the saturated closure of all entailments within RDF(S)/OWL-P. The system is sound and complete for all entailments permitted by these profiles.
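Forward chaining to a saturated closure can be sketched with two of the canonical RDFS rules (subclass transitivity and type propagation). The naive fixed-point loop below is an illustration of the reasoning process, not the engine's actual implementation:

```python
def rdfs_closure(triples):
    """Naive forward chaining to a fixed point over two RDFS rules:
    rdfs:subClassOf transitivity and rdf:type propagation."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closure):
            if p != "rdfs:subClassOf":
                continue
            for (s2, p2, o2) in list(closure):
                # transitivity: (A sub B), (B sub C) => (A sub C)
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new = (s, "rdfs:subClassOf", o2)
                    if new not in closure:
                        closure.add(new); changed = True
                # type propagation: (x type A), (A sub B) => (x type B)
                if p2 == "rdf:type" and o2 == s:
                    new = (s2, "rdf:type", o)
                    if new not in closure:
                        closure.add(new); changed = True
    return closure

kb = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
print(("ex:rex", "rdf:type", "ex:Animal") in rdfs_closure(kb))
# True
```

Because each iteration only adds triples and the vocabulary is finite, the loop terminates with the saturated closure of all entailments derivable from the encoded rules.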

Applications include semantic search, ontology-based data access, and knowledge graph completion in Linked Open Data environments.

5. Evaluation, Metrics, and Empirical Results

The efficiency and accuracy of InferLog methodologies are quantified in multiple domains:

  • LLM-Accelerated Parsing (Wang et al., 11 Jul 2025):
    • Latency (p95): InferLog achieves 3.2 s (vs. 10.3 s for no prefix-caching baseline).
    • Throughput: 240 req/s (vs. 60 req/s for default vLLM).
    • Cache hit rate: 80%.
    • Parsing accuracy: Maintained at ~96%.
  • Proactive Template Inference (Sun et al., 4 Dec 2025):

| Method     | Hadoop F1 | ZooKeeper F1 | Sunfire F1 |
|------------|-----------|--------------|------------|
| DivLog     | 65%       | 87%          | 76%        |
| LLMLog     | 61%       | 77%          | 37%        |
| LLM-SrcLog | 71%       | 92%          | 100%       |

    • Case Study Outputs: In Alibaba production, high-frequency templates mapped directly to fault domains, enabling rapid root-cause localization.
  • System-Level Model Inference (Shin et al., 2019):
    • Recall: Increased by ~35 percentage points over monolithic MINT.
    • Runtime: Up to 400× faster.
  • Deductive Reasoning (Tomaszuk, 2016):
    • Completeness: All RDFS/OWL-P entailments are deduced in polynomial time.

6. Limitations and Future Directions

Known limitations include:

  • LLM Inference Optimization: PAIR assumes demo order is not critical, which can be violated for tasks like multi-document QA. Meta-learning effectiveness depends on workload stationarity.
  • Template Extraction: Black-box template clustering may over-generalize on rare logs, occasionally requiring human curation.
  • System Model Inference: Coarse timestamp resolution or high message interleaving may yield incorrect dependency extraction.
  • Deductive Reasoning: OWL-P's tractability omits complex class expressions, cardinality, and property chains.

Ongoing and future research directions involve PAIR generalization to other ICL-heavy tasks, dynamic policy adaptation under log distribution drift, probabilistic dependency extraction in model inference, and extension to richer reasoning constructs in deductive graph stores. The interoperability of InferLog frameworks with broader AIOps, anomaly detection, and knowledge management ecosystems remains an active area of system development and empirical benchmarking.
