Online Learning for Dynamic Malware Detection
- The paper introduces online learning methods that incrementally update malware detection models to adapt to evolving threats in real time.
- It leverages high-dimensional process metrics and graph-based behavioral features to capture nuanced malware patterns and detect zero-day attacks.
- The framework integrates adaptive algorithms and active learning for efficient drift handling, achieving significant improvements in accuracy and scalability.
Online learning approaches for dynamic malware detection define a paradigm where detection models incrementally adapt to evolving malicious activity by leveraging data streams and frequent model updates. In contrast to batch-learning protocols, which train static models on historic datasets, online methods exploit temporal structure, offer resilience to concept drift, and can detect zero-day or previously unseen malware behaviors. These frameworks utilize features sourced from process-level telemetry, graph-based behavioral representations, and active querying, and employ algorithms designed for efficient, real-time updates.
1. Feature Engineering and Streaming Data Representation
Behavioral features serve as primary inputs for dynamic malware detection. In frameworks such as "Towards Online Malware Detection using Process Resource Utilization Metrics" (Diamantopoulos et al., 15 Jan 2026), system-level process metrics sampled every 10 seconds are captured, including CPU (cpu_percent, cpu_num, cpu_sys, cpu_user, cpu_children_sys, cpu_children_user), memory (mem_shared, mem_data, mem_vms, mem_rss, mem_dirty, mem_swap, mem_lib, mem_uss, mem_text), disk I/O (io_write_bytes, io_read_bytes, io_write_chars, io_read_chars, io_write_count, io_read_count, ionice_ioclass, ionice_value), network activity (kb_sent, kb_received), and miscellaneous (num_threads, nice, ctx_switches_voluntary, ctx_switches_involuntary, gid_effective, status). Each system snapshot is flattened to a high-dimensional feature vector, zero-padded for variable process counts (up to 7,264 dimensions for M=227 processes).
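The flattening and zero-padding step can be sketched as follows. The `METRICS` subset and the dict-based snapshot format are illustrative assumptions, not the paper's implementation; the full metric set described above yields 227 × 32 = 7,264 dimensions.

```python
import numpy as np

# Illustrative subset of the per-process metrics named in the text;
# the paper's full set is larger (CPU, memory, disk I/O, network, misc).
METRICS = ["cpu_percent", "mem_rss", "io_read_bytes", "kb_sent", "num_threads"]
MAX_PROCS = 227  # M = 227 processes in the paper's setup

def flatten_snapshot(snapshot):
    """Flatten a list of per-process metric dicts into one fixed-length,
    zero-padded feature vector of size MAX_PROCS * len(METRICS).
    Missing metrics and absent process slots stay zero."""
    vec = np.zeros(MAX_PROCS * len(METRICS))
    for i, proc in enumerate(snapshot[:MAX_PROCS]):
        for j, metric in enumerate(METRICS):
            vec[i * len(METRICS) + j] = proc.get(metric, 0.0)
    return vec

# Toy snapshot with two live processes; all other slots remain zero-padded.
snap = [{"cpu_percent": 1.5, "mem_rss": 2048.0}, {"num_threads": 4}]
x = flatten_snapshot(snap)
```

With the full 32-metric set, the same layout reproduces the 7,264-dimensional vectors cited above.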
Graph-based representations are utilized in Android malware detection frameworks such as DroidOL (Narayanan et al., 2016) and CASANDRA (Narayanan et al., 2017), wherein apps are converted to inter-procedural control-flow graphs (ICFGs) or contextual API dependency graphs (CADGs). Features are extracted using graph kernel relabeling (Weisfeiler–Lehman and CWLK), producing bag-of-subgraph vectors of high but sparse dimensionality. Dynamic graph learning, as in MG-DVD (Liu et al., 2021), models process interaction and resource usage through sliding-window heterogeneous graphs with multiple entity and relation types. This enables tracking of sophisticated behavioral patterns across evolving event streams.
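The Weisfeiler–Lehman relabeling underlying these bag-of-subgraph features can be sketched as follows. This is a minimal single-graph sketch on a toy labeled graph; `wl_relabel` and the toy ICFG are illustrative, and CWLK additionally prepends context (e.g., permission-guard) labels, which is omitted here.

```python
from collections import Counter

def wl_relabel(adj, labels, iterations=2):
    """Weisfeiler-Lehman relabeling: each node's label is repeatedly replaced
    by a compressed signature of (own label, sorted multiset of neighbor
    labels). Returns a bag-of-subgraph feature Counter over all labels seen."""
    feats = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for v, nbrs in adj.items():
            signature = (labels[v], tuple(sorted(str(labels[u]) for u in nbrs)))
            new_labels[v] = hash(signature)  # compressed rooted-subtree label
        labels = new_labels
        feats.update(labels.values())
    return feats

# Toy ICFG-like graph with 3 basic blocks.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "entry", 1: "call", 2: "ret"}
bag = wl_relabel(adj, labels)
```

Each relabeling round adds one feature per node, so the resulting sparse vectors grow with graph size and iteration depth, matching the high-but-sparse dimensionality noted above.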
2. Online Learning Algorithms and Update Rules
Key online algorithms include Adaptive Random Forest (ARF), Passive-Aggressive (PA), Confidence-Weighted (CW) classifiers, Hoeffding Trees, and dynamic graph encoders.
- Adaptive Random Forest (ARF): As per (Diamantopoulos et al., 15 Jan 2026), ARF is an ensemble of Hoeffding trees, each incrementally updated in a test-then-train loop. Each new instance is resampled for each tree via a Poisson(λ) draw, simulating online bagging. Tree weights are decayed or trees replaced on drift detection, providing implicit regularization and adaptation. The ensemble prediction is the weighted majority vote ŷ = argmax_c Σ_t w_t · 1[h_t(x) = c], where h_t is tree t and w_t its accuracy-based weight.
- Passive-Aggressive (PA): Used in DroidOL (Narayanan et al., 2016) and ActDroid (Muzaffar et al., 2024). For instance (x_t, y_t), the PA-I update solves w_{t+1} = argmin_w ½‖w − w_t‖² + Cξ subject to ℓ(w; (x_t, y_t)) ≤ ξ, ξ ≥ 0, yielding the closed form w_{t+1} = w_t + τ_t y_t x_t with τ_t = min(C, ℓ_t / ‖x_t‖²), where ℓ_t is the hinge loss on the current weights.
- Confidence-Weighted (CW): In CASANDRA (Narayanan et al., 2017), CW maintains a weight distribution and updates upon each sample to ensure prediction confidence. Closed-form updates adjust both mean and covariance per feature, preserving model certainty and enabling rapid adaptation to emergent feature patterns.
- Dynamic Graph Encoders: MG-DVD (Liu et al., 2021) incrementally updates node embeddings only for the set of changed nodes. Two mechanisms, DWIUE and CHGAE, use attention-based aggregators across selected meta-graphs and hierarchical neighbor information, leading to efficient adaptation in large-scale streaming graphs.
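The ARF test-then-train loop with Poisson-based online bagging can be sketched as follows. `OnlinePerceptron` is a deliberately simple stand-in for a Hoeffding tree (any incremental learner slots in), and the drift-triggered tree replacement described above is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

class OnlinePerceptron:
    """Stand-in for a Hoeffding tree: any incremental learner works here."""
    def __init__(self, d):
        self.w = np.zeros(d)
    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1
    def learn(self, x, y, k):
        # Apply the update k times, as if the instance appeared k times.
        if self.predict(x) != y:
            self.w += k * y * x

def arf_step(ensemble, x, y, lam=6.0):
    """Test-then-train: predict by majority vote first, then update each
    member with a Poisson(lam) repeat count, simulating online bagging."""
    vote = sum(m.predict(x) for m in ensemble)
    pred = 1 if vote >= 0 else -1
    for m in ensemble:
        k = rng.poisson(lam)
        if k > 0:
            m.learn(x, y, k)
    return pred

ensemble = [OnlinePerceptron(2) for _ in range(5)]
stream = [(np.array([1.0, 0.2]), 1), (np.array([-1.0, -0.2]), -1)] * 50
hits = sum(arf_step(ensemble, x, y) == y for x, y in stream)
```

Evaluating before training on every instance gives an honest prequential accuracy estimate, which is how the streaming results in Section 4 are typically measured.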
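The PA-I closed-form update above can be written directly; this is a minimal sketch for binary labels y ∈ {−1, +1}, with `pa1_update` an illustrative helper name.

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """PA-I update: suffer hinge loss on the current weights, then take the
    smallest step (capped at aggressiveness C) that would zero that loss."""
    loss = max(0.0, 1.0 - y * (w @ x))   # hinge loss l_t
    tau = min(C, loss / (x @ x))          # closed-form step size tau_t
    return w + tau * y * x

w = np.zeros(3)
x, y = np.array([1.0, -1.0, 0.5]), 1
w = pa1_update(w, x, y)
```

When the cap C is not active, the updated weights achieve exactly unit margin on (x, y), i.e. the hinge loss on that instance drops to zero; this is the "passive until wrong, aggressive when wrong" behavior that suits sparse bag-of-subgraph features.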
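The confidence-weighted idea of per-feature certainty can be illustrated with a diagonal AROW-style update, a robust relative of the CW updates used in CASANDRA (the exact CW closed forms differ; this simplified variant is an assumption for illustration).

```python
import numpy as np

def arow_update(mu, sigma, x, y, r=1.0):
    """Diagonal AROW-style update: maintain per-feature mean mu and variance
    sigma. Low-variance (well-confirmed) features move little; uncertain
    features adapt fast. Variances only shrink, encoding growing confidence."""
    margin = y * (mu @ x)
    loss = max(0.0, 1.0 - margin)
    if loss > 0.0:
        beta = 1.0 / (np.sum(sigma * x * x) + r)
        alpha = loss * beta
        mu = mu + alpha * y * sigma * x           # confidence-scaled mean step
        sigma = sigma - beta * (sigma * x) ** 2   # shrink variance of used features
    return mu, sigma

mu, sigma = np.zeros(3), np.ones(3)
mu, sigma = arow_update(mu, sigma, np.array([1.0, 0.0, 2.0]), 1)
```

Features absent from an instance keep their variance untouched, which is why such updates adapt rapidly to newly emergent feature patterns without disturbing established ones.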
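The change-localized embedding update can be sketched with a uniform mean aggregator; this is a deliberate simplification of MG-DVD's attention-based, meta-graph-aware aggregators, and `incremental_update` is an illustrative name.

```python
import numpy as np

def incremental_update(emb, adj, changed, alpha=0.5):
    """Recompute embeddings only for nodes in `changed`, blending each node's
    old embedding with the mean of its neighbors' embeddings. Untouched
    nodes keep their embeddings, so cost scales with the change set, not
    with the whole graph."""
    new = dict(emb)
    for v in changed:
        nbrs = adj.get(v, [])
        if nbrs:
            agg = np.mean([emb[u] for u in nbrs], axis=0)
            new[v] = (1 - alpha) * emb[v] + alpha * agg
    return new

emb = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([2.0, 2.0])}
adj = {0: [1], 1: [0, 2], 2: [1]}
new = incremental_update(emb, adj, {1})  # only node 1's neighborhood changed
```

Replacing the uniform mean with learned attention weights per meta-graph recovers the flavor of DWIUE/CHGAE while preserving the same locality property.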
3. Temporal Adaptation and Concept Drift Handling
Online malware detectors operate strictly in arrival order (no data shuffling), preserving temporal causality and enabling real-time adaptation. ARF and similar ensembles use adaptive drift detectors such as ADWIN, which monitor per-tree error rates over sliding windows and trigger replacement or retraining when the means of two sub-windows diverge significantly:

|μ̂_{W₀} − μ̂_{W₁}| ≥ ε_cut,

where W₀, W₁ are splits of the error window and ε_cut = √((1 / 2m) · ln(4/δ)), with m the harmonic mean of |W₀| and |W₁| and δ the confidence parameter. PA and CW algorithms update weights on each encountered sample, which naturally accommodates gradual drift and feature evolution. MG-DVD's embedding updates focus only on graph regions with topological changes, minimizing computational cost and drift lag.
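The ADWIN cut condition can be sketched as a brute-force scan over window splits; real ADWIN uses exponential histograms to make this logarithmic-time, so `adwin_cut` below is an O(n²) illustrative simplification.

```python
import math

def adwin_cut(errors, delta=0.002):
    """Scan every split of the recent error window; signal drift when the two
    sub-window means differ by at least the Hoeffding-style bound eps_cut."""
    n = len(errors)
    for i in range(1, n):
        w0, w1 = errors[:i], errors[i:]
        m = 1.0 / (1.0 / len(w0) + 1.0 / len(w1))  # harmonic mean of sizes
        eps = math.sqrt((1.0 / (2.0 * m)) * math.log(4.0 / delta))
        mean0 = sum(w0) / len(w0)
        mean1 = sum(w1) / len(w1)
        if abs(mean0 - mean1) >= eps:
            return True, i  # drift detected at split point i
    return False, -1

stable = adwin_cut([0.0] * 200)               # constant error rate: no drift
abrupt = adwin_cut([0.0] * 100 + [1.0] * 100)  # error jump: drift
```

When the condition fires, ARF drops the older sub-window (and may replace the affected tree), so the model state reflects only the post-drift distribution.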
Label availability is a limiting factor. Active online learning, as in ActDroid (Muzaffar et al., 2024), selects samples to query for labels based on classifier uncertainty while respecting a fixed labeling budget. This focuses annotation effort on challenging or drifting instances while minimizing total labeling cost.
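Uncertainty-based querying under a budget can be sketched by pairing a margin test with the PA-I update; `active_stream` and its thresholds are illustrative assumptions, not ActDroid's implementation.

```python
import numpy as np

def active_stream(stream, budget=0.3, threshold=0.5, C=1.0):
    """Stream loop with uncertainty-based active learning: labels are
    requested only for low-margin (uncertain) samples, up to `budget`
    as a fraction of the stream; the model is a PA-I linear classifier."""
    d = len(stream[0][0])
    w = np.zeros(d)
    max_queries = int(budget * len(stream))
    queried, preds = 0, []
    for x, y in stream:
        margin = w @ x
        preds.append(1 if margin >= 0 else -1)
        if abs(margin) < threshold and queried < max_queries:
            queried += 1                          # pay for this label
            loss = max(0.0, 1.0 - y * margin)
            tau = min(C, loss / (x @ x))
            w = w + tau * y * x                   # PA-I update on queried label
    return preds, queried

stream = [(np.array([1.0, 0.0]), 1), (np.array([-1.0, 0.0]), -1)] * 50
preds, queried = active_stream(stream)
```

On this easy toy stream a single queried label suffices; on drifting streams the margin test naturally re-triggers queries near the shifted decision boundary, which is the mechanism behind ActDroid's accuracy recovery with only a fraction of samples labeled.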
4. Experimental Evaluations and Quantitative Results
Empirical studies across these frameworks demonstrate robust performance relative to static batch models, particularly in long-term, non-stationary or zero-day settings.
- Process-level metrics (ARF, (Diamantopoulos et al., 15 Jan 2026)):
- Batch RF (fixed): Acc ≈ 65%, Prec ≈ 60%, Rec ≈ 55%, F1 ≈ 57%
- Online ARF (adaptive): Acc ≈ 78%, Prec ≈ 75%, Rec ≈ 80%, F1 ≈ 77%
- Limited-label setting: Acc rises from ≈55% (no labels) to ≈78% (full labels), indicating robustness with sparse feedback.
- Graph-based learning (DroidOL, (Narayanan et al., 2016)):
- PA-based online update: ≈84.3% overall accuracy; >3% gain over batch updates and >20% over single-shot batch learning.
- Dynamic vocabulary further improves accuracy.
- Active learning (ActDroid, (Muzaffar et al., 2024)):
- Progressive (ideal) setting: ∼97% accuracy for static API features, ∼91% for dynamic features.
- Delayed labels: accuracy drops to ∼91%; active querying restores ∼96% with just 30% of samples labeled.
- Drift events tracked via ADWIN ensure up-to-date model state.
- Dynamic graph learning (MG-DVD, (Liu et al., 2021)):
- MG-DVD: Recall 0.965, Precision 0.981, Acc 0.976, F1 0.973, AUC 0.952.
- Per-window detection times average ≈8.84s, twice as fast as previous graph-based methods.
- Context-aware graph kernels (CASANDRA, (Narayanan et al., 2017)):
- BM F1-score: 99.23%; ITW accuracy: 89.92%
- Training ∼44× faster than Drebin and >1000× faster than CSBD.
- Feature weight tracking reveals rapid adaptation to changing malware landscape.
5. Computational Efficiency, Scalability, and Deployment
Online frameworks are optimized for per-sample update cost and minimal memory overhead. ARF and tree-based models scale with O(T · d) per timestamp, where T is the ensemble size and d the feature dimension; PA and CW algorithms operate in O(s), where s is the number of nonzero features per instance, generally hundreds of entries. MG-DVD updates only touched graph nodes, yielding order-of-magnitude speedups over naive graph retraining.
Streaming environments require constant memory and avoid batch retraining; implementations such as CapyMOA for ARF (Diamantopoulos et al., 15 Jan 2026) can process thousands of samples per second with sub-10ms latency. Sparse feature representations and dynamic vocabulary growth (as in DroidOL (Narayanan et al., 2016) and CASANDRA (Narayanan et al., 2017)) keep memory usage minimal even at very high feature dimensionality.
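The dynamic-vocabulary idea can be sketched as follows; `DynamicVocab` and `sparse_dot` are illustrative helper names, not the frameworks' APIs.

```python
class DynamicVocab:
    """Grow a feature-name -> index vocabulary online as new behavioral
    features (e.g., fresh WL subgraph labels) appear in the stream; vectors
    stay sparse dicts, so memory tracks only features actually observed."""
    def __init__(self):
        self.index = {}

    def vectorize(self, feature_counts):
        vec = {}
        for name, count in feature_counts.items():
            idx = self.index.setdefault(name, len(self.index))  # grow on demand
            vec[idx] = float(count)
        return vec

def sparse_dot(w, vec):
    """Dot product between a sparse weight dict and a sparse feature dict;
    unseen indices implicitly contribute zero."""
    return sum(w.get(i, 0.0) * v for i, v in vec.items())

vocab = DynamicVocab()
a = vocab.vectorize({"api:send": 2, "api:read": 1})
b = vocab.vectorize({"api:send": 1, "api:exec": 3})
```

Because linear online learners like PA touch only the nonzero indices of each instance, the weight vector can grow lazily alongside the vocabulary with no retraining.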
6. Limitations and Prospective Research Directions
Observed constraints include:
- Feature richness: Focusing on high-level behavioral metrics limits detection of stealthy or kernel-level attacks. Augmenting feature sets with system-call traces, memory dumps, or fine-grained execution events is anticipated to improve detection (Diamantopoulos et al., 15 Jan 2026, Muzaffar et al., 2024).
- Drift detection: ARF's tree-wise drift detection is coarse; finer methods such as ensemble-level ADWIN or Bayesian change point detection are under consideration (Diamantopoulos et al., 15 Jan 2026, Muzaffar et al., 2024).
- Classifier diversity: While PA and CW perform well, integration of deep-learning stream models (e.g., continual-learning RNNs, transformers) remains unexplored in the fully streaming context (Diamantopoulos et al., 15 Jan 2026).
- Deployment aspects: Real-world integration with orchestrators (e.g., Kubernetes) and IoT gateways, along with runtime resource constraints, needs empirical evaluation.
A plausible implication is that combining interpretable online models, active querying, and dynamic graph learning in unified frameworks can realize scalable, adaptive malware detection for heterogeneous infrastructure. Evaluating deep neural online learners on streaming process-resource or behavioral graphs is a recommended avenue.
7. Interpretability and Explainability
Recent frameworks emphasize transparent decision rationale. CASANDRA (Narayanan et al., 2017) achieves explanation via inspection of top-weighted contextual subgraphs, exposing key behavioral triggers. MG-DVD (Liu et al., 2021) uses meta-graph attention weights to indicate which structured patterns contributed most to alerts, with per-family frequency matrices supporting forensic analysis. Feature importance tracking across time further clarifies adaptation to newly emerging or fading malware populations.
Each approach referenced above demonstrates that online learning protocols, tailored feature engineering, algorithmic adaptation to temporal structure, and efficient incremental updates substantively advance the detection of dynamic and evolving malware, supporting real-time security monitoring in modern computational environments.