StreamForest: Streaming Tree Ensembles

Updated 6 February 2026

StreamForest is a family of tree-based ensemble models for streaming data that support incremental learning and adaptation to concept drift.
Applications include real-time video processing with event memory, online tabular classification, and scalable XML transformation.
The frameworks target efficient, low-latency updates with bounded memory use and rapid convergence in distributed and evolving environments.

StreamForest refers to a family of algorithms and systems for processing, learning from, or transforming streaming data using structured “forest” models, that is, ensembles of tree-based architectures that support efficient, incremental, or online operations. Applications span real-time video understanding, peer-to-peer video distribution, tabular data stream classification, active learning in evolving streams, and streaming XML transformation. This article surveys the main StreamForest paradigms, including underlying algorithmic constructs, performance guarantees, drift adaptation mechanisms, and representative application domains.

1. Persistent Event Memory Forest for Streaming Video

StreamForest in modern multimodal LLMs (MLLMs) implements an event-based memory architecture enabling efficient real-time video understanding under tight computational constraints (Zeng et al., 29 Sep 2025). The core mechanism is the Persistent Event Memory Forest, a dynamic forest of tree structures where each node represents a temporally contiguous span of video frames aggregated into a semantically coherent event. Nodes store visual token sets, time stamps, and merge frequencies. New video frames are appended to a current-event leaf, and adjacent nodes are merged based on a learned penalty combining temporal distance, feature-space similarity, and merge frequency:

$C_\text{merge}(i,j) = \alpha P_t(i,j) + \beta P_s(i,j) + \gamma P_f(i,j)$

with $P_t(i,j)$ denoting absolute time difference, $P_s(i,j)$ the cosine feature dissimilarity, and $P_f(i,j)$ a normalized merge-history penalty ($\alpha=0.2$, $\beta=0.4$, $\gamma=0.4$ in ablations). A split operation is triggered if a node’s visual token count or internal embedding variance exceeds pre-defined thresholds.
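The merge cost above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node layout (`time`, `feat`, `merges`), the normalisers `max_gap` and `max_merges`, and the helper names are hypothetical choices made only so that the three penalties are on comparable scales. Only the weights and the weighted-sum form come from the text.

```python
import math

# Weights quoted in the text for the ablation setting.
ALPHA, BETA, GAMMA = 0.2, 0.4, 0.4


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def merge_cost(node_i, node_j, max_gap, max_merges):
    """C_merge = alpha * P_t + beta * P_s + gamma * P_f for two event nodes.

    Dividing by max_gap and max_merges is an assumption of this sketch,
    not something the source specifies.
    """
    p_t = abs(node_i["time"] - node_j["time"]) / max_gap
    p_s = cosine_dissimilarity(node_i["feat"], node_j["feat"])
    p_f = (node_i["merges"] + node_j["merges"]) / (2 * max_merges)
    return ALPHA * p_t + BETA * p_s + GAMMA * p_f


def cheapest_adjacent_pair(nodes, max_gap, max_merges):
    """Return (index, cost) of the adjacent pair (k, k+1) with lowest cost."""
    costs = [
        merge_cost(nodes[k], nodes[k + 1], max_gap, max_merges)
        for k in range(len(nodes) - 1)
    ]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


nodes = [
    {"time": 0.0, "feat": [1.0, 0.0], "merges": 0},
    {"time": 1.0, "feat": [1.0, 0.0], "merges": 0},
    {"time": 5.0, "feat": [0.0, 1.0], "merges": 2},
]
best_index, best_cost = cheapest_adjacent_pair(nodes, max_gap=10.0, max_merges=4)
```

In this toy input the first two nodes are close in time, identical in feature space, and never merged before, so they form the cheapest pair to merge.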

To ensure high-fidelity real-time perception, StreamForest integrates a fine-grained spatiotemporal window over the most recent $T_w$ frames at full spatial resolution. This allows queries to leverage both the compressed long-term event forest (for context and memory) and fine-detail short-term features, fused via a cross-attention module.

Empirically, StreamForest achieves state-of-the-art results on streaming benchmarks (StreamingBench: 77.26%, OVBench: 60.5%, OVO-Bench: 55.6% accuracy), outperforming both proprietary (GPT-4o, Gemini 1.5 Pro) and open-source baselines. It is also robust to memory compression, losing only 2.2 percentage points at a 1024-token limit (Zeng et al., 29 Sep 2025). Instruction tuning on the custom OnlineIT dataset further aligns model outputs to domain-specific streaming tasks and multi-turn queries.

2. Streaming Forest Learning for Tabular and Concept-Drift Data

In tabular data contexts, StreamForest frameworks refer to streaming extensions of decision forests designed for efficient learning under streaming and concept-drift conditions (Xu et al., 2021, Yuan et al., 2022, Luong et al., 2020).

The “Extremely Simple Streaming Forest” (XForest) paradigm (Xu et al., 2021) implements streaming forests by incrementally growing leaves on new batches, fixing previously established splits, and periodically retiring low-performing trees. Each incoming mini-batch is distributed across ensemble trees (bootstrap subsampled), and for each tree, only affected leaves and their buffers are considered for splitting. The XForest-Zero variant restricts updates to per-leaf class counters, freezing tree structure for zero-shot domain transfer.
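The frozen-structure idea behind the XForest-Zero variant can be sketched with a toy example. The class below is hypothetical and far simpler than a real forest: it is a single fixed split (a stump) whose two leaves hold class counters, so that a new batch changes the counts but never the split.

```python
from collections import Counter


class FrozenStump:
    """Toy tree with one fixed split; only the leaf class counters are updated."""

    def __init__(self, feature, threshold):
        self.feature = feature      # index of the feature tested at the root
        self.threshold = threshold  # split point, never changed after creation
        self.leaves = (Counter(), Counter())

    def _leaf(self, x):
        # Route the sample to the left or right leaf using the frozen split.
        go_left = x[self.feature] <= self.threshold
        return self.leaves[0] if go_left else self.leaves[1]

    def update(self, batch_x, batch_y):
        """Incorporate a mini-batch by incrementing per-leaf class counts."""
        for x, y in zip(batch_x, batch_y):
            self._leaf(x)[y] += 1

    def predict(self, x):
        """Majority class of the leaf reached by x, or None if the leaf is empty."""
        counts = self._leaf(x)
        return counts.most_common(1)[0][0] if counts else None


stump = FrozenStump(feature=0, threshold=0.5)
stump.update([[0.1], [0.9], [0.8]], ["a", "b", "b"])
first_left = stump.predict([0.2])
first_right = stump.predict([0.7])

# A later batch shifts the left leaf's majority without touching the split.
stump.update([[0.2], [0.3]], ["c", "c"])
second_left = stump.predict([0.2])
```

An ensemble version would keep many such frozen trees and combine their leaf counts; that aggregation step is omitted here.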

Forgetful Forests (Yuan et al., 2022) combine sliding-window retention at the node or tree level, accuracy-driven dynamic adjustment of the retained-window size ($rSize$), and probabilistic data expiration. Adaptation to drift is governed by an explicit update rule for the retain size, where $|X|$ is the size of the incoming batch and $iRate$ is a baseline growth rate:

$r_\text{change} = (newAcc/lastAcc)^{\max(2, 3 - newAcc/lastAcc)}$

$rSize_\text{new} = \min(rSize_\text{old}\cdot r_\text{change} + iRate\cdot |X|, rSize_\text{old} + |X|)$

The framework attains up to 24× speedup over competitive incremental methods with negligible accuracy loss.
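As a worked illustration of the two update equations, the following Python sketch evaluates them directly. The function and argument names are chosen for this example, and the default `i_rate=0.3` is the initial value quoted later in this article; the sketch assumes `last_acc` is strictly positive.

```python
def retain_size_update(r_size_old, new_acc, last_acc, n_new, i_rate=0.3):
    """Apply the retain-size rule to one incoming batch of n_new samples.

    r_change = (new_acc / last_acc) ** max(2, 3 - new_acc / last_acc)
    r_size   = min(r_size_old * r_change + i_rate * n_new, r_size_old + n_new)
    """
    ratio = new_acc / last_acc
    r_change = ratio ** max(2.0, 3.0 - ratio)
    grown = r_size_old * r_change + i_rate * n_new
    cap = r_size_old + n_new  # never retain more than old window plus new batch
    return min(grown, cap)


# Accuracy dropped from 0.9 to 0.8: the window shrinks (more forgetting).
after_drop = retain_size_update(1000, new_acc=0.8, last_acc=0.9, n_new=100)

# Accuracy rose from 0.8 to 0.9: growth is limited by the cap r_size_old + n_new.
after_gain = retain_size_update(1000, new_acc=0.9, last_acc=0.8, n_new=100)
```

With these inputs the window contracts to roughly 810 samples after the accuracy drop, while after the accuracy gain the uncapped value (about 1296) exceeds the cap, so the retain size becomes 1100.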

Streaming Deep Forest (SDF) (Luong et al., 2020) extends the gcForest cascade to an evolving-stream setting via multiple layers of adaptive random forests (ARFs), each equipped with Hoeffding–Naïve-Bayes Trees and ADWIN drift detectors. A budgeted active learning scheme (Augmented Variable Uncertainty, AVU) ensures full exploitation of labeling budgets $B > 0.5$, while adaptively querying uncertain or random instances based on dynamic posterior thresholds.
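To make the idea of a dynamic posterior threshold concrete, the sketch below implements the plain variable-uncertainty strategy for budgeted stream labeling. It is not AVU itself: the augmentation that queries random instances to use up the budget is omitted, and the function name, the step size, and the initial threshold are assumptions of this example.

```python
def variable_uncertainty(posteriors, budget, theta=1.0, step=0.01):
    """Decide, per instance, whether to request its label.

    posteriors: maximum class posterior of the current model for each instance.
    budget:     maximum fraction of instances that may be labeled so far.
    theta:      confidence threshold, adapted after every decision.
    """
    decisions = []
    labels_spent = 0
    for t, p_max in enumerate(posteriors, start=1):
        ask = False
        if labels_spent / t < budget:  # budget still available
            if p_max < theta:
                # Uncertain prediction: query, then tighten the threshold.
                ask = True
                labels_spent += 1
                theta *= 1.0 - step
            else:
                # Confident prediction: widen the threshold so queries resume.
                theta *= 1.0 + step
        decisions.append(ask)
    return decisions


decisions = variable_uncertainty([0.6, 0.9, 0.55, 0.99], budget=0.5)
```

The threshold shrinks after each query and grows after each confident prediction, which spreads label requests over time instead of exhausting the budget early.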

Empirical evaluations across benchmark streams and real-world datasets show XForest and SDF consistently match or exceed the accuracy of more complex streaming ensemble approaches (e.g., Hoeffding Trees, Mondrian Forests), while using markedly less memory and offering faster per-sample updates (Xu et al., 2021, Yuan et al., 2022, Luong et al., 2020).

3. Distributed StreamForest Models for Peer-to-Peer Streaming

In real-time media distribution, StreamForest describes a decentralized, asynchronous, multiple-tree construction algorithm for peer-to-peer (P2P) video streaming (Zhu et al., 2013). The workflow divides a video into $M$ substreams (optionally using multiple-description coding), with each substream distributed along an independently managed directed tree rooted at a unique server node. The union of the $M$ trees over $N$ peers forms the StreamForest.

Topology maintenance is local, asynchronous, and decentralized: each peer independently samples other peers at exponentially-distributed times and executes a constant-time “CombinedUpdate” protocol. This protocol (GreedyTreeCover, SingleTreeAdjust, MixedNodeAdjust) rewires tree edges to maximize stream coverage and minimize delivery delay, using only local neighborhoods and buffered root-depth estimates.

Key theoretical guarantees (under instantaneous depth update and sufficient upload capacity) include:

Each peer receives at least $K$ distinct substreams, all trees are cycle-free, and tree depth is bounded as

$\text{depth}(E_i) \leq \log_2(N+1) + c$

for a constant $c$ determined by the initial topology (Zhu et al., 2013).

The system converges to optimal balanced trees in $O(\log N)$ time with high probability, even under heterogeneity and in the presence of servers and passive clients.
Competition for upload capacity is handled via local adjustments, ensuring equilibrium where all trees are “as fat and as shallow as possible.”
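Two of the quantities above are easy to compute directly. In the sketch below, the round-robin assignment of chunks to substreams is an assumption made for illustration (the source only states that the video is divided into $M$ substreams), and `depth_bound` simply evaluates the stated bound $\log_2(N+1) + c$.

```python
import math


def split_into_substreams(num_chunks, m):
    """Assign chunk k to substream k mod m (round-robin, assumed here)."""
    return [list(range(s, num_chunks, m)) for s in range(m)]


def depth_bound(n_peers, c=0.0):
    """Upper bound on tree depth: log2(N + 1) + c."""
    return math.log2(n_peers + 1) + c


substreams = split_into_substreams(num_chunks=7, m=3)
bound = depth_bound(n_peers=1023)
```

For 1023 peers and $c = 0$ the bound is 10, the depth of a complete binary tree with that many nodes when counting levels from one.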

Extensive simulations confirm the theoretical performance persists under relaxed assumptions, with rare cycles rapidly eliminated, exponential coverage growth, and robustness against stragglers through slight resource augmentation.

4. Streaming XML and Structured Document Transformation

“Streaming by forest transducers” applies StreamForest principles to real-time XML data transformation using Macro Forest Transducers (MFTs) (Hakuta et al., 2013). This approach compiles a practical XQuery fragment (MinXQuery), which supports element construction, XPath navigation (child, descendant, following-sibling), and let- and for-statements, into MFTs capable of online transformation.

The MFT model operates over forests of unranked, labeled trees, with transformations described by state-transition rules parameterized by context variables. Execution is streamed via a pushdown stack and leverages static analysis (unused-parameter elimination, constant-parameter removal, stay-move inlining) and deforestation to reduce state complexity.
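The stack-driven, event-at-a-time execution style can be illustrated with Python's standard `xml.sax` module. This is not a macro forest transducer and does not reproduce the compilation or optimisation steps described above; it is a hand-written handler for a single child-axis path, with class and function names invented for this example.

```python
import io
import xml.sax


class ChildPathSelector(xml.sax.ContentHandler):
    """Collect the text of elements matching one absolute child-axis path."""

    def __init__(self, path):
        super().__init__()
        self.path = list(path)
        self.stack = []      # names of currently open elements
        self.buffer = None   # text pieces of the element being captured
        self.results = []    # a real streaming engine would emit these instead

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.path:
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.stack == self.path and self.buffer is not None:
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


def select_text(xml_bytes, path):
    """Run the handler over the document in a single forward pass."""
    handler = ChildPathSelector(path)
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.results


doc = (
    b"<site><people>"
    b"<person><name>Ann</name></person>"
    b"<person><name>Bob</name></person>"
    b"</people></site>"
)
names = select_text(doc, ["site", "people", "person", "name"])
```

Apart from the collected results, the handler's state is the stack of open element names plus the text of the element currently being captured, so working memory grows with document depth, not document size.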

Empirical benchmarks indicate that the OCaml-based streaming engine achieves near-linear runtime and memory usage comparable to specialized C++ engines (GCX) across XMark and hand-crafted queries, even for large (100 GB) inputs. The transducer approach enables richer expressiveness (e.g., let-statements, following-sibling) than prior streaming XML engines, with robust static guarantees and modular compositionality.

5. Comparative Computational Properties and Practical Guidelines

Representative StreamForest frameworks across domains demonstrate the following computational features:

Framework	Time Complexity	Space Complexity	Notable Features
XForest	$O(TpN\log(N/b)\log N)$	$O(dB + T N/b)$	Incremental growth, fast, bounded memory
Forgetful Forest	Empirically linear in $B$	$O(nTree \cdot rSize \cdot F)$	Adaptive forgetting, high throughput
Deep Forest (SDF)	$O(LFTD)$ per instance	$O(LF \cdot 2^D)$	Deep cascade, active learning, drift adaptation
P2P StreamForest	$O(\log N)$ convergence	Distributed in $\sum_u \bar{d}_u$	Probabilistic, decentralized
MFT Streaming	Linear in input size	$<5$ MB overhead	Parameter minimization, functional compilation
Video MLLM StreamForest	$O(1)$ per frame (window/forest ops)	Budgeted tokens	Event aggregation by learned penalty

Empirical guidance includes:

For XForest, batch sizes of 50–200 and forest sizes of 50–200 yield a balance of granularity and stability.
For Forgetful Forests, the recommended initial settings are $iRate=0.3$ and $nTree=20$; bagging yields moderate accuracy gains at the cost of throughput.
Cascaded deep forests with AVU excel where labeling budgets are tight and drift is anticipated.
P2P StreamForest protocols suit live video distribution across heterogeneous, large-scale peer networks.

6. Robustness, Adaptation, and Applications

StreamForest architectures show robust empirical behavior across concept drift, evolving workloads, and resource constraints:

Video MLLM StreamForest adapts memory compression by preserving semantically rich events using a learned penalty, with accuracy remaining above 96% of the baseline in severe regimes (Zeng et al., 29 Sep 2025).
XForest and Forgetful Forest degrade gracefully as batch sizes, window sizes, or drift severity vary, self-tuning retention without operator intervention (Yuan et al., 2022).
Streaming Deep Forest leverages ADWIN drift detection and background tree replacement for rapid adaptation, while AVU guarantees that full labeling budgets are utilized even under dynamic uncertainty (Luong et al., 2020).
Macro Forest Transducers, after parameter/stay-move reduction, guarantee streaming memory bounds on arbitrarily large XML documents, with correctness preserved by construction (Hakuta et al., 2013).

Use cases include high-volume real-time sensor analytics, distributed video streaming, autonomous driving perception, large-scale document ETL, and budget-constrained online learning. In streaming peer-to-peer and MLLM-video contexts, StreamForest enables both low-delay content delivery and efficient compressed long-term semantic memory, validated on diverse and large-scale benchmarks.