Modular Stream Pipelines

Updated 20 March 2026

Modular stream pipelines are architectures that decompose data processing into isolated, composable modules with clear input/output schemas and resource envelopes.
They leverage formal operator models, DSLs, and combinator algebras to optimize resource allocation, reduce latency, and enable adaptive scaling across diverse deployment scenarios.
Applications include real-time IoT analytics, edge video-language processing, and federated stream reasoning, delivering measurable performance improvements and flexible, dynamic configurations.

A modular stream pipeline is an architecture that decomposes stream processing into composable, resource-isolated modules or micro-services linked by well-specified data-flow contracts. Each module processes, filters, fuses, or analyzes data streams with explicit input/output schemas, resource envelopes, and configuration parameters. Modular design enables adaptive scaling, extensibility, cross-domain deployment, and the capacity to target diverse contexts: real-time big data analytics, on-device AI, federated stream reasoning, and state-machine-level optimization.

1. Formal Operator and Module Models

Multiple frameworks model each stream-processing operator as a tuple encapsulating its logic, schemas, configuration, and resources. In H-STREAM (Vargas-Solar et al., 2021), each micro-service operator $O_i$ is defined as:

$O_i = \langle \text{Logic}_i, \text{In}_i, \text{Out}_i, \text{Config}_i, R_i \rangle$

$\text{Logic}_i$ : the operator's function, e.g. min, max, mean, filter.
$\text{In}_i$, $\text{Out}_i$ : typed tuple streams.
$\text{Config}_i$ : windowing mode, provider, window size/slide, and periodicity.
$R_i=(\text{CPU}_i, \text{MEM}_i, \text{DISK}_i)$ : reserved resources.
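The operator tuple can be written down as a small record type. The following is a minimal sketch, not H-STREAM's actual API: the class names `Operator` and `Resources`, the field names, and all configuration and resource values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Resources:
    """Reserved resources R_i = (CPU_i, MEM_i, DISK_i); units are assumed."""
    cpu: float      # CPU cores
    mem_mb: int     # memory in MiB
    disk_mb: int    # disk in MiB


@dataclass(frozen=True)
class Operator:
    """O_i = <Logic_i, In_i, Out_i, Config_i, R_i> as a plain record."""
    logic: Callable[[List[float]], float]   # Logic_i, e.g. min, max, mean
    in_schema: Tuple[str, ...]              # In_i: field names of input tuples
    out_schema: Tuple[str, ...]             # Out_i: field names of output tuples
    config: Dict[str, object]               # Config_i: windowing parameters
    resources: Resources                    # R_i

    def apply(self, window_values: List[float]) -> float:
        """Run the operator's logic over the values of one window."""
        return self.logic(list(window_values))


# Hypothetical mean operator over a 60 s sliding window.
mean_op = Operator(
    logic=lambda xs: sum(xs) / len(xs),
    in_schema=("timestamp", "value"),
    out_schema=("window_end", "mean"),
    config={"mode": "sliding", "size_s": 60, "slide_s": 10},
    resources=Resources(cpu=0.5, mem_mb=256, disk_mb=0),
)
```

Keeping the schemas and the resource envelope next to the logic is what lets a scheduler or a composition layer reason about an operator without executing it.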

Formal models in FMTK (Shastri et al., 30 Nov 2025) and Atom (Panchal et al., 18 Dec 2025) treat each module $M_i\colon X_{i-1} \to X_i$ as a pure, type-preserving transformer, with standardized input/output shapes enforced by pre-/post-process contracts. For example, in Atom, modules include persistent visual encoders, language decoders, indexers, and script generators, each implemented as an isolated, serializable artifact.

In the Strymonas library (Kiselyov et al., 2024) and stream-fusion approaches (Kiselyov et al., 2016), modularity extends beyond deployment granularity to the combinator algebra: pipelines are terms over map, filter, take, flatMap, zip, scan, and stateful sliding windows, each denoting transitions over coalgebraic state types $S_{A,Z}$ , systematically compiled to fusable, closure-free state machines.

2. Pipeline Composition and Composition Languages

Pipelines are composed as directed acyclic graphs (DAGs) whose nodes are operator instances or modules. The composition mechanism varies:

DSLs and APIs: H-STREAM exposes both a lightweight DSL (section/pseudogrammar) and a type-safe Java/Scala API, e.g., pipeline = fetch.union(hist).via(window).via(agg).to(sink) (Vargas-Solar et al., 2021). FMTK constructs pipelines as ordered module chains $P = M_k \circ\,\ldots\,\circ M_1$ with dynamic shape checks (Shastri et al., 30 Nov 2025). CQELS 2.0 enables declarative query composition (REGISTER ... AS STREAM ... WINDOW ...) or programmatic graph building, including explicit fusion operators (sequential $\otimes$ , parallel $\oplus$ ) (Le-Tuan et al., 2022).
Meta-layer and Extensible Composition: The CREEK system (Troyer et al., 2021) introduces two meta-layers: uCREEK $^\mathrm{c}$ (compile-time) and uCREEK $^\mathrm{r}$ (run-time), permitting users to intercept, rewrite, and morph pipelines at both instruction and live-message levels. Operator fusion, parallelism, pull/push semantics, and logging are implemented as meta-pipelines, not changes to core definitions.
Type and Shape Contracts: FMTK’s explicit preprocess/postprocess contracts, with runtime mini-batch validation, guarantee compatibility between modules across arbitrary encoder, backbone, adapter, and decoder chains (Shastri et al., 30 Nov 2025).
Functional Combinator Algebra: Stream-fusion and Strymonas approaches (Kiselyov et al., 2016, Kiselyov et al., 2024) realize modularity at the combinator level; arbitrary user terms over map/filter/flatMap/zip are normalized via equational rewrite systems, ensuring correctness and optimality of the composite state-machine.
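The ordered-chain view $P = M_k \circ \ldots \circ M_1$ with shape contracts can be sketched as follows. This is an illustrative toy, not FMTK's interface: `Module`, `Pipeline`, and the use of plain list lengths as "shapes" are all assumptions made to keep the example self-contained.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class Module:
    """One pipeline stage with declared input and output lengths."""

    def __init__(self, name: str, in_len: int, out_len: int,
                 fn: Callable[[Vector], Vector]) -> None:
        self.name, self.in_len, self.out_len, self.fn = name, in_len, out_len, fn

    def __call__(self, x: Vector) -> Vector:
        # Pre-process contract: reject inputs of the wrong shape.
        if len(x) != self.in_len:
            raise ValueError(f"{self.name}: expected length {self.in_len}, got {len(x)}")
        y = self.fn(x)
        # Post-process contract: the module must honour its declared output shape.
        if len(y) != self.out_len:
            raise ValueError(f"{self.name}: produced length {len(y)}, declared {self.out_len}")
        return y


class Pipeline:
    """Ordered chain of modules, checked for compatibility at construction time."""

    def __init__(self, modules: Sequence[Module]) -> None:
        for up, down in zip(modules, modules[1:]):
            if up.out_len != down.in_len:
                raise TypeError(f"{up.name} -> {down.name}: shape mismatch")
        self.modules = list(modules)

    def __call__(self, x: Vector) -> Vector:
        for module in self.modules:
            x = module(x)
        return x


# Hypothetical two-stage chain: a length-4 input is reduced to 2, then to 1.
encoder = Module("encoder", 4, 2, lambda x: [x[0] + x[1], x[2] + x[3]])
decoder = Module("decoder", 2, 1, lambda z: [z[0] + z[1]])
pipeline = Pipeline([encoder, decoder])
```

Checking adjacent contracts when the chain is built catches wiring errors before any data flows, while the per-call checks guard against a module that violates its own declaration.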

3. Resource Allocation, Scaling, and Deployment Architectures

Modular stream pipelines permit fine-grained resource control and scalable deployment:

Micro-Service Deployment and Scheduling: In H-STREAM, each micro-service runs as a Docker container, integrated with Spark streaming for resource-managed execution. Auto-scaling rules compute the number of workers as $\#\text{nodes} \approx \lceil \lambda_\text{req,total} / \lambda_\text{node,max}\rceil$ , directly coupling resource allocation to data rates (Vargas-Solar et al., 2021).
Edge and Cross-Device Offloading: NNStreamer (Ham et al., 2022) enables GStreamer-style pipelines on edge devices, with simple offload mechanisms (swap tensor_filter for tensor_query_client/server or MQTT pub/sub), versioned bins, and discoverable modules. Among-device AI capacity is built via publish/subscribe, remote queries, and timestamp synchronization, supporting dynamic elasticity.
Federated Pipelines and Adaptive Scheduling: CQELS 2.0’s adaptive federator partitions global pipelines across edge and cloud, using node metadata for subquery assignment and dynamic stream-rate workload balancing (Le-Tuan et al., 2022). The federator algorithm minimizes communication and equalizes per-node computation $O(|D_i||Q_i|)$ for balanced scaling.
Concurrency and Overlap: Atom uses two thread pools for encoding and decoding; as soon as encoder $E(x_i)$ produces $z_i$, a decoder thread may process $z_i$, so the steady-state time per item is $\approx \max(T_E,T_D)$ rather than $T_E+T_D$ without overlap (Panchal et al., 18 Dec 2025).
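Two of the quantitative rules in this section, the worker-count formula and the effect of encoder/decoder overlap, can be checked with a few lines of arithmetic. The sketch below is a simplified model rather than either system's scheduler: it assumes constant per-item stage times and one worker per stage, and the function names and numbers are hypothetical.

```python
import math


def workers_needed(total_rate: float, max_rate_per_node: float) -> int:
    """#nodes ~= ceil(lambda_req_total / lambda_node_max)."""
    return math.ceil(total_rate / max_rate_per_node)


def time_sequential(n_items: int, t_enc: float, t_dec: float) -> float:
    """Total time when each item is encoded and then decoded with no overlap."""
    return n_items * (t_enc + t_dec)


def time_overlapped(n_items: int, t_enc: float, t_dec: float) -> float:
    """Total time for a two-stage pipeline with one worker per stage.

    The first item pays t_enc + t_dec; every later item adds only the
    slower of the two stages, because the other stage runs concurrently.
    """
    if n_items == 0:
        return 0.0
    return t_enc + t_dec + (n_items - 1) * max(t_enc, t_dec)


# Hypothetical figures: 800 msg/s offered load, 300 msg/s per node.
nodes = workers_needed(800, 300)

# Hypothetical stage times: 2 s encode, 3 s decode, 10 items.
sequential = time_sequential(10, 2.0, 3.0)
overlapped = time_overlapped(10, 2.0, 3.0)
```

As the number of items grows, the overlapped time per item approaches $\max(T_E, T_D)$, which is why overlap helps most when the two stages take similar time.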

4. Optimization, Normalization, and Complete Fusion

Optimizing modular pipelines aims to minimize memory allocations, eliminate closures, and avoid intermediate buffers:

Automatic Fusion: Both stream-fusion (Kiselyov et al., 2016) and Strymonas (Kiselyov et al., 2024) employ normalization-by-evaluation, rewriting arbitrary user pipelines to a unique normal form representing a flat, imperative loop. Equational rewrite laws, such as

$\text{flatMap}\;f_1\;(\text{flatMap}\;f_2\;s) \;\cong\; \text{flatMap}\;(\lambda y.\,\text{flatMap}\;f_1\,(f_2\,y))\;s$

ensure that nested combinators are collapsed. The final stage emits code comparable to hand-written loops, with no intermediate tuple or closure allocation.
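The flatMap law above can be checked on a small example. The sketch below uses Python generators purely to illustrate the equation; it is not how stream-fusion or Strymonas are implemented (those generate code by staging), and `flat_map`, `f1`, `f2`, and `fused` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List


def flat_map(f: Callable[[int], Iterable[int]], s: Iterable[int]) -> Iterator[int]:
    """Apply f to every element of s and concatenate the resulting streams."""
    for x in s:
        yield from f(x)


def f2(y: int) -> List[int]:
    """Inner function: each element expands to itself and its successor."""
    return [y, y + 1]


def f1(z: int) -> List[int]:
    """Outer function: keep even elements, scaled by ten; drop odd ones."""
    return [z * 10] if z % 2 == 0 else []


source = [1, 2, 3]

# Left-hand side of the law: two nested flatMap traversals.
lhs = list(flat_map(f1, flat_map(f2, source)))

# Right-hand side: a single outer flatMap whose body contains the inner one.
rhs = list(flat_map(lambda y: flat_map(f1, f2(y)), source))


def fused(s: Iterable[int]) -> List[int]:
    """The loop nest a fusing compiler would aim for, written by hand."""
    out = []
    for y in s:
        for z in (y, y + 1):
            if z % 2 == 0:
                out.append(z * 10)
    return out
```

Both sides produce the same elements in the same order, and the hand-written `fused` loop shows the target shape. Python itself still allocates in this version, so only the structure of the loop, not its allocation behaviour, carries over.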

Meta-architecture/operator fusion: In CREEK, compile-time meta-layers identify and fuse consecutive map or filter operators, reducing per-operator overhead (Troyer et al., 2021).
Empirical Results: Microbenchmarks show that staged/fused pipelines run within a few percent of hand-written code (e.g., staged streams: 135 ms, handwritten loops: 130 ms for $N=10^8$ integer sum) and more than twice as fast as standard library streams (Java 8: 300 ms, Scala: 330 ms) (Kiselyov et al., 2016). Strymonas similarly reports performance on par with the best native code (Kiselyov et al., 2024).
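Operator fusion as a rewrite over a pipeline description can be sketched in a few lines. This is an assumed, simplified representation (a list of `("map", fn)` and `("filter", pred)` pairs), not CREEK's meta-layer API; `fuse_maps` and `run` are hypothetical helpers.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Op = Tuple[str, Callable]


def fuse_maps(ops: List[Op]) -> List[Op]:
    """Merge each run of consecutive map operators into a single map."""
    fused: List[Op] = []
    for kind, fn in ops:
        if kind == "map" and fused and fused[-1][0] == "map":
            prev = fused[-1][1]
            # Bind prev and fn as defaults so later iterations do not rebind them.
            fused[-1] = ("map", lambda x, f=prev, g=fn: g(f(x)))
        else:
            fused.append((kind, fn))
    return fused


def run(ops: List[Op], items: Iterable[int]) -> Iterator[int]:
    """Interpret a pipeline description item by item."""
    for x in items:
        keep = True
        for kind, fn in ops:
            if kind == "map":
                x = fn(x)
            elif not fn(x):   # a filter predicate rejected the item
                keep = False
                break
        if keep:
            yield x


# Hypothetical pipeline: two adjacent maps, a filter, then another map.
ops: List[Op] = [
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
    ("filter", lambda x: x % 3 == 0),
    ("map", lambda x: x - 1),
]
fused_ops = fuse_maps(ops)
```

The rewrite shortens the operator list from four entries to three while leaving the output unchanged; the trailing map is not merged because a filter separates it from the first two.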

5. Application Domains and Scenario-Specific Results

Modular stream pipelines support a broad range of application scenarios:

Real-Time IoT Aggregation: In H-STREAM, micro-service DAGs aggregate live and historic time-series at high input rates ( $800\times1$ msg/s), with sustained Spark throughput of $850$ msg/s, average end-to-end latency $220$ ms, and efficient resource utilization (CPU $65\%$ , MEM $55\%$ ) (Vargas-Solar et al., 2021).
Video-Language Processing at the Edge: Atom achieves $27$– $33\%$ end-to-end latency reductions (e.g., $162.3$ s $\to$ $108.7$ s on Pixel 5a), with negligible increases in RAM (+0.76%) and marginal drops in accuracy ( $\leq$ 2.3 Recall@1 in retrieval) via persistent module reuse and parallel execution (Panchal et al., 18 Dec 2025).
Time-Series Foundation Model Analytics: FMTK incurs only $3$–$7\%$ runtime overhead compared to ad-hoc baselines while reproducing accuracy across diverse decoders and backbones, retaining shape/type safety, and enabling real-time windowed processing (Shastri et al., 30 Nov 2025).
Semantic Stream Reasoning and Fusion: CQELS 2.0’s modular rule-based pipelines, equipped with neuro-symbolic reasoning and federated deployment, sustain $1$ M quads/sec with $99\%$ window-freshness in cloud mode (8 Storm workers + HBase), and $30$ FPS, $35$ ms end-to-end edge-to-cloud latencies for streaming multi-object tracking (Le-Tuan et al., 2022).
Highly Extensible, Dynamic Pipelines: CREEK’s meta-stream architecture supports stacking of orthogonal capabilities (fusion, parallelism, logging, error-handling, push/pull switching); the tradeoff is a $3.3\times$ – $77\times$ slowdown compared to unfused bespoke pipelines, favoring semantic flexibility in complex settings (Troyer et al., 2021).

6. Modularity, Extensibility, and Deployment Patterns

Plug-and-Play Modularity: In tools like NNStreamer, each element is a dynamically loadable plugin; bins, sub-pipelines, and ghost pads provide hierarchical, reusable atomic units. Bins can be switched or versioned without service interruption (Ham et al., 2022).
Component Swappability and Adaptive Reconfiguration: Both FMTK and Atom enable modules—encoders, adapters, decoders—to be swapped at runtime or re-purposed with new configurations/adapters while preserving pipeline invariants, e.g., switching decoders from MLP to SVM without retraining the backbone (Shastri et al., 30 Nov 2025, Panchal et al., 18 Dec 2025).
Federation and Remote Elasticity: CQELS 2.0 and NNStreamer provide coordinated, cross-device orchestration. CQELS federator rebalances workloads among edge and cloud nodes to avoid local overload; NNStreamer leverages MQTT for discovery, failover, and flexible data routing among arbitrary end devices (Le-Tuan et al., 2022, Ham et al., 2022).
Formal Guarantees: Strong equational theories (in Strymonas (Kiselyov et al., 2024)) and type/shape contracts (in FMTK (Shastri et al., 30 Nov 2025)) ensure that modularity does not compromise performance or correctness: pipelines are statically or dynamically checked for compatibility, correctness of normalization, and existence of unique normal forms.

7. Performance Metrics and Scalability Models

Performance evaluation leverages quantifiable metrics:

Framework	Throughput	Latency	Resource/Cost	Extensibility
H-STREAM	$\sim$5k msg/s/worker	150–300 ms	CPU/MEM/DISK per micro-service	DAG, rolling window
Atom	27–33% faster (Pixel 5a/8a/S23)	108.7 s (Atom)	Peak RAM +0.76%	Module-level
FMTK	$\sim$3% overhead	0.03–0.04 s/batch	+1%–7% peak memory	Hot-swappable
NNStreamer	$\sim 1.0\times$ native	Low–Medium video	$<15\%$ CPU loss (MQTT Hybrid)	Replace bins
CQELS 2.0	up to 1M quads/sec	35 ms (edge–cloud)	Modular, federated	Logic-fusion

For each framework, end-to-end throughput is typically bounded by the slowest module or network link; resource utilization is tracked per module and summed across the pipeline; and modularity permits flexible scaling, replacement, and cross-infrastructure migration.
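The bottleneck rule and the cumulative resource view can be expressed directly. The sketch below is a back-of-the-envelope model under the assumption of a linear chain of stages; the stage names, rates, and memory figures are hypothetical and not taken from any of the cited systems.

```python
from typing import Dict


def pipeline_throughput(stage_rates: Dict[str, float]) -> float:
    """End-to-end rate of a linear chain: the minimum over its stages."""
    return min(stage_rates.values())


def bottleneck(stage_rates: Dict[str, float]) -> str:
    """Name of the stage (or link) that limits the pipeline."""
    return min(stage_rates, key=stage_rates.get)


def total_memory(stage_mem_mb: Dict[str, int]) -> int:
    """Cumulative memory reservation across all modules."""
    return sum(stage_mem_mb.values())


# Hypothetical per-stage capacities in messages per second.
rates = {"fetch": 5000.0, "window": 1200.0, "aggregate": 900.0, "uplink": 2000.0}

# Hypothetical per-module memory reservations in MiB.
memory = {"fetch": 128, "window": 512, "aggregate": 256}

limit = pipeline_throughput(rates)
slowest = bottleneck(rates)
reserved = total_memory(memory)
```

In this model, scaling out any stage other than the slowest one leaves end-to-end throughput unchanged, which is the argument for per-module rather than whole-pipeline scaling.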

In summary, modular stream pipelines represent a formalized, empirically validated approach to scalable, extensible, and high-performance stream processing. By factoring computation into resource-isolated, type-preserving modules with precise composition, their design supports adaptive deployment, operator-level optimization, federation across edge/cloud, and dynamic reconfigurability. Fused and staged designs deliver performance competitive with bespoke, monolithic alternatives, whereas highly dynamic meta-layer designs such as CREEK trade throughput for semantic flexibility (Vargas-Solar et al., 2021, Panchal et al., 18 Dec 2025, Kiselyov et al., 2016, Troyer et al., 2021, Kiselyov et al., 2024, Shastri et al., 30 Nov 2025, Ham et al., 2022, Le-Tuan et al., 2022).