PackFlow: Multi-Domain Flow Modeling
- PackFlow is a multi-domain framework that models flows—ranging from packet streams and atomic arrangements to network queues—as fundamental units for analysis.
- It employs specialized methods such as set-tree gradient boosting for rapid DDoS detection, ODE-based generative sampling for crystal structures, and stochastic ODE models for network performance.
- Empirical results show high detection accuracy with minimal overhead, improved energy predictions in crystals, and provably stable performance in network traffic scenarios.
PackFlow refers to multiple distinct frameworks in network modeling, data-driven DDoS detection, and molecular crystal structure prediction. The term captures modeling paradigms that view “flows” as fundamental composition units—either streams of network packets or atomic arrangements—while emphasizing structured data, optimal control, or generative learning. This entry presents the major PackFlow models spanning these domains, focusing on their mathematical structures, algorithmic methods, and empirical performance.
1. Stream-Structured DDoS Detection with PackFlow
PackFlow is a tree-based intrusion detection system that models each network flow as an ordered, variable-length stream of packet-header vectors, departing from fixed-size, statistics-aggregated records. Each flow is represented as a sequence of $d$-dimensional vectors, capturing features such as directionality (Is_forward), protocol metadata (src_port, dst_port, protocol), position in the stream (index), timing (ts, IAT variants), and TCP flags (SYN, ACK, RST) (Giryes et al., 2024).
Model Architecture: Set-Tree Decision Trees
The detection architecture employs the Set-Tree model, a gradient-boosted ensemble of decision trees of maximum depth 10. Nodes split using "set-compatible" tests of the form

$$\frac{\sum_{x \in S} x_j^{\alpha}}{|S|^{\beta}} \le \theta,$$

where $j$ is a feature index, $\alpha, \beta$ are exponents, and $\theta$ is a threshold. Special choices of $(\alpha, \beta)$ recover mean, sum, and harmonic-mean splits. Each node computes an "attention set" and may focus subsequent splits recursively on these significant subsets. Learning seeks node splits that maximize Gini reduction, and predictions are combined via gradient boosting. Attention history along each tree path is limited to the five most recent node outputs.
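A minimal sketch of such a generalized set-compatible test, assuming the $\left(\sum_{x \in S} x_j^{\alpha}\right) / |S|^{\beta}$ form above (the symbol names and the exact aggregation used by the real Set-Tree implementation may differ):

```python
import numpy as np

def set_split(flow, j, alpha, beta, theta):
    """Generalized set-compatible split test on a set of packets.

    flow  : (n_packets, n_features) array of packet-header vectors
    j     : feature index aggregated over the set
    alpha : exponent applied to each packet's feature value
    beta  : exponent applied to the set size |S|
    theta : decision threshold
    """
    agg = np.sum(flow[:, j] ** alpha) / (len(flow) ** beta)
    return bool(agg <= theta)

# Illustrative special cases of the aggregation:
#   alpha=1, beta=1 -> mean of feature j over the flow
#   alpha=1, beta=0 -> sum of feature j over the flow
flow = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
is_left = set_split(flow, j=0, alpha=1, beta=1, theta=2.0)  # mean([1, 3]) = 2.0
```

A boosted ensemble would route each flow down its trees by evaluating such tests at every node, restricting the set $S$ to the node's attention set where applicable.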
Training and Detection Protocol
The model is trained as a binary classifier (Benign vs. Attack) using the logistic loss

$$\mathcal{L}(y, p) = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right],$$

where $y \in \{0, 1\}$ is the true label and $p$ the predicted attack probability. Early detection is evaluated by presenting the model with flow prefixes consisting of the first $n$ packets for various $n$, without retraining separate models per prefix size.
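The prefix-evaluation protocol can be sketched as follows; the toy classifier and mean-pooled prefix feature are stand-ins for the Set-Tree ensemble and its set-compatible aggregations, chosen only to show that one model is trained once and scored on prefixes of every length:

```python
import numpy as np

class MeanThresholdClassifier:
    """Toy stand-in for the trained ensemble: thresholds the mean of
    the first feature halfway between the per-class training means."""
    def fit(self, X, y):
        self.thr = 0.5 * (X[y == 0].mean() + X[y == 1].mean())
        return self

    def predict(self, X):
        return (X > self.thr).astype(int)

def prefix_feature(flow, n):
    """Summarize the first n packets of a flow as a scalar
    (mean-pooling stands in for set-compatible aggregation)."""
    return flow[:n, 0].mean()

# Synthetic flows: benign centered at 0, attack at 1 (purely illustrative).
rng = np.random.default_rng(0)
flows = [rng.normal(loc=y, scale=0.1, size=(16, 4)) for y in (0, 1) * 50]
labels = np.array([0, 1] * 50)

# Train once on full flows ...
X_full = np.array([prefix_feature(f, 16) for f in flows])
clf = MeanThresholdClassifier().fit(X_full, labels)

# ... then score the SAME model on n-packet prefixes, no retraining.
acc = {n: (clf.predict(np.array([prefix_feature(f, n) for f in flows]))
           == labels).mean() for n in (2, 4, 16)}
```

With well-separated classes, accuracy stays high even at $n = 2$, mirroring the early-detection results reported below.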
Evaluation and Benchmarking
PackFlow is evaluated on the CICDDoS2019 (50M attack flows, 0.1M benign flows, 248M packets) and CICIDS2017 (2.8M flows, 55.6M packets) datasets. Performance metrics include Recall, Precision, $F_1$, Accuracy, time-saving (TS), and traffic-volume overhead. Table 1 summarizes key empirical findings:
| Dataset | $n$-pkts | Acc | TS (%) |
|---|---|---|---|
| CICDDoS2019 | full | 0.99980 | 0.0 |
| CICDDoS2019 | 2 | 0.99975 | 99.79 |
| CICIDS2017 | full | 0.99651 | 0.0 |
| CICIDS2017 | 2 | 0.80984 | 97.2 |
| CICIDS2017 | 4 | 0.96756 | 75.3 |
| CICIDS2017 | 14 | 0.99252 | 19.8 |
PackFlow matches or outperforms prior ML/DL methods (CyDDoS, DDoSNet, 1D-CNN+LSTM) in accuracy and recall, while supporting detection after only $2$ packets on median flows and operating with a small raw-header traffic overhead (Giryes et al., 2024).
Feature Importance and Scaling
Feature contributions (Gini index) identify "length" and SYN/RST flag indicators as dominant DDoS cues; inter-packet arrival times and TCP window sizes contribute little. The computational overhead of Set-Tree over standard trees is modest, with a typical runtime of a few seconds on 80K flows.
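The Gini-based ranking underlying this analysis can be illustrated with a small self-contained computation; the feature names and synthetic data here are hypothetical, constructed so that only "length" and "syn_flag" carry signal:

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_gini_reduction(x, y):
    """Largest impurity reduction achievable by thresholding feature x --
    the per-feature signal behind Gini-based importance rankings."""
    base, best = gini(y), 0.0
    for thr in np.unique(x)[:-1]:
        left, right = y[x <= thr], y[x > thr]
        w = len(left) / len(y)
        best = max(best, base - w * gini(left) - (1 - w) * gini(right))
    return best

# Toy flows: only "length" and "syn_flag" determine the attack label.
rng = np.random.default_rng(1)
n = 500
length = rng.normal(0, 1, n)
syn = rng.integers(0, 2, n).astype(float)
iat = rng.normal(0, 1, n)                      # uninformative by construction
y = ((length > 0) & (syn > 0)).astype(int)

imp = {name: best_gini_reduction(x, y)
       for name, x in [("length", length), ("syn_flag", syn), ("iat", iat)]}
```

In this construction the uninformative timing feature receives a much smaller impurity reduction than the two signal-bearing features, mirroring the reported dominance of length and flag cues.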
2. PackFlow for Generative Molecular Crystal Structure Prediction
PackFlow is also the name of a generative framework for molecular crystal structure prediction (CSP), based on flow-matching generative models aligned with physics-informed reinforcement learning (RL). Given a molecular graph, PackFlow jointly samples all heavy-atom Cartesian coordinates within a unit cell and the six lattice parameters (Subramanian et al., 23 Feb 2026).
Generative Architecture and Flow Matching
Atoms are embedded as tokens

$$h_i = \big(\, x_i(t),\; f_i,\; L(t) \,\big),$$

where $x_i(t)$ are the noisy Cartesian coordinates of atom $i$, $f_i$ is an RDKit feature vector, and $L(t)$ is the lattice at annealing time $t$. A Transformer encoder integrates covalent bond-graph biases, and the network predicts vector fields for coordinates and lattice, enabling ODE-based sampling from a Gaussian prior.
Flow matching is achieved by interpolating between data and noise distributions for both coordinates and lattice, with a regression loss of the standard conditional flow-matching form

$$\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\, x_0,\, x_1} \left[\, \big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2 \,\right],$$

applied jointly to coordinates and lattice parameters. PackFlow samples are then generated deterministically via ODE integration from noise.
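The two ingredients above, the flow-matching regression target and deterministic ODE sampling, can be sketched minimally; the straight-line interpolation target and the Euler integrator are generic assumptions, not the paper's exact training recipe:

```python
import numpy as np

def cfm_loss(v_pred, x0, x1):
    """Conditional flow-matching target: the learned field should
    regress onto the straight-line displacement x1 - x0."""
    return float(np.mean((v_pred - (x1 - x0)) ** 2))

def sample_ode(v_field, x0, n_steps=100):
    """Deterministic Euler integration of dx/dt = v(x, t), carrying a
    Gaussian prior sample at t=0 to a data sample at t=1."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v_field(x, k * dt)
    return x

# With the (idealized) exact field v(x, t) = x1 - x0 for one fixed pair,
# integration transports the prior sample x0 exactly onto x1.
x0 = np.zeros(3)
x1 = np.array([1.0, -2.0, 0.5])
x_hat = sample_ode(lambda x, t: x1 - x0, x0)
```

In PackFlow the same machinery runs over atom coordinates and the six lattice parameters jointly, with the vector field supplied by the Transformer.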
Physics Alignment via Reinforcement Learning
A post-training “physics alignment” (PA) step applies RL to encourage low-energy, clash-free structures, leveraging ML-interatomic-potential (MLIP) proxies:
- Rewards: negative lattice energy and average force norm, standardised and mixed via a tunable trade-off weight.
- Updates: policy improvement via Group Relative PPO (GRPO) loss with KL divergence, using flow-matching loss as a negative log-probability proxy.
This physically aligns the generative model without modifying the deterministic inference procedure, improving energy ranking and physical plausibility.
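The reward construction can be sketched as follows; the mixing weight `lam` and the standardisation statistics are illustrative names for quantities the text leaves unnamed, and the MLIP outputs are mocked as plain arrays:

```python
import numpy as np

def physics_reward(energy, forces, lam, e_stats, f_stats):
    """Standardise the MLIP lattice energy and mean per-atom force norm,
    then mix them. lam (illustrative name) interpolates between an
    energy-driven (lam=0) and a force-driven (lam=1) reward."""
    e_mean, e_std = e_stats
    f_mean, f_std = f_stats
    e_z = (energy - e_mean) / e_std
    f_z = (np.linalg.norm(forces, axis=1).mean() - f_mean) / f_std
    return -(1.0 - lam) * e_z - lam * f_z   # lower energy/forces -> higher reward

# Two candidate structures with identical (zero) forces: the lower-energy
# one receives the higher reward.
forces = np.zeros((4, 3))
r_low = physics_reward(-10.0, forces, lam=0.5, e_stats=(0.0, 1.0), f_stats=(1.0, 1.0))
r_high = physics_reward(10.0, forces, lam=0.5, e_stats=(0.0, 1.0), f_stats=(1.0, 1.0))
```

GRPO-style policy improvement would then rank groups of sampled structures by this reward and update the flow model under a KL constraint, with the flow-matching loss standing in for the log-probability.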
Integration in CSP Pipelines
PackFlow fits within standard CSP workflows:
- Heavy-atom proposal generation for a given composition.
- Hydrogen placement (RDKit 3D builder).
- H-only relaxation (frozen heavy atoms, MLIP relaxation).
- Full relaxation (all atoms, FIRE optimizer, MLIP).
- Lattice energy computation and ranking.
- Polymorph selection within target energy windows.
Pseudocode for the complete protocol is directly outlined in (Subramanian et al., 23 Feb 2026).
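A hedged sketch of how the steps above compose; every callable here (`propose`, `place_h`, `relax_h_only`, `relax_all`, `lattice_energy`) is a hypothetical stand-in for the corresponding pipeline stage (PackFlow sampling, RDKit hydrogen placement, MLIP relaxations, MLIP energy), injected as parameters so the wiring stays generic:

```python
def csp_pipeline(graph, n_samples, energy_window,
                 propose, place_h, relax_h_only, relax_all, lattice_energy):
    """Illustrative CSP workflow: sample, complete, relax, rank, and
    select polymorphs within a lattice-energy window of the minimum."""
    candidates = []
    for _ in range(n_samples):
        struct = propose(graph)        # heavy-atom + lattice proposal (PackFlow)
        struct = place_h(struct)       # hydrogen placement (RDKit 3D builder)
        struct = relax_h_only(struct)  # H-only relaxation, frozen heavy atoms
        struct = relax_all(struct)     # full relaxation (FIRE optimizer, MLIP)
        candidates.append((lattice_energy(struct), struct))
    candidates.sort(key=lambda c: c[0])
    e_min = candidates[0][0]
    # polymorph selection within the target energy window
    return [s for e, s in candidates if e - e_min <= energy_window]

# Tiny demo with trivial stand-in callables (structure ids 0..2, fixed energies):
ids = iter(range(3))
energies = {0: 3.0, 1: 1.0, 2: 2.5}
chosen = csp_pipeline(None, 3, 1.0,
                      propose=lambda g: next(ids),
                      place_h=lambda s: s,
                      relax_h_only=lambda s: s,
                      relax_all=lambda s: s,
                      lattice_energy=lambda s: energies[s])
```

The demo keeps only the structure whose energy lies within 1.0 of the minimum; the published pseudocode should be consulted for the authoritative protocol.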
Empirical Performance
On 37K unseen crystals, PackFlow outperforms Genarris baselines with lower density error (4.7% vs. 27.9%/19.8%), improved AMD (0.286 vs. 0.903/0.541), lower RDF Wasserstein distances, and comparable inference speed (0.13 s/sample). Physics alignment further reduces clash rates and controls force/energy trade-offs continuously via the reward-mixing weight. In CSP blind tests, PackFlow finds candidates within a few kJ/mol of experimental polymorphs within 100 samples (Subramanian et al., 23 Feb 2026).
Implementation and Limitations
Key features include independent coordinate/lattice flows, dense bond-graph attention, canonical unwrapping, and reliance on MLIP for relaxation and reward calculation. The approach is so far limited to homomolecular crystals, does not enforce explicit space-group symmetry, and scales quadratically with atom count due to the dense attention mechanism.
3. PackFlow in Stochastic Packet/Flow Network Modeling
In the context of network performance analysis, "PackFlow" (an editor's term summarizing the framework of (Moallemi et al., 2010)) unifies packet-level queueing and flow-level utility-based control in a two-time-scale stochastic model of packet-switched networks.
Modeling Framework
- Nodes: a finite set of directed links $\mathcal{L}$ and end-to-end flow types $\mathcal{F}$.
- Queues: one FIFO queue per link $l \in \mathcal{L}$.
- Flows: Poisson flow-level arrivals (one rate per flow type), random packet generation, and finite lifetimes (a finite mean number of packets per flow).
- Packets: generated by flows and injected into their ingress queues.
Scheduling and Dynamics
- Per-slot scheduling enforces link activation (interference) constraints.
- Queue evolution follows the recursion

$$q(t+1) = q(t) - s(t) + u(t) + R\,\big[\, s(t) - u(t) \,\big] + a(t),$$

with routing matrix $R$, schedule counts $s(t)$, idling process $u(t)$, and flow packet arrivals $a(t)$.
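One slot of this recursion can be simulated directly; the two-link tandem topology and the derivation of the idling process from empty queues are illustrative assumptions:

```python
import numpy as np

def step(q, s, a, R):
    """One slot of the schematic queue recursion: served packets leave
    their queue, the routing matrix R forwards them downstream, new flow
    arrivals a join; u records service scheduled on empty queues (idling)."""
    served = np.minimum(q, s)            # cannot serve more than is queued
    u = s - served                       # idling process
    q_next = q - served + R @ served + a
    return q_next, u

# Two-link tandem: everything served at link 0 is routed to link 1.
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
q0 = np.array([5.0, 0.0])
q1, u = step(q0, s=np.array([2.0, 2.0]), a=np.array([1.0, 0.0]), R=R)
```

Here link 0 drains 2 packets (which join link 1's queue) and admits 1 arrival, while link 1 idles because its queue is empty.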
Congestion Control and Control Policy
- Flows maximize $\alpha$-fair utilities subject to link capacity constraints.
- Dual variables represent link shadow prices.
- At each slot, max-weight scheduling is used, maximizing a queue-length-weighted service sum.
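The max-weight rule admits a very short sketch; the explicit enumeration of feasible schedules is an illustrative simplification (real schedulers exploit structure rather than enumerating):

```python
import numpy as np

def max_weight_schedule(q, feasible):
    """Among feasible 0/1 activation vectors (those satisfying the
    interference constraints), pick the one maximizing the
    queue-length-weighted service sum sum_l q_l * s_l."""
    return max(feasible, key=lambda s: float(np.dot(q, s)))

# Two mutually interfering links (only one may transmit per slot):
feasible = [np.array([1, 0]), np.array([0, 1])]
sched = max_weight_schedule(np.array([3.0, 7.0]), feasible)  # serve the longer queue
```

Coupling this per-slot rule with the $\alpha$-fair rate control above is what yields the throughput-optimality results discussed next.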
Fluid Limit and Stability
In the large-scale limit, packet/flow dynamics converge to a deterministic ODE system whose fixed points solve a convex program minimising a Lyapunov function. The framework is shown to be throughput-optimal (positive recurrent under admissible load), to approach cost optimality within a balance factor, and to globally attract trajectories to the invariant manifold corresponding to optimal resource allocation.
4. Comparative Analysis and Empirical Results
PackFlow in DDoS detection and molecular crystal prediction both leverage stream or flow-structured data, but differ fundamentally:
| Domain | Input Structure | Learning Method | Evaluation Data | Empirical Strengths |
|---|---|---|---|---|
| DDoS Detection (Giryes et al., 2024) | Stream of packet headers | Set-Tree Gradient Boost | CICDDoS2019, CICIDS2017 | Very early accurate detection, low overhead |
| Molecular CSP (Subramanian et al., 23 Feb 2026) | Atom + lattice tensors | Flow Matching + RL | Unseen crystals, CSP blind tests | Physically-plausible samples, lower energy proposals |
| Packet/Flow Net Modeling (Moallemi et al., 2010) | Queues/flows (stochastic) | Stochastic/ODE modeling | Analytical, simulative | Proven throughput and stability guarantees |
In their respective domains, the PackFlow instantiations show performance superior or equivalent to deep learning and heuristic contemporaries, with provable or measured gains in detection speed, candidate quality, and efficiency.
5. Limitations, Extensions, and Prospective Directions
PackFlow’s instantiations highlight several limitations:
- DDoS model coverage bounded by header field diversity and static tree designs.
- Molecular CSP variant currently restricted to homomolecular crystals, omitting explicit space-group symmetry and detailed thermodynamic modeling.
- Stochastic network model omits detailed traffic and protocol heterogeneities.
Potential future directions include extension to co-crystals, equivariant or symmetry-enforcing generative models, large-scale corpus training, sparse attention in atom-rich scenarios, and inclusion of explicit free-energy and kinetic effects in crystal structure ranking. In DDoS detection, incorporating deeper temporal or multi-modal representations and adapting model structures for higher traffic diversity are prospective avenues.
PackFlow, across its major incarnations, demonstrates the power of flow (stream) structure modeling for high-precision detection, efficient generative tasks, and mathematically rigorous network analysis (Giryes et al., 2024, Subramanian et al., 23 Feb 2026, Moallemi et al., 2010).