MoFlow: Invertible Flow Models & Security

Updated 16 May 2026

MoFlow is a name shared by several unrelated methods: flow-based models that generate molecular graphs and forecast human trajectories, and a memory attack (MOFlow) that exploits buffer overflows.
The generative models rely on invertible mappings (Glow-based and graph-conditional flows) or on flow matching, giving exact reconstruction of molecules and state-of-the-art trajectory forecasts.
This article covers the methodology and empirical results of each and, for the attack, the proposed mitigation strategies.

MoFlow refers to multiple, domain-distinct methodologies. Two of them model generative processes with flow-based transformations or flow-matching objectives; a third use of the name denotes an exploit of memory vulnerabilities. In academic literature, "MoFlow" most commonly designates (1) a normalizing-flow-based invertible generator for molecular graphs, (2) a conditional flow matching framework for multimodal human trajectory prediction, and (3) a buffer-overflow-based memory attack against ARM TrustZone. The two generative models build on invertible mappings or flow matching, with implementation and objective specialized to their respective problem domains; the attack is not a flow-based model.

1. MoFlow for Molecular Graph Generation

MoFlow (Zang et al., 2020) is an invertible flow-based generative model for molecular graphs that achieves one-shot, chemically valid molecule generation, tractable likelihood training, and 100% reconstruction of training data. The model factorizes molecular graph generation into two invertible decoupled stages:

Bond Generation via Glow-based Flow: The adjacency tensor $B$ encoding bond types is treated as an image with $c$ channels, leveraging Glow's actnorm, invertible $1\times1$ convolutions, and affine coupling layers to learn an invertible map $f_{\mathcal B}: B \leftrightarrow Z_B$.
Atom Generation via Graph-Conditional Flow: Given $B$, atom attributes $A$ (node features) are generated by a conditional flow, $f_{\mathcal A|\mathcal B}: A~|~B \leftrightarrow Z_{A|B}$, using affine coupling layers whose scale and translate functions are implemented as graph neural networks (R-GCNs) conditioned on the fixed adjacency.
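Both stages rest on affine coupling layers, which are invertible by construction. The following is a minimal sketch of a generic coupling layer in plain Python, not MoFlow's actual implementation: `toy_scale` and `toy_shift` are hypothetical stand-ins for the CNN or R-GCN conditioners, and the exponential scale parameterization is an assumption (the paper's layers may differ).

```python
import math

def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: pass the first half through, transform the second half."""
    half = len(x) // 2               # assumes an even-length input
    x_a, x_b = x[:half], x[half:]
    log_s = scale_fn(x_a)            # log-scales computed from the untouched half
    t = shift_fn(x_a)
    y_b = [xb * math.exp(ls) + ti for xb, ls, ti in zip(x_b, log_s, t)]
    log_det = sum(log_s)             # log|det J| of the triangular Jacobian
    return x_a + y_b, log_det

def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: the conditioners see the same untouched half."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s = scale_fn(y_a)
    t = shift_fn(y_a)
    x_b = [(yb - ti) * math.exp(-ls) for yb, ls, ti in zip(y_b, log_s, t)]
    return y_a + x_b

# Hypothetical toy conditioners (a real model would use neural networks).
def toy_scale(x_a):
    return [math.tanh(v) for v in x_a]

def toy_shift(x_a):
    return [0.5 * v for v in x_a]

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(y, toy_scale, toy_shift)
```

Because the conditioners are only ever evaluated on the half that passes through unchanged, they never need to be inverted themselves, which is why arbitrary networks (including graph networks conditioned on $B$) can be used there.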

The overall architecture enables both encoding and generation in a single deterministic pass. Exact log-likelihood is computed as

$\log p_X(X)=\log p_Z(f(X))+\sum_{l=1}^L \log|\det(\partial f_l/\partial H_{l-1})|,$

enabling maximum-likelihood training without variational approximations. At generation time, a post-hoc valency correction ensures all outputs are chemically valid by iteratively downgrading offending bonds until no valency violation remains. MoFlow achieves 100% validity and 100% reconstruction on the QM9 and ZINC250K benchmarks, with uniqueness and novelty close to 100%, outperforming prior flow-based and autoregressive baselines. A latent property regressor enables efficient molecular property optimization and constrained editing directly in latent space (Zang et al., 2020).
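The bond-downgrading idea behind the valency correction can be sketched as follows. This is an illustrative simplification, not the paper's procedure: the `MAX_VALENCE` table ignores formal charges, the molecule representation (an element list plus a dictionary of bond orders) is hypothetical, and the choice of which bond to downgrade first is an assumption.

```python
# Simplified maximum valences; formal charges are ignored (assumption).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def atom_valence(idx, bonds):
    """Sum of bond orders over all bonds touching atom idx."""
    return sum(order for (i, j), order in bonds.items() if idx in (i, j))

def correct_valency(atoms, bonds):
    """Downgrade bonds one order at a time until no atom exceeds its valence."""
    bonds = dict(bonds)  # work on a copy
    while True:
        over = [i for i, el in enumerate(atoms)
                if atom_valence(i, bonds) > MAX_VALENCE[el]]
        if not over:
            return bonds
        i = over[0]
        # Pick the highest-order bond touching the offending atom.
        key = max((k for k in bonds if i in k), key=lambda k: bonds[k])
        bonds[key] -= 1
        if bonds[key] == 0:
            del bonds[key]   # a single bond downgraded once is removed

# An oxygen with two double bonds (valence 4) is repaired to two single bonds.
atoms = ["O", "C", "C"]
bonds = {(0, 1): 2, (0, 2): 2}
fixed = correct_valency(atoms, bonds)
```

The loop terminates because every iteration lowers the total bond order by one, and an atom that exceeds its valence always has at least one bond to downgrade.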

2. MoFlow for Human Trajectory Forecasting

MoFlow (Fu et al., 13 Mar 2025) addresses multimodal, socially-aware human trajectory prediction via conditional flow matching and IMLE-based distillation for efficient inference:

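Teacher Network (Flow Matching): The model learns a time-dependent vector field $v_\theta$ so that the ODE $\frac{dY^t}{dt}=v_\theta(Y^t,C,t)$ transports noise $Y^0 \sim \mathcal{N}(0, I)$ to true future trajectories $Y^1$, conditioned on the context $C$. The decoder outputs $K$ diverse scene-level trajectory candidates $\{\hat{Y}_k\}_{k=1}^{K}$ per forward pass. Flow matching minimizes a regression objective which, in its standard linear-interpolation form, reads

$\mathcal{L}_{\mathrm{FM}}=\mathbb{E}_{t,\,Y^0,\,Y^1}\left\|v_\theta(Y^t,C,t)-(Y^1-Y^0)\right\|_2^2,\qquad Y^t=(1-t)\,Y^0+t\,Y^1,$

combined with a multitarget “closest-prediction” rule to guarantee multimodal coverage: for a single ground-truth $Y^1$, the model is trained by regressing the closest of the $K$ candidates onto it and classifying which mode is correct.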
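To make the flow-matching mechanics concrete, here is a minimal sketch of the generic conditional flow matching recipe with a straight-line path $Y^t=(1-t)Y^0+tY^1$ and target velocity $Y^1-Y^0$, followed by Euler integration of the ODE. It is not the paper's training code: `oracle_velocity` is a hypothetical stand-in for the learned network that is handed the target as its context, and the closest-prediction and classification terms are omitted.

```python
def interpolate(y0, y1, t):
    """Point on the straight-line path between noise y0 and data y1."""
    return [(1 - t) * a + t * b for a, b in zip(y0, y1)]

def fm_loss(v_fn, y0, y1, context, t):
    """Mean squared error between predicted and target velocity at time t."""
    y_t = interpolate(y0, y1, t)
    target = [b - a for a, b in zip(y0, y1)]
    pred = v_fn(y_t, context, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(v_fn, y0, context, steps=10):
    """Integrate dY/dt = v(Y, C, t) from t=0 to t=1 with explicit Euler steps."""
    y, dt = list(y0), 1.0 / steps
    for k in range(steps):
        v = v_fn(y, context, k * dt)
        y = [yi + dt * vi for yi, vi in zip(y, v)]
    return y

def oracle_velocity(y_t, context, t):
    """Hypothetical perfect field for one known target (context = target)."""
    return [(c - y) / (1 - t) for c, y in zip(context, y_t)]

y0 = [0.5, -1.0, 2.0]    # "noise" sample
y1 = [1.0, 1.0, 0.0]     # "ground-truth" future, flattened
loss = fm_loss(oracle_velocity, y0, y1, context=y1, t=0.3)
y_end = euler_sample(oracle_velocity, y0, context=y1, steps=10)
```

With the oracle field the loss is zero up to rounding and integration lands on the target; a learned field needs many such ODE steps at inference, which is the cost the distilled student is meant to remove.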

Student Network (IMLE Distillation): To attain real-time inference, the teacher's output modes are distilled into a student network using Implicit Maximum Likelihood Estimation (IMLE), which finds for each teacher-generated set the closest student-generated set in Chamfer distance and updates the student's parameters accordingly. This avoids adversarial instability and enables a one-shot sampler that is roughly $100\times$ faster than the ODE-based teacher, with comparable accuracy.
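The selection step of IMLE distillation can be sketched as below, under simplifying assumptions: trajectories are flattened lists of floats, the Chamfer distance uses squared Euclidean distances averaged in both directions, and `toy_student` is a hypothetical one-parameter generator standing in for the student network. The gradient update on the selected sample is omitted.

```python
import random

def traj_dist(a, b):
    """Squared Euclidean distance between two flattened trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def chamfer(set_a, set_b):
    """Symmetric Chamfer distance between two sets of trajectories."""
    a_to_b = sum(min(traj_dist(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(min(traj_dist(a, b) for a in set_a) for b in set_b) / len(set_b)
    return a_to_b + b_to_a

def imle_select(teacher_set, student_fn, num_latents, rng):
    """Draw several latents and keep the student set closest to the teacher set."""
    best = None
    for _ in range(num_latents):
        z = rng.gauss(0.0, 1.0)
        candidate = student_fn(z)
        d = chamfer(teacher_set, candidate)
        if best is None or d < best[0]:
            best = (d, z, candidate)
    return best  # (distance, latent, student set); the distance is the loss

def toy_student(z):
    """Hypothetical student: K=2 trajectories shifted by the latent z."""
    return [[z, 1.0 + z], [2.0 + z, 3.0 + z]]

teacher = [[0.0, 1.0], [2.0, 3.0]]
best_d, best_z, best_set = imle_select(teacher, toy_student, 16, random.Random(0))
```

Only the closest student sample receives a training signal, so every teacher set pulls some student output toward itself, without a discriminator.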

Empirically, MoFlow attains state-of-the-art results on NBA SportVU (0.71/0.87 minADE/minFDE), ETH-UCY (0.20/0.32), and SDD (7.50/11.96), with ablations confirming the necessity of noise sharing, flow-time masking, and K-mode positional encoding for diversity and stability (Fu et al., 13 Mar 2025).
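The minADE/minFDE figures quoted above are best-of-K metrics. A minimal single-agent sketch follows; the function names are illustrative, and multi-agent benchmarks differ in how they aggregate over agents and scenes.

```python
import math

def ade(pred, gt):
    """Average displacement error: mean point-wise Euclidean distance."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Final displacement error: distance at the last time step."""
    return math.dist(pred[-1], gt[-1])

def min_ade_fde(candidates, gt):
    """Best-of-K ADE and FDE; the two minima may come from different candidates."""
    return (min(ade(c, gt) for c in candidates),
            min(fde(c, gt) for c in candidates))

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],   # parallel track, offset by 1
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.6)],   # accurate until the last step
    [(0.0, 3.0), (1.0, 3.0), (2.0, 0.0)],   # poor path, exact endpoint
]
min_ade, min_fde = min_ade_fde(candidates, gt)
```

Here the lowest ADE comes from the second candidate and the lowest FDE from the third, which illustrates why a diverse set of K modes can score well on both metrics at once.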

3. MoFlow as a TrustZone Memory Attack

Distinct from generative modeling, "MOFlow" in (Sarker et al., 2023) denotes a buffer-overflow exploitation attack targeting the secure world of ARM Cortex-M TrustZone. The threat model assumes an attacker-controlled Trusted Application (TA) residing in secure SRAM. Because the hardware does not enforce isolation within the secure world, a classic C buffer overflow in the malicious TA can reach past its own buffer into the region of an adjacent TA and exfiltrate that data via non-secure callable (NSC) calls. The paper formally characterizes the vulnerability by two quantities: the overflow offset from the attacker's buffer to the adjacent region, and the total number of bytes that must be read to leak that region.

A second vulnerability ("Achilles heel") arises from the absence of input pointer validation in the veneer NSC entry functions, allowing a non-secure caller to pass a pointer into secure memory and thereby leak arbitrary secure RAM contents. The paper proposes automatic pointer-range validation and fine-grained S-RAM region tracking as mitigations. Both vulnerabilities are demonstrated on ARM Cortex-M23/M33 platforms (Sarker et al., 2023).
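The pointer-range validation mitigation amounts to checking that a caller-supplied buffer lies entirely in non-secure memory before the secure side touches it. The sketch below simulates that check in Python; the address window in `NS_REGIONS` and the entry function `nsc_copy_entry` are hypothetical. On real Armv8-M firmware the analogous check is usually written in C with the CMSE intrinsic `cmse_check_address_range` from `arm_cmse.h`.

```python
# Hypothetical non-secure SRAM window as a half-open range [start, end).
NS_REGIONS = [(0x2000_0000, 0x2000_8000)]

def in_nonsecure(ptr, length, regions=NS_REGIONS):
    """True only if [ptr, ptr + length) lies wholly inside one non-secure region."""
    if ptr < 0 or length <= 0:
        return False
    end = ptr + length
    return any(start <= ptr and end <= stop for start, stop in regions)

def nsc_copy_entry(ptr, length, memory_read):
    """Simulated NSC entry point: validate the caller's pointer before using it."""
    if not in_nonsecure(ptr, length):
        raise PermissionError("pointer range is not entirely in non-secure memory")
    return memory_read(ptr, length)

def fake_read(ptr, length):
    """Stand-in for a memory read; returns zero bytes of the requested length."""
    return bytes(length)

ok = nsc_copy_entry(0x2000_0000, 16, fake_read)
try:
    nsc_copy_entry(0x3000_0000, 16, fake_read)   # pretend secure address
    rejected = False
except PermissionError:
    rejected = True
```

Checking the end of the range as well as the start matters: a buffer that begins in non-secure memory but straddles the boundary would otherwise slip through.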

4. Flow-Based Model Methodology and Mathematical Framework

The flow-based models for generative learning in MoFlow (molecules) and MoFlow (trajectories) implement invertible mappings between a base noise distribution and complex data manifolds. In the molecular context, Glow-based and graph-conditional flows perform exact likelihood modeling via the change-of-variables formula. For trajectory prediction, conditional flow matching relies on optimal transport between noise and data via learned vector fields parameterized by deep networks.
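The change-of-variables formula can be checked numerically on the simplest possible flow. In this sketch, a one-dimensional affine map $f(x)=(x-\mu)/\sigma$ sends data to a standard normal base, so the flow likelihood must coincide with the closed-form Gaussian density; the function names are illustrative.

```python
import math

def std_normal_logpdf(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * (z * z + math.log(2 * math.pi))

def flow_logpdf(x, mu, sigma):
    """log p_X(x) = log p_Z(f(x)) + log|df/dx| for f(x) = (x - mu) / sigma."""
    z = (x - mu) / sigma
    log_det = -math.log(sigma)       # df/dx = 1 / sigma
    return std_normal_logpdf(z) + log_det

def gaussian_logpdf(x, mu, sigma):
    """Closed-form log-density of N(mu, sigma^2), used as a reference."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

via_flow = flow_logpdf(1.7, mu=0.5, sigma=2.0)
reference = gaussian_logpdf(1.7, mu=0.5, sigma=2.0)
```

Deep flows stack many such invertible layers, and the log-determinant terms simply add up, as in the sum over layers in the log-likelihood formula of Section 1.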

The IMLE framework for distillation operates on the empirical distribution of teacher samples, optimizing the student so that every mode of the teacher is well-approximated without adversarial risk. Multimodality is explicitly operationalized in K-shot outputs and diversity-promoting losses.

For molecular graphs, the architecture enforces chemical validity both structurally and by explicit post-hoc repair. For trajectory prediction, social and physical plausibility are encoded by shared context representations and attention-based fusion of agent histories.

5. Empirical Results and Impact

MoFlow (molecular):

Achieves 100% validity and 100% reconstruction on both QM9 and ZINC250K, with uniqueness and novelty close to 100%.
Outperforms GraphNVP, GCPN, MRNN, GraphAF, and JT-VAE for property optimization and constrained editing tasks.

Model	Validity (%)	Uniqueness (%)	Novelty (%)	Recon (%)
MoFlow	100	99.2	98.0	100

MoFlow (trajectories):

Delivers state-of-the-art or comparable results against leading baselines such as LED and TUTR.
Demonstrates robust likelihood-error rank correlation, supporting reliable likelihood-based planning.

MoFlow (security):

Demonstrates successful S-RAM cross-app data leakage and S-world arbitrary memory disclosure on commercial Cortex-M processors, motivating hardware and software mitigation strategies (Sarker et al., 2023).

6. Limitations and Prospects

MoFlow (molecular) depends on known chemical building blocks and a post-hoc validity step; generation of entirely novel chemistry remains constrained by the diversity of the training data. MoFlow (trajectory) relies on ODE integration to sample from the teacher, and the quality of multimodal sampling is sensitive to the choice of K and to the IMLE sample budget. MoFlow (memory attack) points to fundamental isolation deficiencies in TrustZone's architectural design, calling for mitigations that make C code less vulnerable to memory corruption and that enforce bounds more strictly.

Potential future directions include extending MoFlow-style flow-based approaches to larger or more heterogeneous graph structures, richer context encoding for trajectory modeling, integration of environment semantic information, adaptive multimodality selection, and generalization to novel tasks such as rigid-body flow matching for materials structure prediction, as exemplified by related models such as MOFFlow, which operates on $SE(3)$ (Kim et al., 2024).