Minimum Flow Decomposition (MFD)
- Minimum Flow Decomposition (MFD) is a core combinatorial problem that reconstructs directed flows by decomposing them into a minimal set of integer-weighted s–t paths.
- ILP formulations, enhanced with graph reduction, symmetry-breaking, and safety-based variable fixing, effectively address MFD's NP-hard challenges in both acyclic and cyclic settings.
- MFD underlies practical applications such as RNA transcript assembly and viral quasispecies reconstruction, where optimized exact solvers deliver large speedups and improved accuracy on complex networks.
The Minimum Flow Decomposition (MFD) problem is a central combinatorial optimization problem arising in computational biology, network analysis, transportation, and theoretical computer science. Given a directed network with a prescribed source-to-sink flow, MFD seeks a minimum-cardinality set of integer-weighted $s$–$t$ paths whose weighted superposition exactly reconstructs the edgewise flows. Despite its classic formulation, MFD remains computationally intractable and rich in algorithmic, structural, and practical complexities that continue to motivate significant research.
1. Formal Definition and Mathematical Structure
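Given a directed graph $G = (V, E)$ with unique source $s$, unique sink $t$, and a non-negative integer flow function $f : E \to \mathbb{Z}_{\ge 0}$ satisfying flow conservation at each $v \in V \setminus \{s, t\}$,
$$\sum_{(u, v) \in E} f(u, v) = \sum_{(v, u) \in E} f(v, u),$$
the Minimum Flow Decomposition problem is to determine the smallest integer $k$ and a collection of $s$–$t$ paths $P_1, \dots, P_k$ (with positive integer weights $w_1, \dots, w_k$), such that for every edge $(u, v) \in E$,
$$f(u, v) = \sum_{i \,:\, (u, v) \in P_i} w_i.$$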
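The defining condition can be checked mechanically. The following is a minimal sketch, not taken from the cited works: the function name `is_flow_decomposition`, the dictionary-based flow representation, and the toy instance are all illustrative assumptions.

```python
from collections import defaultdict


def is_flow_decomposition(flow, paths, weights, s, t):
    """Return True if the weighted s-t paths superpose exactly to `flow`.

    flow:    dict mapping an edge (u, v) to its non-negative integer flow
    paths:   list of paths, each given as a list of vertices from s to t
    weights: list of positive integers, one weight per path
    """
    if len(paths) != len(weights) or any(w <= 0 for w in weights):
        return False
    total = defaultdict(int)
    for path, w in zip(paths, weights):
        if path[0] != s or path[-1] != t:
            return False  # every path must run from source to sink
        for edge in zip(path, path[1:]):
            if edge not in flow:
                return False  # path uses an edge that is not in the graph
            total[edge] += w
    # Every edge flow must be reproduced exactly by the weighted paths.
    return all(total[e] == f for e, f in flow.items())


# Hypothetical toy instance (chosen for illustration only).
flow = {("s", "a"): 5, ("s", "b"): 3, ("a", "t"): 4, ("a", "b"): 1, ("b", "t"): 4}
paths = [["s", "a", "t"], ["s", "a", "b", "t"], ["s", "b", "t"]]
weights = [4, 1, 3]
print(is_flow_decomposition(flow, paths, weights, "s", "t"))  # True
```

The checker only verifies that a candidate is a valid decomposition; it says nothing about whether the number of paths is minimum, which is the hard part of the problem.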
MFD is strongly NP-hard on directed acyclic graphs (DAGs) and APX-hard in general; thus, polynomial-time exact algorithms are not expected to exist unless $\mathrm{P} = \mathrm{NP}$ (Grigorjew et al., 2024, Cáceres et al., 2022).
Related decomposition variants include formulations for graphs with cycles (decomposing into walks, trails, or cycles), inexact flow decompositions (with edgewise lower/upper bounds), and robust decompositions under uncertainty in edge flows (Stinzendörfer et al., 2024, Sena et al., 24 Nov 2025, Dias et al., 2022).
2. Integer Linear Programming Models
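The canonical algorithmic approach employs Integer Linear Programming (ILP). For a given upper bound $k$ on the number of paths and $(G, f)$ as above, binary variables $x_{uvi} \in \{0, 1\}$ indicate usage of edge $(u, v)$ by path $i$, and $w_i \in \mathbb{Z}_{>0}$ are the path weights. The constraints are:
- Path-conservation: For all $i \in \{1, \dots, k\}$, $v \in V$,
$$\sum_{(v, u) \in E} x_{vui} - \sum_{(u, v) \in E} x_{uvi} = \begin{cases} 1 & \text{if } v = s, \\ -1 & \text{if } v = t, \\ 0 & \text{otherwise.} \end{cases}$$
- Flow coverage: For every $(u, v) \in E$,
$$f(u, v) = \sum_{i=1}^{k} w_i \, x_{uvi}.$$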
In practice, bilinearities are linearized by introducing auxiliary variables and standard McCormick or big-$M$ constraints (Dias et al., 2023, Grigorjew et al., 2023, Dias et al., 2022).
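One common way to carry out this linearization is sketched below. The notation is an assumption made for illustration: $x_{uvi} \in \{0, 1\}$ marks whether path $i$ uses edge $(u, v)$, $w_i$ is the weight of path $i$, $\pi_{uvi}$ is an auxiliary variable standing for the product $w_i x_{uvi}$, and $\bar{w}$ is any valid upper bound on a path weight (for example, the largest edge flow).

$$
\begin{aligned}
f(u, v) &= \sum_{i=1}^{k} \pi_{uvi} && \text{for all } (u, v) \in E, \\
\pi_{uvi} &\le \bar{w} \, x_{uvi} && \text{for all } (u, v) \in E,\ i \in \{1, \dots, k\}, \\
\pi_{uvi} &\le w_i && \text{for all } (u, v) \in E,\ i \in \{1, \dots, k\}, \\
\pi_{uvi} &\ge w_i - (1 - x_{uvi}) \, \bar{w} && \text{for all } (u, v) \in E,\ i \in \{1, \dots, k\}, \\
\pi_{uvi} &\ge 0 && \text{for all } (u, v) \in E,\ i \in \{1, \dots, k\}.
\end{aligned}
$$

When $x_{uvi} = 0$ the second and last constraints force $\pi_{uvi} = 0$; when $x_{uvi} = 1$ the third and fourth force $\pi_{uvi} = w_i$. The auxiliary variable therefore equals the product in every feasible solution, and all constraints are linear.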
For practical scalability, modern implementations incorporate safety-based variable fixing, antichain and excess-flow preprocessing, graph reduction (e.g., degree-1 contraction), weight-ordering symmetry-breaking, and restricted-weight ILP models. These techniques yield massive speedups, making exact ILP approaches feasible for 8–9 on real-world assembly graphs (Grigorjew et al., 2023).
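Weight-ordering symmetry-breaking can be illustrated with a single family of constraints. This is a sketch under the assumption that the path weights are written $w_1, \dots, w_k$; whether the cited implementations use exactly this form is not confirmed here.

$$
w_i \ge w_{i+1} \qquad \text{for all } i \in \{1, \dots, k - 1\}.
$$

Any feasible solution can be re-indexed to satisfy this ordering, so no decomposition is lost, while the solver no longer has to explore index permutations of the same set of weighted paths.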
For graphs with cycles, ILP formulations become more involved, ensuring that each decomposition unit is constrained to be a simple path, a trail, or a walk, as required (Dias et al., 2022, Sena et al., 24 Nov 2025).
3. Structural Complexity and Graph Width Parameters
The tractability and approximability of MFD depend crucially on several graph parameters:
- Width: The minimum number of $s$–$t$ paths needed to cover all edges in $G$.
- Parallel-width: The maximum number $k$ such that a parallel bundle of $k$ $s$–$t$ edges is obtainable as a directed minor.
- Flow-width: The smallest integer $k$ such that the flow can be covered by $k$ $s$–$t$ paths, each edge $e$ appearing at most $f(e)$ times, where $f(e)$ is the flow on $e$.
Complexity results:
- MFD is strongly NP-hard even for width $3$ and (weakly) NP-hard for width $2$ (Grigorjew et al., 2024).
- For width-1 graphs (i.e., a single $s$–$t$ path), MFD is trivial.
- For graphs of constant parallel-width and unary-coded flows, MFD is quasi-polynomial-time solvable, and FPT in 9 with a double-exponential parameter dependence (Kloster et al., 2017, Grigorjew et al., 2024).
- Width-stable graphs (those with monotone width under edge removal) admit bounded-approximation guarantees for greedy heuristics (Cáceres et al., 2022).
4. Safe Paths and Solution Invariance
A fundamental concern in bioinformatics applications (e.g., transcript assembly) is the existence of safe paths: paths that must appear as contiguous subpaths in every minimum solution. A path $P$ is safe if, for every optimal decomposition $\{P_1, \dots, P_k\}$, there is some $i$ with $P$ a contiguous subpath of $P_i$. Determining the set of maximal safe paths for MFD is itself nontrivial.
The safety of a path can be decided with ILP-based safety tests. In particular, group-testing ILP formulations identify, in batch mode, the maximal set of candidate paths that are unavoidable in optimal decompositions, substantially reducing computation time (Dias et al., 2023). In practice, this approach recovers up to 96% ground-truth transcript coverage in typical RNA-assembly graphs, with F-scores of 0.93–0.97 compared with 0.82–0.91 for the earlier SafeFlow notion (see Table 1).
| Method | Coverage | Precision | F-score |
|---|---|---|---|
| SafeFlow | 71–91% | 100% | 0.82–0.91 |
| SafeMFD | 88–96% | 98–99% | 0.93–0.97 |
Group-testing further halves the number of ILP calls required relative to naïve single-path testing, with empirical 3× speedup (Dias et al., 2023).
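The definition of safety can be made concrete by brute force on a tiny DAG. The sketch below is not the ILP-based or group-testing method of the cited work: it enumerates every optimal decomposition exhaustively, which is only viable for toy inputs. All function names and the example instance are illustrative assumptions.

```python
def all_st_paths(flow, s, t):
    """Enumerate every s-t path of the DAG whose edges are the keys of `flow`."""
    out = {}
    for u, v in flow:
        out.setdefault(u, []).append(v)

    def extend(path):
        if path[-1] == t:
            yield path
            return
        for v in out.get(path[-1], []):
            yield from extend(path + [v])

    return list(extend([s]))


def decompositions(flow, paths, k, start=0):
    """Yield every list of k (path_index, weight) pairs that exhausts `flow`."""
    if k == 0:
        if all(f == 0 for f in flow.values()):
            yield []
        return
    for i in range(start, len(paths)):
        edges = list(zip(paths[i], paths[i][1:]))
        cap = min(flow[e] for e in edges)  # largest weight path i can carry
        for w in range(1, cap + 1):
            rest = dict(flow)
            for e in edges:
                rest[e] -= w
            for tail in decompositions(rest, paths, k - 1, i + 1):
                yield [(i, w)] + tail


def optimal_decompositions(flow, s, t):
    """Return (paths, solutions) for the smallest k that admits a decomposition."""
    paths = all_st_paths(flow, s, t)
    for k in range(1, len(paths) + 1):
        solutions = list(decompositions(flow, paths, k))
        if solutions:
            return paths, solutions
    raise ValueError("flow admits no decomposition into s-t paths")


def contains(path, sub):
    """True if `sub` occurs as a contiguous run of vertices inside `path`."""
    n = len(sub)
    return any(path[j:j + n] == sub for j in range(len(path) - n + 1))


def is_safe(flow, s, t, candidate):
    """True if `candidate` is a subpath of some path in every optimal solution."""
    paths, solutions = optimal_decompositions(flow, s, t)
    return all(any(contains(paths[i], candidate) for i, _ in sol)
               for sol in solutions)


# Hypothetical instance with two distinct optimal decompositions of size 2.
flow = {("s", "u"): 2, ("s", "v"): 2, ("u", "m"): 2, ("v", "m"): 2,
        ("m", "x"): 2, ("m", "y"): 2, ("x", "t"): 2, ("y", "t"): 2}
print(is_safe(flow, "s", "t", ["s", "u", "m"]))  # True
print(is_safe(flow, "s", "t", ["u", "m", "x"]))  # False
```

In this instance the flow through `m` can be routed either straight or crossed, so `u, m, x` appears in only one of the two optimal solutions and is not safe, whereas `s, u, m` is forced in both.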
5. Algorithmic Approximability and Heuristics
MFD is APX-hard, so it admits no polynomial-time approximation scheme unless $\mathrm{P} = \mathrm{NP}$, and whether a constant-factor approximation exists remains open. Parameterized approximations are nevertheless possible by leveraging width parameters:
- The classical parity-fixing (bit decomposition) scheme yields an $O(\log \|f\|)$-factor approximation, where $\|f\|$ is the largest edge flow (Grigorjew et al., 2024, Cáceres et al., 2022).
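- Greedy-weight heuristics (iteratively extracting the widest $s$–$t$ path) are $O(\log \mathrm{Val}(f))$-approximations on width-stable graphs, where $\mathrm{Val}(f)$ is the total flow value, but performance may deteriorate ($\Omega(m / \log m)$ gap, with $m$ the number of edges) on general DAGs (Cáceres et al., 2022).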
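The greedy-weight heuristic from the list above can be sketched in a few lines for a DAG. This is an illustrative implementation under stated assumptions, not code from the cited papers: the function names, the dictionary-based flow representation, and the toy instance are hypothetical, and ties between equally wide paths are broken by edge insertion order.

```python
def widest_path(residual, s, t):
    """Return (path, width) for an s-t path maximizing the bottleneck flow.

    `residual` maps edges (u, v) of a DAG to positive remaining flow.
    Returns (None, 0) if t is unreachable from s.
    """
    out = {}
    for (u, v), f in residual.items():
        out.setdefault(u, []).append((v, f))
    memo = {t: (float("inf"), None)}  # vertex -> (best bottleneck to t, next vertex)

    def best(u):
        if u not in memo:
            cand = (0, None)
            for v, f in out.get(u, []):
                width = min(f, best(v)[0])
                if width > cand[0]:
                    cand = (width, v)
            memo[u] = cand
        return memo[u]

    width, _ = best(s)
    if width == 0:
        return None, 0
    path = [s]
    while path[-1] != t:
        path.append(memo[path[-1]][1])
    return path, width


def greedy_widest_path_decomposition(flow, s, t):
    """Repeatedly peel off a widest s-t path until no flow remains."""
    residual = {e: f for e, f in flow.items() if f > 0}
    paths, weights = [], []
    while residual:
        path, width = widest_path(residual, s, t)
        if path is None:
            raise ValueError("remaining flow is not routable; is the flow conserved?")
        for e in zip(path, path[1:]):
            residual[e] -= width
            if residual[e] == 0:
                del residual[e]  # saturated edges leave the residual graph
        paths.append(path)
        weights.append(width)
    return paths, weights


# Hypothetical toy instance (chosen for illustration only).
flow = {("s", "a"): 5, ("s", "b"): 3, ("a", "t"): 4, ("a", "b"): 1, ("b", "t"): 4}
g_paths, g_weights = greedy_widest_path_decomposition(flow, "s", "t")
print(g_weights)  # [4, 3, 1]
```

Each round saturates at least one edge, so the loop runs at most once per edge. The result is always a valid decomposition, but as the bounds above indicate, it need not be a minimum one on general DAGs.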
The introduction of flow-width 9 unifies and generalizes these guarantees: 0, and all parameterized approximations are ultimately bounded by parallel-width and this structural parameter (Grigorjew et al., 2024).
6. Extensions: Graphs with Cycles and Robust MFD
Recent advances have generalized MFD to directed graphs with cycles, where decompositions may comprise walks, trails, or cycles. Exact ILP models for these cyclic variants enforce connectivity requirements (e.g., reachability, Miller–Tucker–Zemlin sequential ordering, SCC cut generation) and have proven practical for instances with up to 1 decomposition units (Dias et al., 2022, Sena et al., 24 Nov 2025).
Dominator-tree techniques further identify "safe" sequences of edges common to all decompositions, yielding massive MILP reductions and up to 400× speedups (Sena et al., 24 Nov 2025).
Robust MFD variants address settings with uncertainty in edge flows (e.g., interval- or budgeted-uncertainty). Both static (worst-case) and adjustable (scenario-adaptive) robust optimization models have been developed. Adjustable robustness—differentiating between "here-and-now" and "wait-and-see" decisions—yields particularly compact and efficient models, reducing the required number of paths and weights by up to 70% versus naive approaches on real multi-scenario datasets (Stinzendörfer et al., 2024).
7. Practical Applications and Empirical Results
MFD underpins core bioinformatics tasks, most notably RNA transcript assembly and viral quasispecies reconstruction, where the flow decomposition encodes plausible molecular sequences from observed data. Large-scale benchmarks on splice graphs derived from human, mouse, and zebrafish transcriptomes demonstrate that ILP-based and safety-optimized methods can provide near-optimal path recovery, substantially improving true transcript coverage, typically within seconds to minutes per instance for graphs of practical size (2) (Dias et al., 2023, Grigorjew et al., 2023, Dias et al., 2022, Dias et al., 2022, Sena et al., 24 Nov 2025). The emergence of robust, adjustable MFD further enables practical handling of uncertainty in clinical genomics and transport planning (Stinzendörfer et al., 2024).
The field continues to advance along axes of algorithmic exactness, approximation, robustness, and practical optimization, with open challenges in constant-factor approximability, further width parameter refinements, and scalability to high-complexity multi-assembly and cyclic-graph settings.