Minimum Flow Decomposition (MFD)

Updated 6 April 2026
  • Minimum Flow Decomposition (MFD) is a core combinatorial problem: decompose a directed network flow into a minimum-cardinality set of integer-weighted s–t paths.
  • ILP formulations, enhanced with graph reduction, symmetry-breaking, and safety-based variable fixing, effectively address MFD's NP-hard challenges in both acyclic and cyclic settings.
  • MFD underpins practical applications such as RNA transcript assembly and viral quasispecies reconstruction, where optimized ILP methods report large speedups and improved accuracy on complex networks.

The Minimum Flow Decomposition (MFD) problem is a central combinatorial optimization problem arising in computational biology, network analysis, transportation, and theoretical computer science. Given a directed network with a prescribed source-to-sink flow, MFD seeks a minimum-cardinality set of integer-weighted $s$–$t$ paths whose weighted superposition exactly reconstructs the edgewise flows. Despite its classic formulation, MFD remains computationally intractable and rich in algorithmic, structural, and practical complexities that continue to motivate significant research.

1. Formal Definition and Mathematical Structure

Given a directed graph $G=(V,E)$ with unique source $s\in V$, sink $t\in V$, and a positive integer flow function $f:E\to\mathbb{Z}_{>0}$ satisfying flow conservation at each $v\in V\setminus\{s,t\}$,

$$\sum_{(u,v)\in E} f_{uv} = \sum_{(v,w)\in E} f_{vw},$$

the Minimum Flow Decomposition problem is to determine the smallest integer $k$ and a collection of $s$–$t$ paths $P_1,\dots,P_k$ (with positive integer weights $w_1,\dots,w_k$), such that for every edge $(u,v)\in E$,

$$f_{uv} = \sum_{i\,:\,(u,v)\in P_i} w_i.$$
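The definition can be checked mechanically. The following is a minimal Python sketch, not taken from the cited papers: the function name `is_flow_decomposition`, the dictionary encoding of the flow, and the toy instance are all assumptions made for illustration.

```python
from collections import defaultdict

def is_flow_decomposition(flow, paths, weights, s, t):
    """Return True if the weighted s-t paths reproduce `flow` on every edge.

    flow    : dict mapping edge (u, v) -> positive integer flow value
    paths   : list of vertex tuples, each starting at s and ending at t
    weights : list of positive integers, one per path
    """
    if len(paths) != len(weights):
        return False
    covered = defaultdict(int)
    for path, w in zip(paths, weights):
        if w <= 0 or len(path) < 2 or path[0] != s or path[-1] != t:
            return False
        for edge in zip(path, path[1:]):
            if edge not in flow:          # path leaves the edge set E
                return False
            covered[edge] += w
    # Every edge must receive exactly its prescribed flow.
    return all(covered[e] == f for e, f in flow.items())

# Hypothetical toy instance: two routes that share the middle vertex "m".
flow = {("s", "a"): 1, ("a", "m"): 1, ("s", "b"): 2, ("b", "m"): 2,
        ("m", "c"): 1, ("c", "t"): 1, ("m", "d"): 2, ("d", "t"): 2}
paths = [("s", "a", "m", "c", "t"), ("s", "b", "m", "d", "t")]

print(is_flow_decomposition(flow, paths, [1, 2], "s", "t"))  # True
print(is_flow_decomposition(flow, paths, [2, 1], "s", "t"))  # False
```

The check only verifies feasibility of a given decomposition; it says nothing about whether the number of paths is minimum, which is the hard part of the problem.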

MFD is strongly NP-hard on directed acyclic graphs (DAGs) and APX-hard in general; thus, polynomial-time exact algorithms are not expected to exist unless $\mathrm{P}=\mathrm{NP}$ (Grigorjew et al., 2024, Cáceres et al., 2022).

Related decomposition variants include formulations for graphs with cycles (decomposing into walks, trails, or cycles), inexact flow decompositions (with edgewise lower/upper bounds), and robust decompositions under uncertainty in edge flows (Stinzendörfer et al., 2024, Sena et al., 24 Nov 2025, Dias et al., 2022).

2. Integer Linear Programming Models

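The canonical algorithmic approach employs Integer Linear Programming (ILP). For a given upper bound $k$ on the number of paths and $(G,f)$ as above, binary variables $x_{uvi}\in\{0,1\}$ indicate usage of edge $(u,v)$ by path $i$, and $w_i\in\mathbb{Z}_{>0}$ are the path weights. The constraints are:

  • Path-conservation: For all $i\in\{1,\dots,k\}$, $v\in V$,

$$\sum_{(v,w)\in E} x_{vwi} - \sum_{(u,v)\in E} x_{uvi} = \begin{cases} 1 & \text{if } v=s,\\ -1 & \text{if } v=t,\\ 0 & \text{otherwise.} \end{cases}$$

  • Flow coverage: For every $(u,v)\in E$,

$$f_{uv} = \sum_{i=1}^{k} x_{uvi}\, w_i.$$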

In practice, bilinearities are linearized by introducing auxiliary variables and standard McCormick or big-$M$ constraints (Dias et al., 2023, Grigorjew et al., 2023, Dias et al., 2022).
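As a sketch of this linearization step, write $x_{uvi}\in\{0,1\}$ for the indicator that path $i$ uses edge $(u,v)$ and $w_i$ for the weight of path $i$. The auxiliary variable $\pi_{uvi}$ (standing for the product $x_{uvi} w_i$) and the constant $M$ are names introduced here for illustration; $M$ is assumed to be any valid upper bound on the path weights, for instance the largest edge flow. The big-$M$ constraints then read:

```latex
\begin{aligned}
f_{uv} &= \sum_{i=1}^{k} \pi_{uvi}, \\
0 \le \pi_{uvi} &\le M\, x_{uvi}, \\
\pi_{uvi} &\le w_i, \\
\pi_{uvi} &\ge w_i - M\,(1 - x_{uvi}).
\end{aligned}
```

If $x_{uvi}=1$, the last two constraints force $\pi_{uvi}=w_i$; if $x_{uvi}=0$, the second constraint forces $\pi_{uvi}=0$ and the last one is slack because $w_i\le M$.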

For practical scalability, modern implementations incorporate safety-based variable fixing, antichain and excess-flow preprocessing, graph reduction (e.g., degree-1 contraction), weight-ordering symmetry-breaking, and restricted-weight ILP models. These techniques yield large speedups, making exact ILP approaches feasible on real-world assembly graphs (Grigorjew et al., 2023).
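To illustrate weight-ordering symmetry-breaking: any relabelling of the $k$ paths yields an equivalent solution, so one common device is to force the path weights $w_1,\dots,w_k$ into a fixed order. The form below is an assumption for illustration; the cited implementations may order the weights differently.

```latex
w_1 \ge w_2 \ge \dots \ge w_k
```

Every decomposition can be relabelled to satisfy this ordering, so no solution is lost, while permuted copies of the same solution are cut from the search space.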

For graphs with cycles, ILP formulations become more involved, ensuring that each decomposition unit is constrained to be a simple path, a trail, or a walk, as required (Dias et al., 2022, Sena et al., 24 Nov 2025).

3. Structural Complexity and Graph Width Parameters

The tractability and approximability of MFD depend crucially on several graph parameters:

  • Width: The minimum number of $s$–$t$ paths needed to cover all edges in $G$.
  • Parallel-width: The maximum number $p$ such that a parallel bundle of $p$ $s$–$t$ edges is obtainable as a directed minor.
  • Flow-width: The smallest integer $q$ such that the flow can be covered by $q$ $s$–$t$ paths, each edge $(u,v)$ appearing at most $f_{uv}$ times.
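The first of these parameters, width, can be computed by exhaustive search on toy instances. The sketch below is illustrative only: the helper names `st_paths` and `width` and both example graphs are assumptions, and the enumeration is exponential, so it is not how width is computed in practice.

```python
from itertools import combinations

def st_paths(edges, s, t):
    """Enumerate all s-t paths of a DAG as vertex tuples (toy sizes only)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    found, stack = [], [(s,)]
    while stack:
        path = stack.pop()
        if path[-1] == t:
            found.append(path)
            continue
        for nxt in succ.get(path[-1], []):
            stack.append(path + (nxt,))
    return found

def width(edges, s, t):
    """Smallest number of s-t paths whose union contains every edge."""
    edge_sets = [set(zip(p, p[1:])) for p in st_paths(edges, s, t)]
    target = set(edges)
    for size in range(1, len(edge_sets) + 1):
        for combo in combinations(edge_sets, size):
            if set().union(*combo) == target:
                return size
    return None  # some edge lies on no s-t path

# Hypothetical examples: a "bowtie" through m, and three parallel two-edge routes.
bowtie = [("s", "a"), ("a", "m"), ("s", "b"), ("b", "m"),
          ("m", "c"), ("c", "t"), ("m", "d"), ("d", "t")]
parallel = [("s", "x"), ("x", "t"), ("s", "y"), ("y", "t"), ("s", "z"), ("z", "t")]

print(width(bowtie, "s", "t"))    # 2
print(width(parallel, "s", "t"))  # 3
```

The bowtie has four $s$–$t$ paths but width 2, since two of them already cover every edge.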

Complexity results:

  • MFD remains NP-hard on graphs of restricted width: strong NP-hardness and (weak) NP-hardness are each shown for specific width values (Grigorjew et al., 2024).
  • For width-1 graphs (i.e., a single $s$–$t$ path), MFD is trivial.
  • For graphs of constant parallel-width and unary-coded flows, MFD is quasi-polynomial-time solvable, and FPT in the number of paths $k$ with a double-exponential parameter dependence (Kloster et al., 2017, Grigorjew et al., 2024).
  • Width-stable graphs (those with monotone width under edge removal) admit bounded-approximation guarantees for greedy heuristics (Cáceres et al., 2022).

4. Safe Paths and Solution Invariance

A fundamental concern in bioinformatics applications (e.g., transcript assembly) is the existence of safe paths: paths that appear as a contiguous subpath of some solution path in every optimal solution. A path $P$ is safe if, for every optimal decomposition $P_1,\dots,P_k$, there is some $i$ such that $P$ is a subpath of $P_i$. Determining the set of maximal safe paths for MFD is itself nontrivial.
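The safety condition can be made concrete with a brute-force check on toy instances. The sketch below enumerates every optimal decomposition and tests the candidate against each one; it is an illustration under stated assumptions, not the ILP-based method of the cited work, and all function names and both example flows are hypothetical.

```python
def st_paths(edges, s, t):
    """Enumerate all s-t paths of a DAG as vertex tuples (toy sizes only)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    found, stack = [], [(s,)]
    while stack:
        path = stack.pop()
        if path[-1] == t:
            found.append(path)
            continue
        for nxt in succ.get(path[-1], []):
            stack.append(path + (nxt,))
    return found

def decompositions(residual, paths, k, start=0):
    """Yield every way to write `residual` as exactly k distinct weighted paths."""
    if k == 0:
        if all(f == 0 for f in residual.values()):
            yield []
        return
    for i in range(start, len(paths)):
        p_edges = list(zip(paths[i], paths[i][1:]))
        cap = min(residual[e] for e in p_edges)
        for w in range(1, cap + 1):
            rest = dict(residual)
            for e in p_edges:
                rest[e] -= w
            for tail in decompositions(rest, paths, k - 1, i + 1):
                yield [(paths[i], w)] + tail

def optimal_decompositions(flow, s, t):
    """All decompositions that use the minimum number of paths."""
    paths = st_paths(list(flow), s, t)
    for k in range(1, len(paths) + 1):
        solutions = list(decompositions(dict(flow), paths, k))
        if solutions:
            return solutions
    return []

def is_subpath(p, q):
    """True if vertex tuple p occurs contiguously inside vertex tuple q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))

def is_safe(candidate, flow, s, t):
    solutions = optimal_decompositions(flow, s, t)
    return bool(solutions) and all(
        any(is_subpath(candidate, path) for path, _ in sol) for sol in solutions)

edges = [("s", "a"), ("a", "m"), ("s", "b"), ("b", "m"),
         ("m", "c"), ("c", "t"), ("m", "d"), ("d", "t")]
skewed = dict(zip(edges, [1, 1, 2, 2, 1, 1, 2, 2]))   # unique optimum
uniform = dict(zip(edges, [1] * 8))                   # two optima

print(is_safe(("a", "m", "c"), skewed, "s", "t"))   # True
print(is_safe(("a", "m", "d"), skewed, "s", "t"))   # False
print(is_safe(("a", "m", "c"), uniform, "s", "t"))  # False
print(is_safe(("s", "a", "m"), uniform, "s", "t"))  # True
```

With unit flows the two routes through `m` can be paired either way, so no path crossing `m` is safe, whereas the skewed flow admits a single optimal decomposition and every subpath of its two paths is safe.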

The safety of paths is characterized, algorithmically and structurally, by ILP-based safety tests. In particular, group-testing ILP formulations can identify in batch mode the maximal set of candidate paths that are unavoidable in optimal decompositions, which cuts the number of ILP calls and hence the running time (Dias et al., 2023). In practice, this approach recovers up to 96% of ground-truth transcript coverage in typical RNA-assembly graphs, with F-scores of 0.93–0.97 compared with 0.82–0.91 for the previous safe-path notion (see Table 1).

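Table 1. Safe-path methods compared by coverage, precision, and F-score.

| Method   | Coverage | Precision | F-score   |
|----------|----------|-----------|-----------|
| SafeFlow | 71–91%   | 100%      | 0.82–0.91 |
| SafeMFD  | 88–96%   | 98–99%    | 0.93–0.97 |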

Group testing further halves the number of ILP calls required relative to naïve single-path testing, giving an empirical 3× speedup (Dias et al., 2023).

5. Algorithmic Approximability and Heuristics

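MFD is APX-hard, so it admits no polynomial-time approximation scheme unless $\mathrm{P}=\mathrm{NP}$; whether a polynomial-time constant-factor approximation exists remains open. Parameterized approximations are nevertheless possible by leveraging width parameters: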

  • The classical parity-fixing (bit decomposition) scheme yields an approximation factor that depends on the largest edge flow (Grigorjew et al., 2024, Cáceres et al., 2022).
  • Greedy-weight heuristics (iteratively extracting the widest $s$–$t$ path) have proven approximation guarantees on width-stable graphs, but their performance may deteriorate on general DAGs (Cáceres et al., 2022).
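The greedy-weight heuristic from the second bullet can be sketched in a few lines of Python. This is a minimal illustration assuming a DAG whose flow satisfies conservation; the names `widest_path` and `greedy_decomposition` and the toy flow are hypothetical, and the result is a feasible decomposition, not necessarily a minimum one.

```python
def widest_path(residual, s, t):
    """Return (bottleneck, path) for an s-t path maximizing the minimum residual
    flow, using memoized depth-first search on a DAG. Returns (0, None) if no
    s-t path with positive residual flow exists."""
    succ = {}
    for (u, v), f in residual.items():
        if f > 0:
            succ.setdefault(u, []).append(v)
    best = {}  # vertex -> (best bottleneck to t, corresponding path to t)

    def solve(u):
        if u == t:
            return float("inf"), (t,)
        if u not in best:
            choice = (0, None)
            for v in succ.get(u, []):
                width_v, tail = solve(v)
                width = min(residual[(u, v)], width_v)
                if tail is not None and width > choice[0]:
                    choice = (width, (u,) + tail)
            best[u] = choice
        return best[u]

    return solve(s)

def greedy_decomposition(flow, s, t):
    """Repeatedly peel off a widest s-t path until no flow remains."""
    residual = dict(flow)
    result = []
    while True:
        bottleneck, path = widest_path(residual, s, t)
        if path is None:
            break
        for e in zip(path, path[1:]):
            residual[e] -= bottleneck
        result.append((path, bottleneck))
    return result

# Hypothetical toy flow of total value 3.
flow = {("s", "a"): 1, ("a", "m"): 1, ("s", "b"): 2, ("b", "m"): 2,
        ("m", "c"): 1, ("c", "t"): 1, ("m", "d"): 2, ("d", "t"): 2}
result = greedy_decomposition(flow, "s", "t")
print(len(result))                 # 2
print(sum(w for _, w in result))   # 3
```

On this instance the heuristic first removes the weight-2 path through `b` and `d`, then the weight-1 path through `a` and `c`, which happens to be optimal; on general DAGs it can use more paths than necessary.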

The introduction of flow-width unifies and generalizes these guarantees, and all parameterized approximations are ultimately bounded by parallel-width and this structural parameter (Grigorjew et al., 2024).

6. Extensions: Graphs with Cycles and Robust MFD

Recent advances have generalized MFD to directed graphs with cycles, where decompositions may comprise walks, trails, or cycles. Exact ILP models for these cyclic variants enforce connectivity requirements (e.g., reachability, Miller–Tucker–Zemlin sequential ordering, SCC cut generation) and have proven practical in the reported experiments (Dias et al., 2022, Sena et al., 24 Nov 2025).

Dominator-tree techniques further identify "safe" sequences of edges common to all decompositions, yielding massive MILP reductions and up to 400× speedups (Sena et al., 24 Nov 2025).

Robust MFD variants address settings with uncertainty in edge flows (e.g., interval- or budgeted-uncertainty). Both static (worst-case) and adjustable (scenario-adaptive) robust optimization models have been developed. Adjustable robustness—differentiating between "here-and-now" and "wait-and-see" decisions—yields particularly compact and efficient models, reducing the required number of paths and weights by up to 70% versus naive approaches on real multi-scenario datasets (Stinzendörfer et al., 2024).

7. Practical Applications and Empirical Results

MFD underpins core bioinformatics tasks, most notably RNA transcript assembly and viral quasispecies reconstruction, where the flow decomposition encodes plausible molecular sequences from observed data. Large-scale benchmarks on splice graphs derived from human, mouse, and zebrafish transcriptomes show that ILP-based and safety-optimized methods provide near-optimal path recovery and substantially improve true transcript coverage, typically within seconds to minutes per instance for graphs of practical size (Dias et al., 2023, Grigorjew et al., 2023, Dias et al., 2022, Dias et al., 2022, Sena et al., 24 Nov 2025). The emergence of robust, adjustable MFD further enables practical handling of uncertainty in clinical genomics and transport planning (Stinzendörfer et al., 2024).

The field continues to advance along axes of algorithmic exactness, approximation, robustness, and practical optimization, with open challenges in constant-factor approximability, further width parameter refinements, and scalability to high-complexity multi-assembly and cyclic-graph settings.
