Match Graph: Algorithms & Applications

Updated 4 July 2026

Match graph is a family of problems focused on identifying structure-preserving correspondences between graphs under one-to-one, subgraph, and wildcard constraints.
Techniques include quadratic assignment formulations for node and edge compatibilities, subgraph isomorphism, and methods such as FRGM and ATGM for efficient matching.
Advanced approaches leverage iterative refinement, spectral methods, and distributed optimization to address high-order, attribute-rich, and deformation-aware matching challenges.

Match graph denotes the family of problems in which one seeks structure-preserving correspondences between graphs. In the most common one-to-one setting, the task is to find a binary correspondence matrix $X$ or $P$ that maps vertices of one graph to vertices of another under injectivity or permutation constraints; in query settings, the task is to find all subgraphs of a data graph that are isomorphic to a query graph. Across these settings, graph matching is typically formulated either as a quadratic assignment problem (QAP) combining unary node compatibility and pairwise edge compatibility, or as a subgraph-isomorphism, pattern-matching, or graph-editing problem with additional structural constraints (0806.2890, Rivero et al., 2013, Schaller et al., 2021).

1. Problem formulations and variants

A standard formulation begins with two attributed graphs, for example $G^{(1)}=(V^{(1)},E^{(1)})$ and $G^{(2)}=(V^{(2)},E^{(2)})$, and a binary matrix $X\in\{0,1\}^{n\times n}$ or $P\in\{0,1\}^{m\times n}$ encoding correspondence. The usual one-to-one constraints are row- and column-wise inequalities or equalities such as $\sum_i X_{ii'}\le 1$ and $\sum_{i'}X_{ii'}\le 1$, with many-to-one variants enforcing exactly one match per node in the smaller graph. In the attributed setting, the standard QAP objective maximizes a linear term over node compatibilities $c_{ii'}$ and a quadratic term over edge compatibilities $d_{ii'jj'}$ (0806.2890).
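As a rough illustration of the objective described above, the sketch below scores every permutation of a three-vertex instance by brute force. The helper names (`symmetric`, `edge_compat`, `qap_score`, `best_matching`) and the toy compatibility values are assumptions made for this example, not taken from any cited paper, and exhaustive enumeration is only viable for very small graphs.

```python
from itertools import permutations

def symmetric(edges):
    """Return both orientations of each undirected edge."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}

def edge_compat(edges1, edges2):
    """Toy d[(i, i2, j, j2)]: 1.0 when (i, j) is an edge of graph 1 and (i2, j2) of graph 2."""
    return {(i, i2, j, j2): 1.0 for (i, j) in edges1 for (i2, j2) in edges2}

def qap_score(perm, c, d):
    """Linear node term plus quadratic edge term for the assignment i -> perm[i]."""
    n = len(perm)
    node_term = sum(c[i][perm[i]] for i in range(n))
    edge_term = sum(d.get((i, perm[i], j, perm[j]), 0.0)
                    for i in range(n) for j in range(n) if i != j)
    return node_term + edge_term

def best_matching(c, d):
    """Exhaustive search over all permutations (illustration only)."""
    n = len(c)
    return max(permutations(range(n)), key=lambda p: qap_score(p, c, d))

# Graph 1 is the path 0-1-2; graph 2 is the same path with vertex 0 in the middle.
edges1 = symmetric([(0, 1), (1, 2)])
edges2 = symmetric([(1, 0), (0, 2)])
c = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # hypothetical node compatibilities
d = edge_compat(edges1, edges2)
best = best_matching(c, d)
```

Here the middle vertex of graph 1 is sent to the middle vertex of graph 2, and the small node compatibility `c[0][1]` breaks the tie between the two mirror-image assignments.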

Subgraph matching uses a different but related notion of correspondence. For a data graph $G=(V,E)$ and a query graph $Q=(V_Q,E_Q)$, the problem is to find all injective maps $f:V_Q\to V$ such that every query edge is preserved in the data graph, that is, $(u,v)\in E_Q\Rightarrow(f(u),f(v))\in E$. Rivero and Jamil’s graphlet-based formulation stores each graph as a set of one-hop induced neighborhoods and matches a query graph by recursively unifying grounded query graphlets with data graphlets (Rivero et al., 2013).
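The definition of subgraph matching can be made concrete with a plain backtracking search. This is a minimal sketch under stated assumptions: it is not the graphlet-based algorithm of Rivero and Jamil, the function name `subgraph_matches` is hypothetical, edges are directed pairs, and query vertices are taken only from the query edges (isolated query vertices are ignored).

```python
def subgraph_matches(query_edges, data_edges):
    """Return all injective maps (dicts) sending every query edge to a data edge.

    Edges are directed pairs; list both orientations for undirected graphs.
    """
    q_nodes = sorted({v for e in query_edges for v in e})
    d_nodes = sorted({v for e in data_edges for v in e})
    data = set(data_edges)
    results = []

    def consistent(mapping):
        # Check every query edge whose two endpoints are already mapped.
        return all((mapping[u], mapping[v]) in data
                   for u, v in query_edges if u in mapping and v in mapping)

    def extend(k, mapping):
        if k == len(q_nodes):
            results.append(dict(mapping))
            return
        for cand in d_nodes:
            if cand in mapping.values():      # keep the map injective
                continue
            mapping[q_nodes[k]] = cand
            if consistent(mapping):
                extend(k + 1, mapping)
            del mapping[q_nodes[k]]           # backtrack

    extend(0, {})
    return results

# Query: a -> b -> c. Data: 1 -> 2, 2 -> 3, 2 -> 4.
matches = subgraph_matches([("a", "b"), ("b", "c")], [(1, 2), (2, 3), (2, 4)])
```

The search prunes a branch as soon as a mapped query edge has no image in the data graph, which is the same early-failure idea that more elaborate matchers refine.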

Regular Graph Patterns (ReGaPs) further generalize graph isomorphism. A ReGaP is a directed graph in which some vertices are ordinary and some are wildcards representing arbitrary-length sequences or arbitrary-size subgraphs. Four wildcard types are distinguished: two any-sequence variants and two any-subgraph variants. This shifts the matching problem from strict equality to declarative structural specification (Terra-Neves et al., 2023).

A distinct specialization is the best match graph (BMG). In phylogenetics, a BMG is induced by a leaf-colored phylogenetic tree $(T,\sigma)$: a leaf $y$ is a best match of $x$ for color $s$ when the last common ancestor $\mathrm{lca}_T(x,y)$ is minimal, that is, lowest in the tree, among leaves of color $s$. The resulting colored digraph $G(T,\sigma)$ contains an arc $(x,y)$ exactly when $y$ is a best match of $x$ (Schaller et al., 2021).
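The best-match relation can be computed directly from a small tree. The sketch below is only an illustration of the definition: the tree is given as a child-to-parent dictionary, and the names `ancestors` and `best_match_arcs` as well as the toy tree are assumptions of this example.

```python
def ancestors(node, parent):
    """Chain from node up to the root, node first."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def best_match_arcs(parent, color):
    """Arcs (x, y) such that y is a best match of x among leaves of y's color."""
    leaves = sorted(color)
    arcs = set()
    for x in leaves:
        up = ancestors(x, parent)
        # height[y] = position of lca(x, y) on the path from x to the root
        height = {}
        for y in leaves:
            if color[y] != color[x]:
                anc_y = set(ancestors(y, parent))
                height[y] = next(i for i, a in enumerate(up) if a in anc_y)
        for s in {color[y] for y in height}:
            lowest = min(h for y, h in height.items() if color[y] == s)
            arcs |= {(x, y) for y, h in height.items()
                     if color[y] == s and h == lowest}
    return arcs

# Toy tree: root r has children u and b2; u has children a1 and b1.
parent = {"a1": "u", "b1": "u", "u": "r", "b2": "r"}
color = {"a1": "red", "b1": "blue", "b2": "blue"}
arcs = best_match_arcs(parent, color)
```

In this toy tree `a1` has `b1` as its only blue best match because their common ancestor `u` lies below the root, while both blue leaves point to `a1`, the only red leaf.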

Variant	Matching object	Stated criterion
One-to-one graph matching	Correspondence matrix $X$ or $P$	Node and edge compatibilities under injective or permutation constraints
Subgraph matching	Map from query vertices to data vertices	Injective edge-preserving map
ReGaP matching	Pattern-to-graph bijection after wildcard generalization	Adjacency preservation with wildcard-type consistency
BMG editing	Edit set on arcs	Transform a colored digraph into a BMG

2. Objective functions and representations

The classical QAP viewpoint vectorizes the correspondence matrix as $\mathbf{x}=\mathrm{vec}(X)$ and optimizes an objective of the form $\mathbf{x}^\top K\mathbf{x}$, or equivalently $\sum_{i,i'}c_{ii'}X_{ii'}+\sum_{i,i',j,j'}d_{ii'jj'}X_{ii'}X_{jj'}$ when the node compatibilities are placed on the diagonal of $K$. A major computational consequence is that the affinity matrix $K$ has size $n^2\times n^2$, which yields $O(n^4)$ memory in standard pairwise formulations (0806.2890, Wang et al., 2018).

“A Functional Representation for Graph Matching” reformulates this by representing a graph as a linear function space with a graph-compatible inner product or Wasserstein metric induced by its edge-attribute matrix. A correspondence between two graphs then induces a linear map between their function spaces, which has a matrix representation. In this formulation, pairwise scores are encoded through the two smaller edge-attribute matrices, which reduces space relative to storing the full affinity matrix. FRGM further distinguishes a general matching objective, an interpolation/refinement objective, a Euclidean edge-length-preserving objective, and a deformation-aware objective (Wang et al., 2019).

ATGM adopts a related but more explicitly geometric transformation view. Given the node sets of the two graphs, it models matching as a linear transformation between them and optimizes two functionals. The first preserves pairwise structure by penalizing mismatch between original edge lengths and transformed edge lengths. The second reduces residual shifts, includes a sparsity term, and adds a unary distance term. Because pairwise edge attributes are represented by unary node attributes after the linear transformation, ATGM stores only the node data of the two graphs plus their graph Laplacians (Wang et al., 2018).

GASM, or Graph Attributes and Structure Matching, integrates structure and attributes through a score-propagation mechanism rather than an explicit QAP affinity tensor. It constructs a vertex-score matrix, initializes it, and then alternates between update steps. The quantities entering these steps are the Hadamard product of vertex-attribute similarity matrices, an edge-similarity matrix, the incidence matrices of the two graphs, and a tiny random noise matrix of small amplitude. The final discrete matching is obtained by solving a linear assignment problem (LAP) on the converged vertex-score matrix (Candelier, 2024).

3. Optimization and inference algorithms

Frank–Wolfe methods occupy a central position in several modern formulations. Both ATGM and FRGM relax the discrete correspondence to a convex polytope such as

$\mathcal{P}=\{P\in[0,1]^{m\times n}: P\mathbf{1}_n=\mathbf{1}_m,\ P^\top\mathbf{1}_m\le\mathbf{1}_n\}$

or an analogous doubly-stochastic set. At each iteration, they solve a linearized subproblem

$S_t=\arg\min_{S\in\mathcal{P}}\langle\nabla f(P_t),S\rangle$

which is a linear assignment problem solved by Hungarian or Jonker–Volgenant in $O(n^3)$, followed by a line search and convex update. In ATGM, the convex refinement objective has at least sublinear $O(1/t)$ convergence under Frank–Wolfe; the paper reports the number of iterations the refinement needs to reach its tolerance, while about 200 iterations are used on the nonconvex first-stage objective to initialize it (Wang et al., 2018). FRGM also introduces an entropy-smoothed approximated Frank–Wolfe variant solved by Sinkhorn in $O(n^2)$ per iteration (Wang et al., 2019).
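The Frank–Wolfe loop described above (linearize, solve a LAP, line-search, take a convex step) can be sketched in a few lines. Several things here are assumptions of the sketch: the objective is the stand-in $\|AP-PB\|_F^2$ over doubly-stochastic matrices rather than the ATGM or FRGM functionals, the LAP is brute-forced over permutations instead of using Hungarian or Jonker–Volgenant, and all function names are hypothetical.

```python
from itertools import permutations

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inner(X, Y):
    return sum(a * b for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def residual(A, B, P):
    """R(P) = AP - PB, which is linear in P."""
    return sub(matmul(A, P), matmul(P, B))

def objective(A, B, P):
    R = residual(A, B, P)
    return inner(R, R)

def frank_wolfe(A, B, iters=50):
    n = len(A)
    P = [[1.0 / n] * n for _ in range(n)]            # barycenter start
    for _ in range(iters):
        R = residual(A, B, P)
        # G is half the gradient of ||AP - PB||_F^2, namely A^T R - R B^T.
        G = sub(matmul(transpose(A), R), matmul(R, transpose(B)))
        # Linearized subproblem: a LAP, brute-forced here for tiny n.
        perm = min(permutations(range(n)),
                   key=lambda p: sum(G[i][p[i]] for i in range(n)))
        S = [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]
        D = sub(S, P)
        RD = residual(A, B, D)
        denom = inner(RD, RD)
        if denom == 0.0:
            break
        # Exact line search for the quadratic, clipped to the segment [P, S].
        gamma = max(0.0, min(1.0, -inner(R, RD) / denom))
        P = [[p + gamma * d for p, d in zip(rp, rd)] for rp, rd in zip(P, D)]
    return P

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path with vertex 1 in the middle
B = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # same path with vertex 0 in the middle
P_start = [[1.0 / 3] * 3 for _ in range(3)]
P_relaxed = frank_wolfe(A, B)
```

Because every step is a convex combination of the current iterate and a permutation matrix, the iterate stays doubly stochastic, and the clipped exact line search keeps the objective from increasing. A final LAP on the relaxed solution would be needed to obtain a discrete matching.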

Caetano et al. shift emphasis from approximate QAP solving to learning the compatibility functions themselves. In “Learning Graph Matching,” node compatibilities and pairwise compatibilities are parameterized as

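$c_{ii'}=\langle\phi_1(G^{(1)}_i,G^{(2)}_{i'}),w_1\rangle,\qquad d_{ii'jj'}=\langle\phi_2(G^{(1)}_{ij},G^{(2)}_{i'j'}),w_2\rangle$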

and learned with a structured SVM. Because the training problem contains exponentially many constraints, the method uses column generation to find the most violated constraint by repeatedly solving a relaxed QAP, and BMRM solves the resulting reduced convex program. At test time, inference is again a graph matching problem with learned compatibilities; if the pairwise term is dropped ($d_{ii'jj'}=0$), inference becomes a linear assignment solved in $O(n^3)$, whereas otherwise the paper uses Graduated Assignment with Sinkhorn normalization (0806.2890).
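Sinkhorn normalization, mentioned here, alternately rescales rows and columns of a positive matrix until it is approximately doubly stochastic. The following is a minimal sketch, assuming a square score matrix and a softassign-style exponentiation with a hypothetical sharpness parameter `beta`; it is not the full Graduated Assignment procedure.

```python
import math

def sinkhorn(scores, beta=2.0, iters=50):
    """Alternately normalize rows and columns of exp(beta * scores)."""
    M = [[math.exp(beta * s) for s in row] for row in scores]
    rows, cols = len(M), len(M[0])
    for _ in range(iters):
        # Make every row sum to 1.
        M = [[v / sum(row) for v in row] for row in M]
        # Make every column sum to 1.
        col_sums = [sum(M[i][j] for i in range(rows)) for j in range(cols)]
        M = [[M[i][j] / col_sums[j] for j in range(cols)] for i in range(rows)]
    return M

# Hypothetical compatibility scores for a 2 x 2 matching problem.
soft = sinkhorn([[1.0, 0.2], [0.3, 0.9]])
```

Larger `beta` pushes the result closer to a permutation matrix, which is why annealing schemes increase it gradually.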

The iGraphMatch package systematizes several inference families. Its relaxation-based methods implement Frank–Wolfe for an indefinite objective and for a convex objective, and PATH interpolates between convex and concave relaxations through an annealing parameter. Its percolation-based algorithms propagate partial matchings from seeds using a mark matrix. Its spectral methods include IsoRank and Umeyama’s eigenbasis alignment followed by a LAP (Qiao et al., 2021).

Distributed optimization provides another route when the data are decentralized. In the distributed convex relaxation studied for multi-agent networks, each agent knows only one column of each of the two input matrices, maintains local primal variables and dual variables, and exchanges only neighbor information over a connected communication graph. The resulting projected primal–dual gradient dynamics converge globally and exponentially to the unique permutation relating the two graphs when the input graphs are undirected, connected, isomorphic, and asymmetric (Tran et al., 2020).

4. High-order, attribute-rich, and deformation-aware extensions

Higher-order structure can be encoded explicitly through iterated line graphs (ILGs). HGMN constructs the ILGs of the input graphs, runs a shared GNN on the ILGs and a second shared GNN on the original graphs, and combines the resulting similarities.

The paper states two expressivity results: a single layer of GCN on a graph is as expressive as a GNN on its line graph, whereas iterated line-graph construction to higher order is strictly more expressive than multi-layer GCNs for distinguishing node roles and aligning across graphs, because ILGs explicitly encode relations among hyperedges (Xu et al., 2020).
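The iterated line-graph construction itself is simple to write down. The sketch below only builds the graphs and does not involve any GNN. It assumes undirected edge lists, and `line_graph` and `iterated_line_graphs` are hypothetical helper names.

```python
from itertools import combinations

def line_graph(edges):
    """Vertices of the result are the input edges; two are adjacent when they share an endpoint."""
    nodes = sorted({tuple(sorted(e)) for e in edges})
    return [(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)]

def iterated_line_graphs(edges, order):
    """Return [G, L(G), L(L(G)), ...] up to the requested order."""
    graphs = [list(edges)]
    for _ in range(order):
        graphs.append(line_graph(graphs[-1]))
    return graphs

path = [(0, 1), (1, 2), (2, 3)]
triangle = [(0, 1), (1, 2), (0, 2)]
path_ilgs = iterated_line_graphs(path, 2)
triangle_lg = line_graph(triangle)
```

Each application turns edges into vertices, so a vertex of the second-order line graph stands for a pair of adjacent edges of the original graph, which is the sense in which higher orders expose relations among groups of vertices.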

Attribute integration is handled differently in GASM. Vertex and edge attributes are incorporated before structural propagation, and the relative trust placed in attributes is tuned by noise or error parameters. The method also uses a tiny random symmetry-breaking perturbation, a normalization heuristic, an ad hoc convergence criterion based on graph diameter, explicit restoration of isolated-vertex scores, and a GPU implementation (Candelier, 2024).

Geometric deformation is central in FRGM. Its Euclidean formulation parameterizes the transformed node set explicitly and introduces auxiliary transforms for similarity, affine, and RBF-nonrigid deformation. The paper states that the correspondence and the transformation can then be estimated simultaneously, with closed-form updates for the transformation through least squares and SVD under graph-Laplacian regularization (Wang et al., 2019).

ATGM also belongs to this geometry-aware class but emphasizes domain adaptation and outlier suppression. After obtaining an initial solution, it alternates between minimizing the refinement objective, computing nearest-neighbor distances from transformed source points to target points, pruning target vertices by a ratio test, and re-solving. The stated rationale is that inlier points in the target lie close to the transformed source points, whereas outliers are far (Wang et al., 2018).

5. Subgraph, pattern, and specialized matching systems

Rivero and Jamil’s vertex-at-a-time algorithm organizes subgraph matching around graphlets and minimum hub covers. The graphlet of a vertex is its induced one-hop neighborhood, including internal edges among the neighbors, and a minimum hub cover is a smallest set of query vertices whose incident-edge sets cover all query edges. Matching proceeds by recursively grounding one query graphlet at a time, unifying it with all compatible data graphlets, checking consistency with the current partial map, and pruning when the unification set is empty. The paper gives worst-case bounds on the unification cost, in terms of the number of neighbors in a graphlet, and on the recursion. In its motivating experiment, it reports that the heuristic minimum hub cover explored 189 partial solutions versus 309 for the best full-ordering, with memory peak under 50 MB (Rivero et al., 2013).
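The two building blocks named above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: the graph is an undirected adjacency dictionary, coverage follows the incident-edge wording used here, and the greedy routine `greedy_hub_cover` (a hypothetical name) is a heuristic with no minimality guarantee.

```python
def graphlet(adj, v):
    """Induced one-hop neighborhood of v: v, its neighbors, and all edges among them."""
    nodes = {v} | adj[v]
    edges = {frozenset((a, b)) for a in nodes for b in adj[a] if b in nodes}
    return nodes, edges

def greedy_hub_cover(adj):
    """Greedy heuristic: repeatedly pick the vertex incident to the most uncovered edges."""
    uncovered = {frozenset((a, b)) for a in adj for b in adj[a]}
    cover = []
    while uncovered:
        best = max(sorted(adj), key=lambda u: sum(1 for e in uncovered if u in e))
        cover.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover

# Triangle 0-1-2 with a tail 0-3-4.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
nodes_1, edges_1 = graphlet(adj, 1)
hubs = greedy_hub_cover(adj)
```

Grounding the graphlets of the chosen hubs in order touches every query edge at least once, which is why a small cover tends to shorten the recursion.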

SAT-based matching for ReGaPs reduces wildcard-rich declarative pattern matching to CNF satisfiability. The encoding uses separate families of Boolean variables for node-to-node mappings, for activation of pattern nodes, and for edge activation, together with polynomial-size expansions for sequence and subgraph wildcards. A preprocessing rule merges nodes in the target graph when the pattern has no edge whose endpoints are both wildcards and the node fails all pattern node constraints while having exactly one predecessor and one successor. On 946,556 instances from CodeSearchNet Python control-flow graphs, the paper reports that node-merging reduced timeouts from 34,110 instances (3.6%) to 26,487 (2.8%), reduced a second reported quantity by 15.4% on average, and reduced CNF clause count by 25.7%; for medium graphs of at most 50 nodes, almost all instances were solved in less than 1 second (Terra-Neves et al., 2023).

MultiGraphMatch addresses subgraph matching in multigraphs with node labels, edge types, and multiple properties. Its central index is a bit signature matrix whose rows encode, for each unordered target node pair, the labels of the two endpoints and the presence of incoming and outgoing edge types between them. For each query edge, a compatibility domain is obtained by a bit-vector test together with degree-multiplicity constraints. Query edges are then ordered by a lexicographic priority whose first component counts already-covered endpoints and whose second component balances domain size against local density using total degrees and a Jaccard term. The paper compares MultiGraphMatch with SuMGra, Neo4j, and Memgraph on synthetic and real graphs and reports comparable or better performance in all queries (Micale et al., 2025).
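A bit-signature filter of this general kind can be sketched with Python integers as bit vectors. The bit layout, the containment test, and the names `make_signature` and `compatible` are assumptions of this example; the exact test used by MultiGraphMatch is not reproduced here.

```python
LABELS = ["Person", "City"]            # hypothetical node labels
EDGE_TYPES = ["knows", "lives_in"]     # hypothetical edge types

def make_signature(labels_a, labels_b, out_types, in_types):
    """Pack endpoint labels and edge-type presence into one integer bit vector."""
    n_labels, n_types = len(LABELS), len(EDGE_TYPES)
    sig = 0
    for name in labels_a:                       # labels of the first endpoint
        sig |= 1 << LABELS.index(name)
    for name in labels_b:                       # labels of the second endpoint
        sig |= 1 << (n_labels + LABELS.index(name))
    for name in out_types:                      # edge types from first to second
        sig |= 1 << (2 * n_labels + EDGE_TYPES.index(name))
    for name in in_types:                       # edge types from second to first
        sig |= 1 << (2 * n_labels + n_types + EDGE_TYPES.index(name))
    return sig

def compatible(query_sig, target_sig):
    """Every bit required by the query must also be set in the target."""
    return (query_sig & target_sig) == query_sig

target = make_signature(["Person"], ["Person"], ["knows"], ["knows"])
query_ok = make_signature(["Person"], ["Person"], ["knows"], [])
query_bad = make_signature(["Person"], ["Person"], ["lives_in"], [])
```

A single AND and comparison per node pair discards most incompatible candidates before any more expensive degree or multiplicity check is run.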

In computational biology, BMG editing studies a specialized graph-matching-by-correction problem. Informative triples and forbidden triples characterize when a properly colored digraph is a BMG, and BUILD, due to Aho et al., constructs a rooted tree from a consistent triple set. Since exact arc insertion, deletion, and symmetric-difference editing to BMGs are NP-complete, heuristic top-down partitioning methods optimize the UR-cost, recurse on partition blocks, and can be made consistent in the sense of leaving true BMGs unchanged. The reported benchmarks show that Louvain-based heuristics perform best in practice (Schaller et al., 2021).

6. Empirical results, recurring trade-offs, and limitations

Several papers converge on the same empirical theme: memory-efficient formulations and good initialization materially change the practical scale of graph matching. ATGM replaces the full affinity matrix by node-attribute storage plus graph Laplacians, handles complete graphs at sizes where competing methods must drop to Delaunay graphs, and is reported to scale to larger instances within minutes. On the CMU House and PASCAL Cars/Motorbikes benchmarks, it is reported to consistently outperform the compared baselines in equal- and unequal-size settings; the convex refinement raises matching rate by 30–40% in unequal cases, and the outlier-removal pre-processing boosts all other methods by at least 10% (Wang et al., 2018).

FRGM reports a closely related efficiency–accuracy pattern. By using edge-attribute matrices rather than the affinity matrix, it reduces space complexity by two orders of magnitude relative to affinity-matrix methods and reports state-of-the-art performance on synthetic graphs, 3D face matching, real image benchmarks such as CMU House/Hotel and Pascal Cars/Motorbikes, and deformable registration tasks. The paper states that FRGM-E handles 1000 nodes in less than 200 seconds and that AFW accelerates deformable matching by a factor of 8–10 relative to vanilla FW (Wang et al., 2019).

HGMN’s results emphasize the value of explicit high-order structure. On social-network alignment, the paper reports that 0-HGMN already outperforms prior GCN-based methods, and that higher-order variants achieve the best scores; on Twitter–Foursquare, 0-1-2-HGMN is reported ahead of both DGMC and plain GCN. On DBP15K cross-lingual knowledge-graph alignment, 1-HGMN reports an absolute gain over DGMC, although 2-HGMN can be slightly worse than 1-HGMN, suggesting diminishing returns beyond first order on those graphs (Xu et al., 2020).

GASM’s evaluation highlights a different trade-off: explicit integration of attributes without abandoning competitive structural performance. On synthetic isomorphic benchmarks, it is reported to achieve accuracy comparable to Zager and higher structural quality, and on 128 QAPLIB instances it is reported to be consistently better than FAQ and 2opt, with GASM slightly outperforming Zager on many instances. In timing experiments on Erdős–Rényi graphs up to roughly 2000 vertices, GASM-GPU is reported as fastest over part of the size range, while GASM-CPU is approximately comparable to Zager (Candelier, 2024).

The principal limitations are equally consistent across the literature. QAP-based graph matching is NP-hard, and learning-based compatibility estimation still requires repeated solution of approximate QAPs and access to ground-truth correspondences during training (0806.2890). ATGM’s first-stage objective is nonconvex and therefore susceptible to local minima; the paper also states that it assumes graphs lie in a common Euclidean space and requires tuning of three hyperparameters (Wang et al., 2018). In iGraphMatch, the convex relaxation is globally optimal over the Birkhoff polytope but its projection need not lie near a permutation, whereas the indefinite relaxation often has local optima on the boundary and depends on initialization (Qiao et al., 2021). ReGaP matching remains sensitive to wildcard structure: the paper identifies a pattern containing an edge between two wildcards of one particular type as the only case that consistently misbehaves because it triggers quadratic expansion and many timeouts (Terra-Neves et al., 2023). Exact BMG editing, completion, and deletion are NP-complete, so practical workflows rely on heuristics rather than guaranteed global optima (Schaller et al., 2021).

Taken together, these results suggest that “match graph” is not a single algorithmic problem but a spectrum of formally related tasks: one-to-one alignment, subgraph isomorphism, wildcard pattern matching, multigraph querying, and editing into specialized graph classes. A plausible implication is that representation choice now functions as the primary design decision. Affinity-matrix QAPs, functional maps, transformation-based Euclidean models, SAT encodings, graphlet decompositions, iterated line graphs, and triple-consistency methods each preserve a different notion of structure, and the empirical behavior of a matcher is largely determined by that choice.