
McSplit MCIS Algorithm

Updated 14 January 2026
  • The McSplit MCIS algorithm is a branch-and-bound method that incrementally builds a maximum common induced subgraph using bidomain partitioning.
  • It employs neighborhood-consistent reductions, refined upper bounds, and equivalence-class pruning to cut unpromising search branches efficiently.
  • Later variants add learning-based branching heuristics (McSplit+RL, McSplit+LL, McSplitDAL) or redundancy-reduction techniques (RRSplit), achieving significant speedups on benchmark datasets.

The McSplit Maximum Common Induced Subgraph (MCIS) algorithm is a practical branch-and-bound backtracking approach for determining the largest induced subgraph common to two undirected, unlabeled graphs. The McSplit family of algorithms, widely referenced in the literature, incorporates a bidomain partitioning framework and sophisticated pruning, selection, and learning heuristics, underpinning much of the recent progress on large-scale MCIS computation (Yu et al., 17 Feb 2025, Zhou et al., 2022, Liu et al., 2022).

1. Problem Definition and Theoretical Foundations

Given two undirected simple graphs $Q=(V_Q, E_Q)$ and $G=(V_G, E_G)$, the MCIS problem seeks a largest injective partial mapping $S^* \subseteq V_Q \times V_G$ such that the subgraph of $Q$ induced by the mapped vertices is isomorphic, via $S^*$, to the subgraph of $G$ induced by their images. McSplit algorithms model this as the incremental construction of a partial isomorphism $(S, C)$, where $S$ is the current partial mapping and $C$ is the set of still-eligible unmatched pairs.
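The definition can be made concrete with a small checker. The sketch below assumes graphs stored as Python dictionaries of adjacency sets (a hypothetical representation, not one prescribed by McSplit). It tests whether a given mapping is a valid common induced subgraph isomorphism: the mapping must be injective and must preserve both edges and non-edges.

```python
from itertools import combinations

Graph = dict[int, set[int]]  # assumed: vertex -> set of neighbors (undirected, simple)


def is_common_induced(Q: Graph, G: Graph, mapping: dict[int, int]) -> bool:
    """Check that `mapping` (vertex of Q -> vertex of G) is an isomorphism between
    the subgraph of Q induced by its keys and the subgraph of G induced by its values."""
    # Injectivity: no two Q-vertices may share the same G-vertex.
    if len(set(mapping.values())) != len(mapping):
        return False
    # Induced: each pair of chosen Q-vertices must agree with its image pair,
    # edge <-> edge and non-edge <-> non-edge.
    for u, w in combinations(mapping, 2):
        if (w in Q[u]) != (mapping[w] in G[mapping[u]]):
            return False
    return True
```

For example, mapping a path on three vertices onto a triangle fails once all three vertices are mapped, because the path's non-edge between its endpoints would have to become an edge.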

The search is guided by:

  • Recursive branching: at each step, a candidate pair is either added to $S$ or excluded from further consideration, so that $S$ grows incrementally.
  • Neighborhood-consistent reduction: pairs in $C$ that cannot participate in any feasible extension of $S$, given the adjacency patterns, are pruned.
  • Upper-bound estimation: Pruning search branches using tight cardinality estimates for the maximum possible extension given the current state.

2. Bidomain Partitioning and Search Framework

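A distinguishing feature of McSplit is bidomain partitioning. After each reduction, $C$ is partitioned into disjoint "boxes", or bidomains, $C = (X_1 \times Y_1) \cup \cdots \cup (X_k \times Y_k)$, where the vertices within each $X_i$ (and, correspondingly, within each $Y_i$) are indistinguishable under the current compatibility constraints: they have the same adjacency pattern to the vertices already matched in $S$. Storing $C$ as a list of boxes, rather than as an explicit set of pairs, keeps the candidate set compact and makes symmetric choices within a box easy to recognize.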
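A minimal sketch of the refinement step, assuming graphs stored as dictionaries of adjacency sets. When the pair $(u, v)$ is added to $S$, each box is split into the part adjacent to $u$ and $v$ and the part adjacent to neither, and boxes with an empty side are dropped. The names `refine` and `Bidomain` are illustrative, not taken from a McSplit codebase.

```python
Graph = dict[int, set[int]]  # assumed: vertex -> set of neighbors
Bidomain = tuple[frozenset[int], frozenset[int]]  # one box X_i x Y_i


def refine(bidomains: list[Bidomain], Q: Graph, G: Graph, u: int, v: int) -> list[Bidomain]:
    """Split every box after the pair (u, v) is added to S."""
    refined: list[Bidomain] = []
    for X, Y in bidomains:
        X, Y = X - {u}, Y - {v}  # matched vertices leave the candidate sets
        # Neighbors of u may only map to neighbors of v (edges are preserved) ...
        adjacent = (X & Q[u], Y & G[v])
        # ... and non-neighbors of u only to non-neighbors of v (non-edges are preserved).
        non_adjacent = (X - Q[u], Y - G[v])
        for Xs, Ys in (adjacent, non_adjacent):
            if Xs and Ys:  # a box with an empty side offers no pairs
                refined.append((frozenset(Xs), frozenset(Ys)))
    return refined
```

Because each box splits into at most two, the number of boxes stays bounded by the number of vertices, which is what keeps this representation of $C$ small.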

Search procedure:

  • Select a box $X \times Y$ and a vertex $u \in X$.
  • For every $v \in Y$, attempt to extend $S$ with $(u, v)$ and update $C$ accordingly.
  • After all such extensions, "skip $u$" by removing all pairs involving $u$ from $C$, and recurse.
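The three steps above can be combined into a compact branch-and-bound sketch. It is an illustration under simplifying assumptions, not the reference McSplit implementation: the selection rules (the box with the smallest larger side, then the smallest vertex label) are hypothetical stand-ins for McSplit's heuristics, and pruning uses the standard bound $UB_{S,C}$ stated in this section.

```python
Graph = dict[int, set[int]]  # assumed: vertex -> set of neighbors
Bidomain = tuple[frozenset[int], frozenset[int]]


def refine(bidomains: list[Bidomain], Q: Graph, G: Graph, u: int, v: int) -> list[Bidomain]:
    """Split each box by adjacency to the newly matched pair (u, v)."""
    out: list[Bidomain] = []
    for X, Y in bidomains:
        X, Y = X - {u}, Y - {v}
        for Xs, Ys in ((X & Q[u], Y & G[v]), (X - Q[u], Y - G[v])):
            if Xs and Ys:
                out.append((frozenset(Xs), frozenset(Ys)))
    return out


def max_common_induced(Q: Graph, G: Graph) -> dict[int, int]:
    best: dict[int, int] = {}

    def search(S: dict[int, int], bidomains: list[Bidomain]) -> None:
        nonlocal best
        if len(S) > len(best):
            best = dict(S)  # record a new incumbent
        # Standard bound: |S| plus min(|X_i|, |Y_i|) summed over the boxes.
        bound = len(S) + sum(min(len(X), len(Y)) for X, Y in bidomains)
        if bound <= len(best):
            return  # cannot beat the incumbent (this also covers "no boxes left")
        # Hypothetical selection rule: smallest box by its larger side, smallest u.
        X, Y = min(bidomains, key=lambda b: max(len(b[0]), len(b[1])))
        u = min(X)
        for v in sorted(Y):  # branch: match u with each v in its box
            S[u] = v
            search(S, refine(bidomains, Q, G, u, v))
            del S[u]
        # Branch: "skip u", removing it from its box and dropping emptied boxes.
        search(S, [(Xb - {u}, Yb) for Xb, Yb in bidomains if Xb - {u}])

    search({}, [(frozenset(Q), frozenset(G))])
    return best
```

For a path on four vertices against a 4-cycle, this returns a mapping with three pairs: a three-vertex path is induced in both graphs, but all four vertices of the cycle induce the cycle itself.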

The standard upper bound at each node is

$$UB_{S,C} = |S| + \sum_{(X_i \times Y_i) \subseteq C} \min\{|X_i|, |Y_i|\}$$

which reflects the maximum number of further matches possible from each box (Yu et al., 17 Feb 2025).
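As a worked example with illustrative numbers, take $|S| = 2$ and suppose $C$ splits into three boxes with $(|X_1|, |Y_1|) = (3, 2)$, $(|X_2|, |Y_2|) = (1, 4)$ and $(|X_3|, |Y_3|) = (2, 2)$. Then

$$UB_{S,C} = 2 + \min\{3, 2\} + \min\{1, 4\} + \min\{2, 2\} = 2 + 2 + 1 + 2 = 7,$$

so if the incumbent already contains 7 pairs, this node cannot lead to a strictly larger solution and is pruned.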

3. Enhancements: Redundancy Reduction and Tightening via RRSplit

Empirical investigation revealed that the McSplit framework explores redundant, mutually isomorphic subproblems as well as partial solutions that cannot be extended into a larger solution. RRSplit introduces three orthogonal refinements to address these inefficiencies (Yu et al., 17 Feb 2025):

(A) Vertex-equivalence reductions

Vertices of $Q$ are partitioned into equivalence classes $\Psi(u)$ according to their structural (adjacency) patterns:

$$u \sim u' \iff \forall w \in V_Q:\ (u, w) \in E_Q \Leftrightarrow (u', w) \in E_Q$$

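If $(u', v)$ is already known to be non-extendable for some $u' \in \Psi(u)$, the branch for $(u, v)$ is pruned: because $u$ and $u'$ have identical neighborhoods, exchanging them is an automorphism of $Q$, so the branch for $(u, v)$ mirrors the one already explored for $(u', v)$.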

Additionally, once $u$ is excluded, all other $u' \in \Psi(u)$ can be excluded simultaneously, so only one representative per equivalence class needs to be branched on.
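A sketch of how the classes and the class-wide exclusion might be computed, assuming graphs stored as dictionaries of adjacency sets. Under the definition above, $u \sim u'$ holds exactly when $u$ and $u'$ have the same neighbor set, so grouping vertices by their frozen neighbor sets recovers the classes. The helper names are hypothetical, and the second helper is only meant for the exclude-$u$ branch, after every $(u, v)$ with $v \in Y$ has been tried.

```python
from collections import defaultdict

Graph = dict[int, set[int]]  # assumed: vertex -> set of neighbors


def equivalence_classes(Q: Graph) -> dict[int, frozenset[int]]:
    """Map each vertex u to Psi(u): all vertices whose neighbor set equals N(u)."""
    groups: dict[frozenset[int], list[int]] = defaultdict(list)
    for u, nbrs in Q.items():
        groups[frozenset(nbrs)].append(u)
    class_of: dict[int, frozenset[int]] = {}
    for members in groups.values():
        cls = frozenset(members)
        for u in members:
            class_of[u] = cls
    return class_of


def skip_with_class(X: frozenset[int], u: int, class_of: dict[int, frozenset[int]]) -> frozenset[int]:
    """Exclude-u branch: drop u together with every u' in Psi(u) from the box side X."""
    return X - class_of[u]
```

In a star with center 0 and leaves 1, 2, 3, the three leaves share the neighbor set $\{0\}$ and form one class, so skipping any leaf removes all three from the box at once.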

(B) Maximality-based reduction

If, for some $(u, v) \in X \times Y$, $u$ and $v$ have identical adjacency patterns into every other box (that is, $u$ is adjacent to all or none of each other $X'$, and $v$ is adjacent to all or none of the corresponding $Y'$ in the same way), then some maximum solution must include $(u, v)$. Only the child that includes $(u, v)$ is explored; all others are pruned, and the exclusion set $D$ is updated to propagate equivalent prunings.
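The condition can be checked directly. The function below is a sketch of one reading of it, with boxes given as pairs of frozensets and graphs as dictionaries of adjacency sets. It does not model how RRSplit updates $D$, and its name and signature are hypothetical.

```python
from typing import Optional

Graph = dict[int, set[int]]  # assumed: vertex -> set of neighbors
Box = tuple[frozenset[int], frozenset[int]]


def pattern(nbrs: set[int], part: frozenset[int]) -> Optional[str]:
    """'full' if every vertex of part is a neighbor, 'empty' if none is, else None."""
    hits = len(part & nbrs)
    if hits == len(part):
        return "full"
    if hits == 0:
        return "empty"
    return None


def is_forced_pair(u: int, v: int, own: int, boxes: list[Box], Q: Graph, G: Graph) -> bool:
    """True if (u, v) from boxes[own] sees every other box uniformly and identically."""
    for i, (Xo, Yo) in enumerate(boxes):
        if i == own:
            continue
        pu = pattern(Q[u], Xo)
        # Mixed adjacency on either side, or differing patterns, breaks the condition.
        if pu is None or pu != pattern(G[v], Yo):
            return False
    return True
```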

(C) Refined upper bounds

Accounting for excluded pairs and vertex-equivalence, the refined upper bound is

$$UB_{S,C,D} = |S| + \sum_{(X \times Y) \in \mathcal{P}(C)} ub_{X,Y,D}$$

where $\mathcal{P}(C)$ is the bidomain partition of $C$ and $ub_{X,Y,D}$ is a per-box bound that uses knowledge of which pairs $(u, v)$ have already been excluded and which are redundant due to equivalence classes. This makes pruning more aggressive and substantially reduces superfluous search (Yu et al., 17 Feb 2025).
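The exact form of $ub_{X,Y,D}$ is not reproduced here. As a hypothetical illustration of how exclusions can tighten a per-box bound (this is not RRSplit's formula), note that a box can contribute at most one pair per vertex of $X$ that still has a non-excluded partner in $Y$, and likewise for $Y$. The minimum of these two counts is therefore still a valid bound, and it never exceeds $\min\{|X|, |Y|\}$.

```python
Box = tuple[frozenset[int], frozenset[int]]


def box_bound(X: frozenset[int], Y: frozenset[int], excluded: set[tuple[int, int]]) -> int:
    """Illustrative per-box bound using an exclusion set of (x, y) pairs."""
    # Vertices of X with at least one allowed partner in Y, and vice versa.
    live_x = sum(1 for x in X if any((x, y) not in excluded for y in Y))
    live_y = sum(1 for y in Y if any((x, y) not in excluded for x in X))
    # Any set of pairs taken from this box uses distinct live x's and live y's.
    return min(live_x, live_y)


def bound_with_exclusions(size_s: int, boxes: list[Box], excluded: set[tuple[int, int]]) -> int:
    return size_s + sum(box_bound(X, Y, excluded) for X, Y in boxes)
```

For a box with $X = \{0, 1\}$ and $Y = \{5, 6\}$ in which both pairs involving vertex 0 are excluded, the plain bound would credit the box with 2 pairs, but only vertex 1 can still be matched, so the box contributes 1.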

4. Algorithmic Complexity and Theoretical Guarantees

Prior to RRSplit, McSplit lacked nontrivial worst-case guarantees despite its practical effectiveness. RRSplit achieves a worst-case running time of $O^*((|V_G|+1)^{|V_Q|})$, where $O^*$ suppresses polynomial factors, matching the best-known existing results for the MCIS problem. This is achieved by limiting the number of distinct partial isomorphism states to $k!\binom{|V_Q|}{k}\binom{|V_G|}{k}$ for partial mappings of size $k$, and by bounding the number of exclude-$u$ child branches per partial solution size (Yu et al., 17 Feb 2025).
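The counting step can be checked numerically. Summing $k!\binom{|V_Q|}{k}\binom{|V_G|}{k}$ over $k$ counts all partial injective mappings from $V_Q$ to $V_G$. Each such mapping can be encoded as a function sending every vertex of $Q$ either to a vertex of $G$ or to "unmatched", which gives at most $(|V_G|+1)^{|V_Q|}$ states. The snippet only illustrates this inequality and is not the RRSplit analysis itself, which also has to bound the exclude-$u$ branches.

```python
from math import comb, factorial


def partial_states(n_q: int, n_g: int) -> int:
    """Sum over k of k! * C(n_q, k) * C(n_g, k): all partial injective maps V_Q -> V_G."""
    return sum(factorial(k) * comb(n_q, k) * comb(n_g, k) for k in range(min(n_q, n_g) + 1))


# Each partial injection is a function V_Q -> V_G plus an "unmatched" symbol,
# so the count never exceeds (n_g + 1) ** n_q.
for n_q in range(1, 7):
    for n_g in range(1, 7):
        assert partial_states(n_q, n_g) <= (n_g + 1) ** n_q
```

For instance, `partial_states(3, 4)` is $1 + 12 + 36 + 24 = 73$, comfortably below $5^3 = 125$.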

5. Advances over McSplit: Learning-Augmented and Hybrid Algorithms

Several McSplit variants integrate learning-based branching heuristics:

  • McSplit+RL uses a reinforcement-learning-style branching heuristic based on accumulated rewards.
  • McSplit+LL combines Long-Short Memory (LSM) and Leaf Vertex Union Match (LUM), tracking both short-term (per-vertex) and long-term (pairwise) branching rewards to guide selection, and bulk-matches leaf neighbors whenever possible, substantially reducing search depth (Zhou et al., 2022).
  • McSplitDAL augments McSplit+LL by introducing a DAL value function, which incorporates not only the decrease in the upper bound but also the fragmentation of the bidomain partition post-branch, and alternates this heuristic with RL strategies to avoid myopic search (Liu et al., 2022).
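The following is a minimal sketch of accumulated-reward branching with decay. The reward (the decrease in the upper bound caused by a branch), the threshold value, and the rule of halving every score are assumptions chosen for illustration. The class and method names are hypothetical, and the actual McSplit+RL, McSplit+LL, and McSplitDAL scoring rules differ in detail.

```python
class RewardTable:
    """Hypothetical per-vertex accumulated rewards for choosing the branching vertex."""

    def __init__(self, n_vertices: int, threshold: int = 1 << 16) -> None:
        self.score = [0] * n_vertices
        self.threshold = threshold  # assumed decay trigger

    def update(self, vertex: int, bound_before: int, bound_after: int) -> None:
        # Assumed reward: how much branching on `vertex` lowered the upper bound.
        self.score[vertex] += bound_before - bound_after
        if self.score[vertex] > self.threshold:
            # Decay by integer right shift: halve every score so old rewards fade.
            self.score = [s >> 1 for s in self.score]

    def pick(self, candidates) -> int:
        # Highest score wins; ties go to the smallest vertex label.
        return max(candidates, key=lambda u: (self.score[u], -u))
```

Halving all scores at once keeps relative preferences intact while letting recent rewards dominate older ones, and the right shift is a cheap way to do it on integer scores.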

RRSplit is orthogonal to these methods: it targets redundant computation (isomorphic or non-extendable branches), whereas the McSplit derivatives above focus on traversing the search space faster through improved branching policies.

6. Empirical Evaluation and Benchmark Performance

Table 1 below summarizes experimental comparisons between RRSplit and McSplitDAL on four datasets:

| Dataset | Instances | Solved by RRSplit | Solved by McSplitDAL | ≥5× speedup (RRSplit) |
|---|---|---|---|---|
| BI | 9,180 | 7,730 | 4,696 | 91.3% |
| CV | 6,424 | 1,351 | 1,291 | 76.5% |
| PR | 24 | 24 | 24 | 91.7% |
| LV | 6,216 | 1,059 | 883 | 68.0% |

RRSplit solves substantially more instances than McSplitDAL, the best McSplit variant, and achieves large (often orders-of-magnitude) speedups over it. Ablation studies indicate that all three RRSplit enhancements are needed for peak performance; omitting any of them degrades results (Yu et al., 17 Feb 2025).

7. Implementation Considerations and Practical Guidance

All McSplit-like algorithms use data structures such as bitsets or compact vectors for candidate sets and bidomain labels. RRSplit and advanced McSplit variants additionally require tables for exclusion sets and equivalence-class lookups. These add little memory or per-branch computation, and the overhead is typically repaid by a large reduction in the number of search-tree nodes explored. Decay mechanisms for learning-based heuristics are realized through cheap arithmetic operations, e.g., an integer right shift (Zhou et al., 2022).
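The following is a bitset version of the box refinement, sketched with Python integers as bitsets (bit $i$ set means vertex $i$ is present). The function names are hypothetical. A compiled implementation would typically use fixed-width machine words, but the masking logic is the same.

```python
def mask(vertices) -> int:
    """Bitset with bit i set for every vertex i."""
    m = 0
    for v in vertices:
        m |= 1 << v
    return m


def popcount(x: int) -> int:
    return bin(x).count("1")


def refine_bits(boxes: list[tuple[int, int]], adj_q: list[int], adj_g: list[int],
                u: int, v: int) -> list[tuple[int, int]]:
    """Split each (X, Y) bitset box after matching u with v."""
    out: list[tuple[int, int]] = []
    for X, Y in boxes:
        X &= ~(1 << u)  # remove the matched vertices
        Y &= ~(1 << v)
        # Adjacent part, then non-adjacent part; X and Y are non-negative,
        # so masking with a complemented adjacency row stays non-negative.
        for Xs, Ys in ((X & adj_q[u], Y & adj_g[v]), (X & ~adj_q[u], Y & ~adj_g[v])):
            if Xs and Ys:
                out.append((Xs, Ys))
    return out


def bound_bits(size_s: int, boxes: list[tuple[int, int]]) -> int:
    """Standard bound |S| + sum of min(|X|, |Y|), using popcounts."""
    return size_s + sum(min(popcount(X), popcount(Y)) for X, Y in boxes)
```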

Memory and computational costs are dominated by the combinatorial explosion inherent in MCIS but are significantly mitigated by the structure and pruning strategies of McSplit, RRSplit, and their derivatives.
