McSplit MCIS Algorithm
- The McSplit MCIS algorithm is a branch-and-bound method that incrementally builds a maximum common induced subgraph using bidomain partitioning.
- It employs neighborhood-consistent reductions, refined upper bounds, and equivalence class pruning to efficiently cut infeasible search branches.
- Advanced variants integrate reinforcement learning and heuristic-based branching to achieve significant speedups and reduce redundant computations on benchmark datasets.
The McSplit Maximum Common Induced Subgraph (MCIS) algorithm is a practical branch-and-bound backtracking approach for determining the largest induced subgraph common to two undirected, unlabeled graphs. The McSplit family of algorithms, widely referenced in the literature, incorporates a bidomain partitioning framework and sophisticated pruning, selection, and learning heuristics, underpinning much of the recent progress on large-scale MCIS computation (Yu et al., 17 Feb 2025, Zhou et al., 2022, Liu et al., 2022).
1. Problem Definition and Theoretical Foundations
Given two undirected simple graphs G1 = (V1, E1) and G2 = (V2, E2), the MCIS problem seeks the largest vertex subsets U1 ⊆ V1 and U2 ⊆ V2 such that the subgraph of G1 induced on U1 is isomorphic to the subgraph of G2 induced on U2. McSplit algorithms model this as the incremental construction of a partial isomorphism (M, D), where M ⊆ V1 × V2 is the current partial mapping and D is the set of eligible unmatched pairs.
The search is guided by:
- Recursive branching: at each step, a candidate pairing is either added to M or refused, incrementally enlarging the partial mapping.
- Neighborhood-consistent reduction: candidate pairs in D that cannot participate in any feasible extension of M are pruned, based on adjacency patterns.
- Upper-bound estimation: search branches are pruned using tight cardinality estimates of the maximum possible extension from the current state.
2. Bidomain Partitioning and Search Framework
A distinguishing feature of McSplit is bidomain partitioning. After each reduction, D is partitioned into disjoint "boxes" or bidomains (S1, T1), ..., (Sk, Tk), where each Si ⊆ V1 and Ti ⊆ V2 contain vertices that are indistinguishable under the current compatibility constraints. This partitioning allows efficient, symmetry-aware search.
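A single refinement step can be sketched as follows; the function name and the dict-of-sets adjacency representation are assumptions for illustration. After matching a pair, neighbours of the matched vertex on one side can only ever map to neighbours of its partner on the other side, so each box splits in two.

```python
# Sketch of bidomain refinement after matching the pair (v, w).
# adj1/adj2: dicts mapping each vertex to its neighbour set (no self-loops).
def refine_bidomains(bidomains, v, w, adj1, adj2):
    new = []
    for S, T in bidomains:
        # Neighbours of v in S may only match neighbours of w in T.
        S_adj = {u for u in S if u in adj1[v]} - {v}
        T_adj = {x for x in T if x in adj2[w]} - {w}
        # Non-neighbours of v may only match non-neighbours of w.
        S_non = S - S_adj - {v}
        T_non = T - T_adj - {w}
        if S_adj and T_adj:
            new.append((S_adj, T_adj))
        if S_non and T_non:
            new.append((S_non, T_non))
    return new
```

For two copies of the path 0-1-2, matching (1, 1) refines the initial box ({0,1,2}, {0,1,2}) to the single bidomain ({0,2}, {0,2}): the non-neighbour sides are empty and vanish.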
Search procedure:
- Select a box (Si, Ti) and a vertex v ∈ Si.
- For every w ∈ Ti, attempt to extend M with the pair (v, w) and update D accordingly.
- After all such extensions, "skip v" by removing all pairs involving v from D and recursively proceed.
The standard upper bound at each node is

UB(M, D) = |M| + Σ_{(Si, Ti) ∈ D} min(|Si|, |Ti|),

which reflects the maximum number of further matches possible from each box (Yu et al., 17 Feb 2025).
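The branching, skipping, and bounding steps above can be put together in a minimal McSplit-style solver. The following is an illustrative Python sketch, not the reference implementation: graphs are assumed to be dicts mapping each vertex to its neighbour set, and real implementations use bitsets and stronger vertex-selection heuristics.

```python
# Minimal McSplit-style branch-and-bound for MCIS (illustrative sketch).
# Graphs: dicts mapping each vertex to its neighbour set (no self-loops).
# Returns the size of a maximum common induced subgraph.
def mcsplit(adj1, adj2):
    best = [0]  # size of the incumbent solution

    def bound(depth, bidomains):
        # |M| + sum over boxes of min(|Si|, |Ti|)
        return depth + sum(min(len(S), len(T)) for S, T in bidomains)

    def refine(bidomains, v, w):
        # Split every box by adjacency to the newly matched pair (v, w).
        new = []
        for S, T in bidomains:
            for Sp, Tp in ((S & adj1[v], T & adj2[w]),
                           (S - adj1[v] - {v}, T - adj2[w] - {w})):
                if Sp and Tp:
                    new.append((Sp, Tp))
        return new

    def search(depth, bidomains):
        best[0] = max(best[0], depth)
        if bound(depth, bidomains) <= best[0] or not bidomains:
            return
        # Branch on a vertex from the "tightest" box (a common heuristic).
        S, T = min(bidomains, key=lambda b: max(len(b[0]), len(b[1])))
        v = next(iter(S))
        for w in T:  # try matching v with every w in its box
            search(depth + 1, refine(bidomains, v, w))
        i = bidomains.index((S, T))
        rest = bidomains[:i] + bidomains[i + 1:]
        if len(S) > 1:
            rest.append((S - {v}, T))
        search(depth, rest)  # the "skip v" branch

    search(0, [(set(adj1), set(adj2))])
    return best[0]
```

For example, for a path P3 and a triangle K3 this sketch returns 2 (a single common edge), since no three vertices induce isomorphic subgraphs in both.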
3. Enhancements: Redundancy Reduction and Tightening via RRSplit
Empirical investigation revealed the McSplit framework explores superfluous, isomorphic subproblems and non-extendable partial solutions. RRSplit introduces three orthogonal refinements to address these inefficiencies (Yu et al., 17 Feb 2025):
(A) Vertex-equivalence reductions
Vertices of G1 are partitioned into equivalence classes according to their structural (adjacency) patterns: u and u' are equivalent when N(u) \ {u'} = N(u') \ {u}, i.e., they are interchangeable in any common subgraph.
If the pair (v, w) is already known to be non-extendable for some w ∈ V2, the branch for (v', w) with v' equivalent to v is pruned.
Additionally, once (v, w) is excluded, all pairs (v', w) with v' in the same equivalence class can be excluded simultaneously: only one representative per equivalence class is required in the search.
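Computing these classes can be sketched as below; this is an assumed helper, not the paper's code. Membership is tested against the class representative only, which suffices for this illustration.

```python
# Sketch: group a graph's vertices into structural-equivalence classes.
# adj: dict mapping each vertex to its neighbour set (no self-loops).
# u and u' are grouped when their neighbourhoods agree after removing
# each vertex from the other's neighbour set.
def equivalence_classes(adj):
    classes = []
    for u in adj:
        for cls in classes:
            r = cls[0]  # class representative
            if adj[u] - {r} == adj[r] - {u}:
                cls.append(u)
                break
        else:
            classes.append([u])
    return classes
```

In a star graph, for instance, all leaves fall into one class, so only one leaf ever needs to be branched on.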
(B) Maximality-based reduction
If, for some box (Si, Ti), a vertex v ∈ Si and a vertex w ∈ Ti have identical adjacency patterns into every other box (either complete or empty), then some maximum solution must include the pair (v, w). Only the child that includes (v, w) is explored; all others are pruned, and the exclusion set is updated to propagate equivalent prunings.
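The uniformity condition can be illustrated with a small predicate; the function name and interface here are hypothetical, and the real reduction involves additional bookkeeping for the exclusion set.

```python
# Illustrative check for the maximality-based reduction: (v, w) is a
# candidate forced match if, within every box, the vertices on each side
# are uniformly adjacent or uniformly non-adjacent to v (resp. w), with
# the same polarity on both sides.
def forced_match(v, w, bidomains, adj1, adj2):
    for S, T in bidomains:
        side1 = {u in adj1[v] for u in S if u != v}  # {True}, {False}, or mixed
        side2 = {x in adj2[w] for x in T if x != w}
        if len(side1) > 1 or len(side2) > 1 or side1 != side2:
            return False
    return True
```

In a triangle matched against a triangle the check succeeds for any pair, whereas in a path the endpoints see a mixed adjacency pattern and the check fails.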
(C) Refined upper bounds
Accounting for excluded pairs and vertex equivalence, the refined upper bound tightens each per-box term min(|Si|, |Ti|) by discounting vertices of Si that have already been excluded or are redundant members of an equivalence class, so that UBr(M, D) ≤ UB(M, D). This increases pruning aggressiveness and substantially reduces superfluous search (Yu et al., 17 Feb 2025).
4. Algorithmic Complexity and Theoretical Guarantees
Prior to RRSplit, McSplit lacked nontrivial worst-case guarantees despite its practical effectiveness. RRSplit attains a nontrivial worst-case running time that matches the best known bound for the MCIS problem. This is achieved by bounding the number of distinct partial-isomorphism states that can arise at each mapping size, and by bounding the number of exclusion ("skip") branches generated per partial solution (Yu et al., 17 Feb 2025).
5. Advances over McSplit: Learning-Augmented and Hybrid Algorithms
Several McSplit variants integrate learning-based branching heuristics:
- McSplit+RL uses RL-based accumulative reward branching.
- McSplit+LL combines Long-Short Memory (LSM) and Leaf Vertex Union Match (LUM), tracking both short-term (per-vertex) and long-term (pairwise) branching rewards to guide selection, and bulk-matches leaf neighbors whenever possible, substantially reducing search depth (Zhou et al., 2022).
- McSplitDAL augments McSplit+LL by introducing a DAL value function, which incorporates not only the decrease in the upper bound but also the fragmentation of the bidomain partition post-branch, and alternates this heuristic with RL strategies to avoid myopic search (Liu et al., 2022).
RRSplit is orthogonal in that it targets redundant computation (isomorphic/additive branches), while these McSplit derivatives focus on expedient traversal of the search space via branching policy improvements.
6. Empirical Evaluation and Benchmark Performance
Table 1 below summarizes experimental comparisons between RRSplit and McSplitDAL on four datasets:
| Dataset | Instances | RRSplit Solved | McSplitDAL Solved | ≥5x Speedup (RRSplit) |
|---|---|---|---|---|
| BI | 9,180 | 7,730 | 4,696 | 91.3% |
| CV | 6,424 | 1,351 | 1,291 | 76.5% |
| PR | 24 | 24 | 24 | 91.7% |
| LV | 6,216 | 1,059 | 883 | 68.0% |
RRSplit solves substantially more instances and achieves large (often orders-of-magnitude) speedups relative to the best McSplit variant. Ablation studies identify that all three RRSplit enhancements are crucial for peak performance; omitting any of them degrades results (Yu et al., 17 Feb 2025).
7. Implementation Considerations and Practical Guidance
All McSplit-like algorithms leverage data structures such as bitsets or compact vectors for candidate sets and bidomain labels. RRSplit and advanced McSplit variants require, in addition, tables for exclusion sets and equivalence class lookups. Implementation incurs minimal additional memory or computational overhead per branch, with all enhancements paying for themselves by enabling exponential reductions in the explored search tree. Decay mechanisms for learning-based heuristics are realized through efficient arithmetic operations, e.g., integer right-shift (Zhou et al., 2022).
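The right-shift decay mentioned above can be illustrated as follows; the cap value and function name are hypothetical placeholders, not taken from any specific implementation.

```python
# Sketch of reward decay in learning-guided variants: when any accumulated
# branching reward overflows a cap, all rewards are halved with a single
# integer right shift per entry (a cheap alternative to float multiplication).
REWARD_CAP = 10**9  # hypothetical threshold

def bump_and_maybe_decay(rewards, v, gain):
    rewards[v] = rewards.get(v, 0) + gain
    if rewards[v] > REWARD_CAP:
        for u in rewards:
            rewards[u] >>= 1  # integer right shift = halving
    return rewards
```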
Memory and computational costs are dominated by the combinatorial explosion inherent in MCIS but are significantly mitigated by the structure and pruning strategies of McSplit, RRSplit, and their derivatives.