McSplit MCIS Algorithm
- The McSplit MCIS algorithm is a branch-and-bound method that incrementally builds a maximum common induced subgraph using bidomain partitioning.
- It employs neighborhood-consistent reductions, refined upper bounds, and equivalence class pruning to efficiently cut infeasible search branches.
- Advanced variants integrate reinforcement learning and heuristic-based branching to achieve significant speedups and reduce redundant computations on benchmark datasets.
The McSplit Maximum Common Induced Subgraph (MCIS) algorithm is a practical branch-and-bound backtracking approach for determining the largest induced subgraph common to two undirected, unlabeled graphs. The McSplit family of algorithms, widely referenced in the literature, incorporates a bidomain partitioning framework and sophisticated pruning, selection, and learning heuristics, underpinning much of the recent progress on large-scale MCIS computation (Yu et al., 17 Feb 2025, Zhou et al., 2022, Liu et al., 2022).
1. Problem Definition and Theoretical Foundations
Given two undirected simple graphs G1 = (V1, E1) and G2 = (V2, E2), the MCIS problem seeks the largest vertex subsets U1 ⊆ V1 and U2 ⊆ V2 such that the subgraph of G1 induced on U1 is isomorphic to the subgraph of G2 induced on U2. McSplit algorithms model this as the incremental construction of a partial isomorphism (M, D), where M ⊆ V1 × V2 is the current partial mapping and D is the set of eligible unmatched pairs.
The search is guided by:
- Recursive branching: at each step, a candidate pairing is either added to M or refused, incrementally enlarging the partial mapping.
- Neighborhood-consistent reduction: candidate pairs in D that cannot participate in any feasible extension of M are pruned, based on adjacency patterns.
- Upper-bound estimation: search branches are pruned using tight cardinality estimates of the maximum possible extension from the current state.
2. Bidomain Partitioning and Search Framework
A distinguishing feature of McSplit is bidomain partitioning. After each reduction, D is partitioned into disjoint "boxes" or bidomains (S1, T1), ..., (Sk, Tk), where each Si ⊆ V1 and Ti ⊆ V2 contain vertices that are indistinguishable under the current compatibility constraints. This partitioning allows efficient, symmetry-aware search.
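A single refinement step can be sketched as follows; the function name and the dict-of-sets adjacency representation are assumptions for illustration. After matching a pair, neighbours of the matched vertex on one side can only ever map to neighbours of its partner on the other side, so each box splits in two.

```python
# Sketch of bidomain refinement after matching the pair (v, w).
# adj1/adj2: dicts mapping each vertex to its neighbour set (no self-loops).
def refine_bidomains(bidomains, v, w, adj1, adj2):
    new = []
    for S, T in bidomains:
        # Neighbours of v in S may only match neighbours of w in T.
        S_adj = {u for u in S if u in adj1[v]} - {v}
        T_adj = {x for x in T if x in adj2[w]} - {w}
        # Non-neighbours of v may only match non-neighbours of w.
        S_non = S - S_adj - {v}
        T_non = T - T_adj - {w}
        if S_adj and T_adj:
            new.append((S_adj, T_adj))
        if S_non and T_non:
            new.append((S_non, T_non))
    return new
```

For two copies of the path 0-1-2, matching (1, 1) refines the initial box ({0,1,2}, {0,1,2}) to the single bidomain ({0,2}, {0,2}): the non-neighbour sides are empty and vanish.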
Search procedure:
- Select a box (Si, Ti) and a vertex v ∈ Si.
- For every w ∈ Ti, attempt to extend M with the pair (v, w) and update D accordingly.
- After all such extensions, "skip v" by removing all pairs involving v from D and recursively proceed.
The standard upper bound at each node is

UB(M, D) = |M| + Σ_{(Si, Ti) ∈ D} min(|Si|, |Ti|),

which reflects the maximum number of further matches possible from each box (Yu et al., 17 Feb 2025).
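The branching, skipping, and bounding steps above can be put together in a minimal McSplit-style solver. The following is an illustrative Python sketch, not the reference implementation: graphs are assumed to be dicts mapping each vertex to its neighbour set, and real implementations use bitsets and stronger vertex-selection heuristics.

```python
# Minimal McSplit-style branch-and-bound for MCIS (illustrative sketch).
# Graphs: dicts mapping each vertex to its neighbour set (no self-loops).
# Returns the size of a maximum common induced subgraph.
def mcsplit(adj1, adj2):
    best = [0]  # size of the incumbent solution

    def bound(depth, bidomains):
        # |M| + sum over boxes of min(|Si|, |Ti|)
        return depth + sum(min(len(S), len(T)) for S, T in bidomains)

    def refine(bidomains, v, w):
        # Split every box by adjacency to the newly matched pair (v, w).
        new = []
        for S, T in bidomains:
            for Sp, Tp in ((S & adj1[v], T & adj2[w]),
                           (S - adj1[v] - {v}, T - adj2[w] - {w})):
                if Sp and Tp:
                    new.append((Sp, Tp))
        return new

    def search(depth, bidomains):
        best[0] = max(best[0], depth)
        if bound(depth, bidomains) <= best[0] or not bidomains:
            return
        # Branch on a vertex from the "tightest" box (a common heuristic).
        S, T = min(bidomains, key=lambda b: max(len(b[0]), len(b[1])))
        v = next(iter(S))
        for w in T:  # try matching v with every w in its box
            search(depth + 1, refine(bidomains, v, w))
        i = bidomains.index((S, T))
        rest = bidomains[:i] + bidomains[i + 1:]
        if len(S) > 1:
            rest.append((S - {v}, T))
        search(depth, rest)  # the "skip v" branch

    search(0, [(set(adj1), set(adj2))])
    return best[0]
```

For example, for a path P3 and a triangle K3 this sketch returns 2 (a single common edge), since no three vertices induce isomorphic subgraphs in both.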
3. Enhancements: Redundancy Reduction and Tightening via RRSplit
Empirical investigation revealed the McSplit framework explores superfluous, isomorphic subproblems and non-extendable partial solutions. RRSplit introduces three orthogonal refinements to address these inefficiencies (Yu et al., 17 Feb 2025):
(A) Vertex-equivalence reductions
Vertices of G1 are partitioned into equivalence classes according to their structural (adjacency) patterns: u and u' are equivalent when N(u) \ {u'} = N(u') \ {u}, i.e., they are interchangeable in any common subgraph.
If the pair (v, w) is already known to be non-extendable for some w ∈ V2, the branch for (v', w) with v' equivalent to v is pruned.
Additionally, once (v, w) is excluded, all pairs (v', w) with v' in the same equivalence class can be excluded simultaneously: only one representative per equivalence class is required in the search.
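Computing these classes can be sketched as below; this is an assumed helper, not the paper's code. Membership is tested against the class representative only, which suffices for this illustration.

```python
# Sketch: group a graph's vertices into structural-equivalence classes.
# adj: dict mapping each vertex to its neighbour set (no self-loops).
# u and u' are grouped when their neighbourhoods agree after removing
# each vertex from the other's neighbour set.
def equivalence_classes(adj):
    classes = []
    for u in adj:
        for cls in classes:
            r = cls[0]  # class representative
            if adj[u] - {r} == adj[r] - {u}:
                cls.append(u)
                break
        else:
            classes.append([u])
    return classes
```

In a star graph, for instance, all leaves fall into one class, so only one leaf ever needs to be branched on.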
(B) Maximality-based reduction
If, for some box (Si, Ti), a vertex v ∈ Si and a vertex w ∈ Ti have identical adjacency patterns into every other box (either complete or empty), then some maximum solution must include the pair (v, w). Only the child that includes (v, w) is explored; all others are pruned, and the exclusion set is updated to propagate equivalent prunings.
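The uniformity condition can be illustrated with a small predicate; the function name and interface here are hypothetical, and the real reduction involves additional bookkeeping for the exclusion set.

```python
# Illustrative check for the maximality-based reduction: (v, w) is a
# candidate forced match if, within every box, the vertices on each side
# are uniformly adjacent or uniformly non-adjacent to v (resp. w), with
# the same polarity on both sides.
def forced_match(v, w, bidomains, adj1, adj2):
    for S, T in bidomains:
        side1 = {u in adj1[v] for u in S if u != v}  # {True}, {False}, or mixed
        side2 = {x in adj2[w] for x in T if x != w}
        if len(side1) > 1 or len(side2) > 1 or side1 != side2:
            return False
    return True
```

In a triangle matched against a triangle the check succeeds for any pair, whereas in a path the endpoints see a mixed adjacency pattern and the check fails.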
(C) Refined upper bounds
Accounting for excluded pairs and vertex equivalence, the refined upper bound tightens each per-box term min(|Si|, |Ti|) by discounting vertices of Si that have already been excluded or are redundant members of an equivalence class, so that UBr(M, D) ≤ UB(M, D). This increases pruning aggressiveness and substantially reduces superfluous search (Yu et al., 17 Feb 2025).
4. Algorithmic Complexity and Theoretical Guarantees
Prior to RRSplit, McSplit lacked nontrivial worst-case guarantees despite its practical effectiveness. RRSplit attains a nontrivial worst-case running time that matches the best known bound for the MCIS problem. This is achieved by bounding the number of distinct partial-isomorphism states that can arise at each mapping size, and by bounding the number of exclusion ("skip") branches generated per partial solution (Yu et al., 17 Feb 2025).
5. Advances over McSplit: Learning-Augmented and Hybrid Algorithms
Several McSplit variants integrate learning-based branching heuristics:
- McSplit+RL uses RL-based accumulative reward branching.
- McSplit+LL combines Long-Short Memory (LSM) and Leaf Vertex Union Match (LUM), tracking both short-term (per-vertex) and long-term (pairwise) branching rewards to guide selection, and bulk-matches leaf neighbors whenever possible, substantially reducing search depth (Zhou et al., 2022).
- McSplitDAL augments McSplit+LL by introducing a DAL value function, which incorporates not only the decrease in the upper bound but also the fragmentation of the bidomain partition post-branch, and alternates this heuristic with RL strategies to avoid myopic search (Liu et al., 2022).
RRSplit is orthogonal in that it targets redundant computation (isomorphic/additive branches), while these McSplit derivatives focus on expedient traversal of the search space via branching policy improvements.
6. Empirical Evaluation and Benchmark Performance
Table 1 below summarizes experimental comparisons between RRSplit and McSplitDAL on four datasets:
| Dataset | Instances | RRSplit Solved | McSplitDAL Solved | ≥5x Speedup (RRSplit) |
|---|---|---|---|---|
| BI | 9,180 | 7,730 | 4,696 | 91.3% |
| CV | 6,424 | 1,351 | 1,291 | 76.5% |
| PR | 24 | 24 | 24 | 91.7% |
| LV | 6,216 | 1,059 | 883 | 68.0% |
RRSplit solves substantially more instances and achieves large (often orders-of-magnitude) speedups relative to the best McSplit variant. Ablation studies identify that all three RRSplit enhancements are crucial for peak performance; omitting any of them degrades results (Yu et al., 17 Feb 2025).
7. Implementation Considerations and Practical Guidance
All McSplit-like algorithms leverage data structures such as bitsets or compact vectors for candidate sets and bidomain labels. RRSplit and advanced McSplit variants require, in addition, tables for exclusion sets and equivalence class lookups. Implementation incurs minimal additional memory or computational overhead per branch, with all enhancements paying for themselves by enabling exponential reductions in the explored search tree. Decay mechanisms for learning-based heuristics are realized through efficient arithmetic operations, e.g., integer right-shift (Zhou et al., 2022).
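The right-shift decay mentioned above can be illustrated as follows; the cap value and function name are hypothetical placeholders, not taken from any specific implementation.

```python
# Sketch of reward decay in learning-guided variants: when any accumulated
# branching reward overflows a cap, all rewards are halved with a single
# integer right shift per entry (a cheap alternative to float multiplication).
REWARD_CAP = 10**9  # hypothetical threshold

def bump_and_maybe_decay(rewards, v, gain):
    rewards[v] = rewards.get(v, 0) + gain
    if rewards[v] > REWARD_CAP:
        for u in rewards:
            rewards[u] >>= 1  # integer right shift = halving
    return rewards
```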
Memory and computational costs are dominated by the combinatorial explosion inherent in MCIS but are significantly mitigated by the structure and pruning strategies of McSplit, RRSplit, and their derivatives.