Assignment and Matching Kernels
- Assignment and matching kernels are similarity measures that use an optimal bijection between parts of structured data, offering a more discriminative alignment than convolution approaches.
- They leverage strong base kernels and hierarchical representations to ensure positive definiteness and enable efficient computation via histogram intersections and multiple kernel learning.
- Extensions of these kernels to graphs and shapes facilitate robust graph classification, shape alignment, and scalable solutions to quadratic assignment problems.
Assignment and matching kernels are a central class of similarity measures for structured data, including graphs, sets, shapes, and combinatorial structures. In contrast to convolution kernels, which aggregate similarities over all pairs of parts, assignment kernels focus on an optimal bijection—typically the permutation or mapping of elements that maximizes total similarity according to a prescribed base kernel. This approach yields more discriminative and structurally meaningful measures for tasks such as graph classification and shape correspondence, and generalizes to quadratic assignment formulations in graph matching. Assignment kernels, however, present challenges in ensuring positive definiteness and computational tractability, motivating strong theoretical analysis and algorithmic innovation across recent research.
1. Mathematical Foundations of Assignment and Matching Kernels
Given multisets $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_n\}$ drawn from some set $\Omega$, and a base kernel $k : \Omega \times \Omega \to \mathbb{R}$, the optimal-assignment kernel is

$$K_A^k(X, Y) = \max_{\pi \in S_n} \sum_{i=1}^{n} k\big(x_i, y_{\pi(i)}\big),$$

where $S_n$ is the set of all permutations of $\{1, \dots, n\}$. If $|X| \neq |Y|$, the smaller set is padded with dummy elements (zero similarity to all). This kernel captures the maximal alignment under $k$ and is a cornerstone for analyzing structured data (Kriege, 2019, Kriege et al., 2016, 0801.4061).
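A brute-force sketch of this definition (exponential in the multiset size, so only illustrative; `k` is any user-supplied base kernel, and the Dirac kernel below is a hypothetical example):

```python
from itertools import permutations

def optimal_assignment_kernel(X, Y, k):
    """Brute-force optimal assignment: maximize the summed base-kernel
    value over all bijections between X and Y.

    The smaller multiset is padded with dummy elements (None) that have
    zero similarity to everything, matching the definition above.
    """
    n = max(len(X), len(Y))
    X = list(X) + [None] * (n - len(X))
    Y = list(Y) + [None] * (n - len(Y))

    def base(x, y):
        return 0.0 if x is None or y is None else k(x, y)

    return max(sum(base(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))

# Toy base kernel on labels: Dirac kernel (1 if equal, else 0).
dirac = lambda a, b: 1.0 if a == b else 0.0
print(optimal_assignment_kernel(['a', 'b', 'b'], ['b', 'a'], dirac))  # 2.0
```

Exact solvers (e.g. the Hungarian algorithm) replace the factorial enumeration in practice; the brute force only serves to make the definition concrete.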
In contrast, matching kernels generalize this idea via a soft-max over all permutations; a typical example is the Gibbs matching kernel:

$$K_\beta(X, Y) = \frac{1}{\beta} \log \sum_{\pi \in S_n} \exp\Big(\beta \sum_{i=1}^{n} k\big(x_i, y_{\pi(i)}\big)\Big),$$

where $\beta > 0$ tunes the concentration around the optimal permutation (Kriege et al., 2016). As $\beta \to \infty$, $K_\beta \to K_A^k$.
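Assuming a log-sum-exp form of the soft-max over permutations (one common choice), the convergence to the optimal-assignment value as the concentration parameter grows can be checked numerically:

```python
import math
from itertools import permutations

def gibbs_matching_kernel(X, Y, k, beta):
    """Soft-max (log-sum-exp) over all permutations; approaches the
    optimal-assignment value as beta -> infinity."""
    totals = [sum(k(x, y) for x, y in zip(X, perm))
              for perm in permutations(Y)]
    m = max(totals)  # subtract the max to stabilize the log-sum-exp
    return m + math.log(sum(math.exp(beta * (t - m)) for t in totals)) / beta

# Hypothetical base kernel on reals: negative absolute difference.
k = lambda a, b: -abs(a - b)
X, Y = [0.0, 1.0, 2.0], [2.1, 0.2, 0.9]
for beta in (1.0, 10.0, 100.0):
    print(beta, gibbs_matching_kernel(X, Y, k, beta))
```

The printed values decrease monotonically toward the optimal-assignment value as `beta` grows, illustrating the soft-max-to-max limit.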
In the context of graphs, these formulations underpin kernel-based approaches for graph similarity, classification, and matching. The assignment kernel specializes to graph domains by either matching vertices or edges according to label or structural similarity (Kriege et al., 2016, Kriege, 2019, Salim et al., 2020).
2. Validity and Hierarchical Structure: Positive-Definiteness
A fundamental concern is whether the optimal-assignment kernel $K_A^k$, as defined above, is positive-definite (PD), a property required by standard kernel methods. It is established that $K_A^k$ is, in general, not PD for arbitrary base kernels $k$ (0801.4061). Explicit counterexamples, constructed with $k$ a Gaussian kernel, exhibit negative eigenvalues in the Gram matrix.
A precise criterion for validity is the concept of strong kernels. A kernel $k$ is strong if

$$k(x, y) \geq \min\{k(x, z), k(z, y)\} \quad \text{for all } x, y, z \in \Omega.$$

This property is equivalent to $k$ arising from a rooted tree (hierarchy) $T$ over $\Omega$, with a non-negative weight function $w$ on the nodes such that

$$k(x, y) = \sum_{v \in P(x) \cap P(y)} w(v),$$

where $P(x)$ is the path from leaf $x$ to the root.
For strong $k$, the assignment kernel reduces to the histogram intersection:

$$K_A^k(X, Y) = \sum_{v \in V(T)} w(v)\, \min\big(|X_v|, |Y_v|\big),$$

where $X_v$ denotes the elements of $X$ in the subtree rooted at $v$ (Kriege, 2019, Kriege et al., 2016). The histogram-intersection kernel is known to be positive-definite, resolving validity whenever $k$ is strong.
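A small numeric check of this equivalence, on a hypothetical three-leaf hierarchy with arbitrary non-negative weights: the brute-force optimal assignment under the tree-induced strong kernel matches the histogram intersection over the tree.

```python
from itertools import permutations

# Tiny hierarchy: root -> {A, B}; A -> {a1, a2}; B -> {b1}.
# Parent map and non-negative node weights (hypothetical values).
parent = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'A': 'root', 'B': 'root'}
weight = {'root': 1.0, 'A': 2.0, 'B': 3.0, 'a1': 0.5, 'a2': 0.0, 'b1': 1.0}

def path(v):
    """Leaf-to-root path, inclusive of both endpoints."""
    out = [v]
    while v in parent:
        v = parent[v]
        out.append(v)
    return out

def k(x, y):
    """Tree-induced strong kernel: summed weights on the common root path."""
    return sum(weight[v] for v in set(path(x)) & set(path(y)))

def assignment_kernel(X, Y):
    """Brute-force optimal assignment (|X| == |Y| here)."""
    return max(sum(k(x, y) for x, y in zip(X, p)) for p in permutations(Y))

def histogram_intersection(X, Y):
    """Weighted min over per-node subtree counts."""
    nodes = set().union(*(path(x) for x in X + Y))
    return sum(weight[v] * min(sum(v in path(x) for x in X),
                               sum(v in path(y) for y in Y))
               for v in nodes)

X, Y = ['a1', 'a2', 'b1'], ['a1', 'b1', 'b1']
print(assignment_kernel(X, Y), histogram_intersection(X, Y))  # 9.5 9.5
```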
A summary of kernel validity for various constructions:
| Base Kernel | Assignment Kernel PD? | Certificate/Structure |
|---|---|---|
| Arbitrary | No | Counterexamples for Gaussian |
| Strong (hierarchy-based) | Yes | Histogram intersection form |
| Dirac/kernel on labels | Yes | Special case (degenerate hierarchy) |
3. Algorithmic Implementations and Learning
Efficient computation of assignment kernels hinges on the availability of the hierarchical representation of the base kernel $k$. For a strong kernel, the hierarchy is constructed via an incremental procedure, and the assignment kernel itself is computed in linear time by histogram intersection over the tree (Kriege et al., 2016).
Learning the hierarchy weights $w$ becomes crucial in adapting the assignment kernel to a given task. This is addressed by expressing $K_A^k$ as a non-negative combination of base (per-node) kernels and optimizing the weights via multiple kernel learning (MKL):

$$K_A^k(X, Y) = \sum_{v \in V(T)} w(v)\, K_v(X, Y), \qquad K_v(X, Y) = \min\big(|X_v|, |Y_v|\big).$$

The MKL problem is formulated as a saddle-point optimization, utilizing SVMs with kernel combinations under the constraints $w(v) \geq 0$ and $\sum_v w(v) = 1$ (Kriege, 2019). Practically, color nodes can be clustered to reduce the MKL dimensionality.
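A minimal sketch of this parameterization, with hypothetical per-node kernels and weights (the weights would in practice be learned by the MKL procedure, not fixed by hand):

```python
def combined_assignment_kernel(node_kernels, weights):
    """K(X, Y) = sum_v w_v * K_v(X, Y) with w_v >= 0 and weights
    normalized to sum to one, mirroring the MKL parameterization."""
    assert all(w >= 0 for w in weights.values())
    total = sum(weights.values())

    def K(X, Y):
        return sum((w / total) * node_kernels[v](X, Y)
                   for v, w in weights.items())
    return K

# Hypothetical per-node kernels: min of a label's counts in each multiset.
count_min = lambda lbl: (lambda X, Y: min(X.count(lbl), Y.count(lbl)))
K = combined_assignment_kernel({'a': count_min('a'), 'b': count_min('b')},
                               {'a': 3.0, 'b': 1.0})
print(K(['a', 'a', 'b'], ['a', 'b', 'b']))  # 1.0
```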
Quasi-optimal assignment kernels for graphs, such as the Weisfeiler-Lehman OA kernel, use the output of the 1-dim WL color refinement as the hierarchy, with histogram intersection over these WL color classes. Uniform weights recover the standard WL-OA kernel; learning produces the Deep WL-OA variant (Kriege, 2019, Kriege et al., 2016).
4. Specialization to Graphs and Attributed Structures
Assignment kernels have been extensively applied to graph similarity measures. The vertex-optimal assignment (V-OA) and edge-optimal assignment (E-OA) kernels use Dirac kernels over vertex or edge labels combined with assignment principles (Kriege et al., 2016). The Weisfeiler-Lehman optimal-assignment kernel (WL-OA) builds a hierarchy using iterative color refinement:
- At each WL iteration, new color classes are constructed by hashing the multiset of neighbor colors combined with the node's previous color.
- The tree of color classes across all iterations induces the hierarchy for the strong kernel.
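A minimal sketch of the refinement step described above (compacting signatures to integer color ids is one common implementation choice for the hashing):

```python
def wl_refinement(adj, labels, iterations):
    """1-dimensional Weisfeiler-Lehman color refinement.

    adj: dict node -> list of neighbors; labels: dict node -> initial color.
    Returns the list of colorings, one per iteration (including the input);
    a node's colors across iterations trace a root-to-leaf path in the
    hierarchy of color classes.
    """
    colorings = [dict(labels)]
    for _ in range(iterations):
        prev = colorings[-1]
        # New signature: previous color + sorted multiset of neighbor colors.
        signatures = {v: (prev[v], tuple(sorted(prev[u] for u in adj[v])))
                      for v in adj}
        # Map signatures to compact integer color ids.
        ids = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colorings.append({v: ids[signatures[v]] for v in adj})
    return colorings

# Path graph 1-2-3: the middle node gets a distinct color after one round.
adj = {1: [2], 2: [1, 3], 3: [2]}
cols = wl_refinement(adj, {1: 0, 2: 0, 3: 0}, 1)
print(cols[1])  # {1: 0, 2: 1, 3: 0}
```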
In practice, these assignment kernels outperform or match the classification accuracy of previous substructure and convolution kernels, especially when combined with MKL-learned weights (Kriege, 2019).
Assignment kernels also underpin neighborhood-preserving kernels for attributed graphs, combining R-convolution kernels for continuous attributes with assignment kernels on label space. The resulting composite kernel remains PD and integrates structural and attribute similarity, leveraging product-graph constructions and recursive updating under WL refinement (Salim et al., 2020).
5. Quadratic Assignment and Kernelized Matching in Shape and Graph Alignment
Assignment and matching kernels generalize to quadratic assignment problems (QAP) central to graph and shape matching. The classical QAPs, Lawler's and Koopmans–Beckmann's forms, can be recast through kernel methods. In the KerGM framework, graph matching is expressed as optimization of

$$\max_{X \in \mathcal{P}} \ \mathrm{tr}\big(K_v^\top X\big) + \big\langle \Psi_1, X \circledast \Psi_2 \circledast X^\top \big\rangle_{\mathcal{H}},$$

where the arrays $\Psi_1, \Psi_2$ map edge attributes into a RKHS $\mathcal{H}$, $K_v$ collects vertex affinities, $\mathcal{P}$ is the set of permutation matrices, and $\circledast$ denotes $\mathcal{H}$-array composition (Zhang et al., 2019). This formalism unifies QAPs and exploits entropy-regularized Frank-Wolfe (EnFW) solvers with Sinkhorn-Knopp steps, yielding substantial improvements in scalability and matching accuracy for large graphs and 3D shapes.
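The Sinkhorn-Knopp step used inside such entropy-regularized solvers can be sketched as plain alternating row/column normalization (a minimal illustration of the projection toward doubly stochastic matrices, not the EnFW solver itself):

```python
def sinkhorn_knopp(M, iters=200):
    """Alternately normalize rows and columns of a positive matrix so it
    converges toward the doubly stochastic polytope."""
    n = len(M)
    X = [row[:] for row in M]  # work on a copy
    for _ in range(iters):
        for i in range(n):  # row normalization
            s = sum(X[i])
            X[i] = [v / s for v in X[i]]
        for j in range(n):  # column normalization
            s = sum(X[i][j] for i in range(n))
            for i in range(n):
                X[i][j] /= s
    return X

X = sinkhorn_knopp([[1.0, 2.0], [3.0, 1.0]])
print([sum(row) for row in X])  # each row sums to ~1
```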
In shape analysis, similar approaches combine pointwise and pairwise kernels (e.g., heat kernels on Laplace spectra), formulating kernelized QAPs for non-isometric shape correspondence. Convexity (from PD heat kernels) allows projected descent with guaranteed monotonic improvement, producing state-of-the-art shape alignment even under partiality and topological noise (Lähner et al., 2017).
6. Limitations, Extensions, and Empirical Performance
Assignment kernels, while effective and flexible, exhibit nuanced limitations:
- Positive-definiteness is not guaranteed without the "strong kernel" (hierarchy-induced) property. For generic base kernels $k$, assignment kernels may be indefinite, complicating their use with classical kernel methods (0801.4061, Kriege et al., 2016). Empirical workarounds include regularization or projection onto PD cones.
- Soft-matching and exponential kernels (Gibbs) are typically indefinite or computationally intensive, as they sum over all permutations.
- Memory and storage costs can be substantial when constructing and maintaining hierarchical representations for highly non-discrete label or attribute spaces (Kriege et al., 2016, Salim et al., 2020).
Despite these challenges, assignment and matching kernels have set new benchmarks in graph classification, shape matching, and attributed structure analysis. On standard graph benchmarks (e.g., MUTAG, D&D, PROTEINS), Deep WL-OA and related assignment kernels match or slightly exceed prior state-of-the-art accuracies, with interpretability via sparse, learned weights (Kriege, 2019, Salim et al., 2020). Scalable kernelized QAP solutions (e.g., KerGM, heat kernel-based matching) demonstrate linear or near-linear scaling with graph/shape size and competitive or superior accuracy in vision and bioinformatics applications (Zhang et al., 2019, Lähner et al., 2017).
7. Broader Impact and Connections
Assignment and matching kernel frameworks synthesize ideas from combinatorial optimization, kernel methods, and structured data analysis. Their theoretical developments connect to the theory of positive-definite functions, tree-based hierarchies, and convex relaxations of permutation problems. The interplay between exactness, validity, and practical efficiency has driven advances in multiple areas:
- Powerful graph- and shape-matching pipelines, integrating pointwise, pairwise, and structural features.
- Hybrid kernel formulations for attributed data, merging R-convolution and assignment principles for maximally discriminative similarity (Salim et al., 2020).
- Methodological cross-fertilization between kernel methods, quadratic programming, and entropy-regularized transport.
Assignment and matching kernels remain an active domain, with ongoing efforts towards more expressive ground metrics, richer hierarchy learning, and robust indefinite-kernel handling in large-scale structured prediction and representation learning (Kriege, 2019, Zhang et al., 2019, Kriege et al., 2016, Salim et al., 2020).