Assignment Kernels for Structured Data
- Assignment kernels are similarity measures that optimally align substructures in composite objects through bijections or optimal transport, capturing both quality and arrangement.
- They leverage hierarchical representations and histogram intersections to ensure positive-definite forms and near-linear computation in practical scenarios.
- These kernels outperform standard convolution methods by preserving global correspondences and supporting extensions such as MKL and deep graph integrations.
Assignment kernels are a prominent family of structured data kernels designed to measure similarity between composite objects—such as graphs, sets, or sequences—by optimally aligning their constituent parts. In contrast to R-convolution kernels, which aggregate similarities over all pairs of parts, assignment kernels maximize the overall correspondence by finding an optimal bijection or, more generally, an optimal transport plan between the parts. This one-to-one or mass-splitting alignment provides a notion of similarity that is sensitive to both the quality and the arrangement of substructures. Assignment kernels include classical bijective formulations based on combinatorial optimization, as well as modern extensions via optimal transport (OT) that interpolate between rigid matching and soft mass redistribution; both admit efficient positive-definite forms under appropriate conditions.
1. Mathematical Formulation and Theoretical Foundations
Let Ω denote a domain of atomic parts (e.g., vertices, subtrees). Given two multisets X, Y ⊆ Ω with |X| = |Y|, assignment kernels are defined by choosing a bijection between X and Y that maximizes the total similarity according to a base kernel k : Ω × Ω → ℝ:

K_A(X, Y) = max_{B ∈ B(X, Y)} Σ_{(x, y) ∈ B} k(x, y),

where B(X, Y) denotes the set of all bijections between X and Y (Nikolentzos et al., 2019, Kriege et al., 2016).
A key property is positive-definiteness (PD), which is not guaranteed for arbitrary base kernels k. A necessary and sufficient condition is that k be a strong kernel, characterized by the condition

k(x, y) ≥ min(k(x, z), k(z, y))  for all x, y, z ∈ Ω.

Strong kernels can be equivalently realized via hierarchies: there exists a rooted tree T on Ω (elements as leaves, with non-negative node weights that are non-decreasing from the root toward the leaves) such that k(x, y) = w(LCA(x, y)) (Kriege et al., 2016). With a strong kernel, the assignment kernel is guaranteed PD and admits linear-time computation by histogram intersection.
For unequal set sizes, dummy elements with zero similarity can be padded to smaller sets (Nikolentzos et al., 2019).
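The definition above can be sketched directly with a generic bipartite-matching solver. This is a minimal illustration, not the histogram-intersection algorithm discussed later: the function names and the toy Dirac base kernel are hypothetical, and smaller multisets are padded with zero-similarity dummies as described.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_kernel(X, Y, base_kernel):
    """K_A(X, Y) = max over bijections B of sum of base_kernel(x, y).

    The smaller multiset is implicitly padded with dummy parts whose
    similarity to everything is zero (rows/columns of the matrix left at 0).
    """
    n = max(len(X), len(Y))
    S = np.zeros((n, n))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            S[i, j] = base_kernel(x, y)
    rows, cols = linear_sum_assignment(S, maximize=True)  # Hungarian matching
    return S[rows, cols].sum()

# Toy Dirac base kernel on integer labels: 1 if equal, else 0.
k_delta = lambda x, y: float(x == y)
print(assignment_kernel([1, 1, 2], [1, 2, 3], k_delta))  # → 2.0
```

Note that this brute-force route costs cubic time in the number of parts; it serves only to make the optimization in the definition concrete.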
2. Efficient Computation via Hierarchies and Histogram Intersection
Given the equivalence between strong base kernels and tree-induced hierarchies, the assignment kernel's optimal bijection can be computed by a weighted histogram intersection over the hierarchy. For multisets X, Y and every node v in T,

K_A(X, Y) = Σ_{v ∈ V(T)} ω(v) · min(|X_v|, |Y_v|),

where X_v is the multiset of parts in X mapped to leaves below v, and ω(v) = w(v) − w(parent(v)) is the additive weight at node v (Kriege et al., 2016, Kriege, 2019).
This formulation enables kernel computation in linear time per pair after a linear preprocessing pass, effectively linear in practice given the sparsity of the histograms. The same machinery underlies scalable assignment kernels such as the Weisfeiler–Lehman Optimal Assignment (WL-OA) and Pyramid Match Graph kernels.
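The histogram-intersection formula can be sketched for a tiny hand-built hierarchy. The dictionary layout and names here are hypothetical; each node stores its parent and its additive weight ω(v), and |X_v| is obtained by counting, for each part, all ancestors on its leaf-to-root path.

```python
from collections import Counter

# Hypothetical hierarchy: node -> (parent, additive weight omega).
# Leaves "a", "b", "c" are the atomic parts; "bc" groups b and c.
HIERARCHY = {
    "root": (None, 1.0),
    "a":    ("root", 1.0),
    "bc":   ("root", 1.0),
    "b":    ("bc", 1.0),
    "c":    ("bc", 1.0),
}

def ancestors(v):
    """Path from node v up to the root, inclusive."""
    path = []
    while v is not None:
        path.append(v)
        v = HIERARCHY[v][0]
    return path

def hist(parts):
    """Histogram |X_v| for every node v: parts mapped to leaves below v."""
    h = Counter()
    for p in parts:
        for v in ancestors(p):
            h[v] += 1
    return h

def hierarchy_kernel(X, Y):
    """K_A(X, Y) = sum_v omega(v) * min(|X_v|, |Y_v|)."""
    hx, hy = hist(X), hist(Y)
    return sum(HIERARCHY[v][1] * min(hx[v], hy[v]) for v in hx.keys() & hy.keys())

print(hierarchy_kernel(["a", "b"], ["b", "c"]))  # → 4.0
```

Here the induced base kernel is k(x, y) = w(LCA(x, y)), so the optimal bijection matches b↔b (similarity 3) and a↔c (similarity 1), and the histogram intersection recovers the same value without solving any matching problem.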
3. Principal Assignment Kernels: WL-OA, Deep Assignment, Pyramid Match, and Wasserstein Variants
3.1 Weisfeiler–Lehman Optimal Assignment Kernel
The Weisfeiler–Lehman (WL) assignment kernel uses a base kernel counting the number of shared colors assigned to vertices u and v during up to h iterations of WL color refinement:

k(u, v) = Σ_{i=0}^{h} k_δ(τ_i(u), τ_i(v)),

where τ_i(u) is the color of u after iteration i and k_δ is the Dirac kernel. A hierarchy is built on the color classes across rounds, with ω(v) = 1 for each class (Kriege et al., 2016, Nikolentzos et al., 2019). This strong kernel admits linear-time evaluation via histogram intersection, and outperforms convolutional WL kernels in discriminating fine-grained structural variations (see Table 1 below).
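The WL-OA value can be sketched end to end for small graphs: run 1-WL refinement, then intersect the per-iteration color histograms with unit weights. This is a minimal sketch, not an optimized implementation; `adj` is assumed to be an adjacency-list dict and `labels` a dict of initial vertex colors (names hypothetical).

```python
from collections import Counter

def wl_color_histories(adj, labels, h):
    """1-WL refinement: list of color maps for iterations 0..h.

    A refined color is the pair (own color, sorted multiset of neighbor
    colors); nested tuples serve as canonical color identifiers."""
    colors = dict(labels)
    history = [dict(colors)]
    for _ in range(h):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
        history.append(dict(colors))
    return history

def wl_oa(adj1, lab1, adj2, lab2, h=2):
    """WL-OA via histogram intersection: unit weight per (iteration, color)."""
    k = 0
    for H1, H2 in zip(wl_color_histories(adj1, lab1, h),
                      wl_color_histories(adj2, lab2, h)):
        c1, c2 = Counter(H1.values()), Counter(H2.values())
        k += sum(min(c1[c], c2[c]) for c in c1.keys() & c2.keys())
    return k

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path     = {0: [1], 1: [0, 2], 2: [1]}
labels   = {0: 0, 1: 0, 2: 0}
print(wl_oa(triangle, labels, path, labels))  # → 4
```

On this pair the kernel counts 3 matches at iteration 0 (all colors equal), 1 at iteration 1 (only the degree-2 path vertex matches the triangle vertices), and 0 at iteration 2, illustrating how refinement depth sharpens the alignment.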
3.2 Deep Assignment Kernels via Multiple Kernel Learning
Weights in the WL-induced hierarchy can be learned discriminatively using multiple kernel learning (MKL), yielding the Deep Weisfeiler–Lehman Assignment (DWL-OA) kernel. The kernel decomposes as

K(X, Y) = Σ_{v ∈ V(T)} ω(v) · min(|X_v|, |Y_v|),

where the weights ω(v) ≥ 0 are optimized jointly with the classifier subject to a norm constraint. Sparse solutions are routinely obtained, and nontrivial accuracy improvements over fixed-weight WL-OA were reported (Kriege, 2019).
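The decomposition implies that each hierarchy node (or refinement level) contributes its own Gram matrix, and any non-negative weighting of these components remains PSD. The sketch below shows only that combination step under fixed weights; the actual MKL optimization of the weights (as in Kriege, 2019) is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def combined_gram(component_grams, beta):
    """K = sum_j beta_j * K_j for hierarchy-component Gram matrices K_j.

    Non-negative beta keeps the combination PSD; MKL would learn beta
    jointly with an SVM instead of fixing it as done here."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0), "negative weights would break the PD guarantee"
    return sum(b * K for b, K in zip(beta, component_grams))
```

For example, down-weighting deep refinement levels emulates a prior that coarse structural agreement matters more than exact neighborhood identity.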
3.3 Pyramid Match Graph Kernel
The Pyramid Match Graph kernel aligns point clouds (e.g., spectral vertex embeddings) using a hierarchy of spatial grids. For each quantization level l = 0, …, L, histograms H_X^l are constructed, and incremental matches are counted via intersection. The final kernel aggregates matches at multiple resolutions:

K_Δ(X, Y) = I(H_X^L, H_Y^L) + Σ_{l=0}^{L−1} (1 / 2^{L−l}) · (I(H_X^l, H_Y^l) − I(H_X^{l+1}, H_Y^{l+1})),

with I denoting histogram intersection and 1/2^{L−l} the level weight. This approach supports many-to-many alignments and is PSD by construction (Nikolentzos et al., 2019).
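The multi-resolution formula above can be sketched for points normalized to the unit hypercube; level l uses 2^l cells per axis, and matches already found at finer levels are discounted at coarser ones. This is a minimal sketch under that normalization assumption, not the full graph-kernel pipeline (which would first embed vertices, e.g. spectrally).

```python
import numpy as np

def pyramid_match(X, Y, L=3):
    """Pyramid match between point sets in [0, 1)^d.

    I[l] is the histogram intersection at level l (2^l cells per axis);
    new matches at level l get weight 1 / 2^(L - l)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    d = X.shape[1]

    def intersection(l):
        hx = np.histogramdd(X, bins=[2 ** l] * d, range=[(0, 1)] * d)[0]
        hy = np.histogramdd(Y, bins=[2 ** l] * d, range=[(0, 1)] * d)[0]
        return np.minimum(hx, hy).sum()

    I = [intersection(l) for l in range(L + 1)]
    k = I[L]  # matches at the finest level carry full weight
    for l in range(L):
        k += (1.0 / 2 ** (L - l)) * (I[l] - I[l + 1])
    return k
```

Because the dyadic cells nest, I[l] is non-increasing in l, so every increment I[l] − I[l+1] is non-negative and counts matches first appearing at resolution l.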
3.4 Assignment via Optimal Transport: Wasserstein WL Kernel
The Wasserstein Weisfeiler–Lehman (WWL) kernel generalizes assignment by formulating graph comparison as an optimal transport (OT) problem on the empirical node-feature distributions. For node embeddings X = {x_1, …, x_n} and Y = {y_1, …, y_m} with weights a and b:

W(X, Y) = min_{P ∈ Γ(a, b)} Σ_{i,j} P_{ij} d(x_i, y_j),

subject to the prescribed marginals P1 = a and Pᵀ1 = b. The kernel form is k(G, G') = exp(−λ W(X, Y)) (Togninalli et al., 2019). In the categorical case with Hamming ground cost, WWL is PD due to the conditional negative definiteness of W; the continuous case is not guaranteed PD, though empirical Gram matrices are often nearly so.
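For uniform weights over equally sized node sets, the OT problem above reduces to an optimal assignment (the Birkhoff polytope has permutation matrices as extreme points), so the exact Wasserstein distance can be sketched with a Hungarian solver. Here each node is represented by its sequence of WL labels and the ground cost is normalized Hamming distance; the function name and the uniform-weight restriction are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wwl_kernel(F1, F2, lam=1.0):
    """WWL-style kernel exp(-lam * W) for two graphs with the same number
    of nodes and uniform node weights.

    F1, F2: (n, h) arrays of per-node WL labels over h iterations.
    Ground cost: fraction of iterations where the labels differ."""
    F1, F2 = np.asarray(F1), np.asarray(F2)
    cost = (F1[:, None, :] != F2[None, :, :]).mean(axis=2)
    rows, cols = linear_sum_assignment(cost)       # exact OT for uniform masses
    W = cost[rows, cols].sum() / F1.shape[0]       # average transport cost
    return float(np.exp(-lam * W))
```

For unequal sizes or non-uniform weights a general OT solver (or the Sinkhorn approximation discussed below in Section 5) would replace the matching step.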
4. Expressivity, Consistency, and Structural Advantages
Assignment kernels provide greater expressivity than convolution kernels, as they align substructures one-to-one rather than summing over all pairs, capturing global correspondences and finer combinatorial variations (Nikolentzos et al., 2019, Kriege et al., 2016). For instance, graphs with identical substructure frequencies but differing arrangements are readily distinguished.
The WL-OA kernel displays monotonicity and asymptotic order consistency: as the number of WL refinement iterations increases, the induced similarity ranking between pairs of graphs stabilizes, a property not shared by the WL-subtree kernel (Liu et al., 2024). This stability further motivates analogs for deep learning architectures, such as a layer-wise consistency loss for GNNs inspired by the WL-OA ordering behavior.
Optimal transport–based assignment (e.g., WWL) allows for partial (mass-splitting) and continuous attribute alignment, integrating weighted and attributed graphs within a unified framework (Togninalli et al., 2019).
5. Computational Aspects and Scalability
The naive assignment kernel involves solving a bipartite matching (Hungarian algorithm) with complexity O(n³) for graphs of n nodes. However, for strong base kernels, histogram-intersection based formulations reduce the complexity to effectively O(n) per pair after linear preprocessing, or O(hn) for the WL-OA kernel with h refinement iterations and its deep and pyramid variants (Kriege et al., 2016, Nikolentzos et al., 2019, Kriege, 2019). The Pyramid Match kernel admits O(dnL) time for d-dimensional embeddings at L levels.
For OT-based kernels (WWL), exact network-simplex solvers run in O(n³ log n), but entropic regularization via Sinkhorn iteration yields near-linear O(n²)-per-iteration evaluation (Togninalli et al., 2019). Approximations such as node subsampling or feature quantization further accelerate large-scale computation.
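The Sinkhorn route mentioned above can be sketched in a few lines: alternately rescale the rows and columns of the Gibbs kernel e^{−C/ε} until both marginals hold. This is a plain, unstabilized sketch (no log-domain tricks), and the regularization strength and iteration count are illustrative choices.

```python
import numpy as np

def sinkhorn_cost(C, a, b, eps=0.05, iters=500):
    """Entropic-regularized OT cost <P, C> with marginals a and b.

    Alternating Sinkhorn scalings; for small eps the transported cost
    approaches the unregularized optimum."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)         # enforce column marginals
        u = a / (K @ v)           # enforce row marginals
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum())
```

Each iteration costs two matrix-vector products, i.e. O(n²), which is the source of the near-quadratic scaling quoted above; very small eps may require log-domain stabilization in practice.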
6. Empirical Performance and Comparative Evaluation
Assignment kernels have been empirically validated on several graph classification benchmarks spanning chemical, biological, and social network domains. The WL-OA kernel consistently matches or outperforms the standard WL subtree (convolution) kernel, with pronounced gains on datasets emphasizing structural neighborhood alignment (Kriege et al., 2016, Nikolentzos et al., 2019). Deep WL-OA kernels with MKL-learned weights sometimes further enhance accuracy (Kriege, 2019).
The Pyramid Match kernel ranks among the top assignment-based methods for both labeled and unlabeled datasets (Nikolentzos et al., 2019). OT-based WWL kernels surpass prior state-of-the-art on several attributed graph problems, particularly when capturing distributional differences is key (Togninalli et al., 2019).
Extensive benchmarking establishes assignment kernels as a high-performing, scalable alternative to standard convolutional graph kernels. Notably, on node-attributed or real-valued feature graphs, GNN models (e.g., GIN, DiffPool) may outperform assignment kernels, but the latter remain strong baselines.
7. Extensions, Limitations, and Directions for Research
Assignment kernels admit various extensions:
- Learnable weighting via MKL (hierarchy-structured DWL-OA kernels) enables model sparsity and task-focused discriminative power (Kriege, 2019).
- WWL allows hybridization with GNN representations for node (sub-)embeddings (Togninalli et al., 2019).
- Incorporation of cross-layer similarity consistency into GNN training objectives produces measurable improvements in classification performance and stability (Liu et al., 2024).
Noted limitations include:
- Indefiniteness for non-strong or continuous base kernels, mandating either specialized SVMs (Kreĭn space solvers) or empirical PD corrections.
- High computational cost for general OT assignment (especially without histogram shortcut).
- On heavily attributed graphs, assignment kernels may lag behind learned neural architectures.
Active directions include more efficient OT solvers (e.g. Sinkhorn, sliced Wasserstein), development of explicit feature maps for OT-based distances, and extensions to accommodate edge and mixed-type attributes within the assignment framework (Togninalli et al., 2019). Efforts continue to bridge assignment kernel theory with neural message-passing, leveraging their structural stability properties.
Table 1: Computational and Theoretical Properties of Key Assignment Kernels
| Kernel | Positive Definite (PD) Guarantee | Complexity (per pair) |
|---|---|---|
| WL-OA | Yes, when base kernel is strong | O(hn) |
| Deep WL-OA (MKL) | Yes, if weights non-negative (hierarchy-induced only) | O(hn), plus MKL training |
| Pyramid Match | Yes, via weighted histogram intersection | O(dnL) |
| WWL (OT-based) | Discrete: Yes; Continuous: generally indefinite | O(n³ log n) (exact OT), O(n²) per Sinkhorn iteration |
References: (Kriege et al., 2016, Nikolentzos et al., 2019, Togninalli et al., 2019, Kriege, 2019, Liu et al., 2024)