
Assignment Kernels for Structured Data

Updated 23 February 2026
  • Assignment kernels are similarity measures that optimally align substructures in composite objects through bijections or optimal transport, capturing both quality and arrangement.
  • They leverage hierarchical representations and histogram intersections to ensure positive-definite forms and near-linear computation in practical scenarios.
  • These kernels outperform standard convolution methods by preserving global correspondences and supporting extensions such as multiple kernel learning (MKL) and integration with graph neural networks.

Assignment kernels are a prominent family of structured data kernels designed to measure similarity between composite objects—such as graphs, sets, or sequences—by optimally aligning their constituent parts. In contrast to $\mathcal{R}$-convolution kernels, which aggregate similarities over all pairs of parts, assignment kernels maximize the overall correspondence by finding an optimal bijection or, more generally, an optimal transport plan between the parts. This one-to-one or mass-splitting alignment provides a notion of similarity that is sensitive both to the quality and arrangement of substructures. Assignment kernels include classical bijective formulations based on combinatorial optimization, as well as modern extensions via optimal transport (OT) that interpolate between rigid matching and soft mass redistribution; both admit efficient positive-definite forms under appropriate conditions.

1. Mathematical Formulation and Theoretical Foundations

Let $\mathcal{X}$ denote a domain of atomic parts (e.g., vertices, subtrees). Given two multisets $X, Y \subseteq \mathcal{X}$ with $|X| = |Y| = n$, assignment kernels are defined by choosing a bijection $B$ between $X$ and $Y$ that maximizes the total similarity according to a base kernel $k_0: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$:

$$K^{k_0}(X, Y) = \max_{B \in \mathfrak{B}(X, Y)} \sum_{(x, y) \in B} k_0(x, y)$$

where $\mathfrak{B}(X, Y)$ denotes the set of all bijections $X \rightarrow Y$ (Nikolentzos et al., 2019, Kriege et al., 2016).
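As a concrete illustration, the following minimal sketch evaluates this definition directly with SciPy's Hungarian-algorithm solver; the function name and the non-negativity assumption on $k_0$ (which makes the rectangular solver equivalent to zero-similarity dummy padding) are ours, not from the cited papers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_kernel(X, Y, k0):
    """Optimal assignment kernel K^{k0}(X, Y), computed directly from the
    definition via the Hungarian algorithm in O(n^3).
    Assumes k0 >= 0, so unequal sizes match the dummy-padding convention:
    the rectangular solver simply leaves surplus parts unmatched."""
    S = np.array([[k0(x, y) for y in Y] for x in X])  # pairwise similarities
    rows, cols = linear_sum_assignment(S, maximize=True)  # optimal bijection
    return S[rows, cols].sum()
```

For example, `assignment_kernel([1, 2, 3], [2, 3], lambda a, b: 1.0 if a == b else 0.0)` returns 2.0, matching the two shared elements.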

A key property is positive-definiteness (PD), which is not guaranteed for arbitrary base kernels $k_0$. A necessary and sufficient condition is that $k_0$ be a strong kernel, characterized by the condition

$$k_0(x, y) \geq \min\{k_0(x, z), k_0(z, y)\} \quad \forall x, y, z \in \mathcal{X}$$

Strong kernels can be equivalently realized via hierarchies: there exists a rooted tree $T$ on $\mathcal{X}$ (leaves as elements, weights $w$ non-increasing toward the root), with $k_0(x, y) = w(\mathrm{LCA}(x, y))$ (Kriege et al., 2016). With a strong kernel, the assignment kernel is guaranteed PD and admits linear-time computation by histogram intersection.
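For small finite domains, the strong-kernel condition can be verified by brute force; this checker is purely illustrative (cubic in the domain size):

```python
from itertools import product

def is_strong(elements, k0):
    """Check k0(x, y) >= min(k0(x, z), k0(z, y)) for all triples."""
    return all(
        k0(x, y) >= min(k0(x, z), k0(z, y))
        for x, y, z in product(elements, repeat=3)
    )
```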

For unequal set sizes, the smaller set can be padded with dummy elements of zero similarity (Nikolentzos et al., 2019).

2. Efficient Computation via Hierarchies and Histogram Intersection

Given the equivalence between strong base kernels and tree-induced hierarchies, the assignment kernel’s optimal bijection can be computed by a weighted histogram intersection over the hierarchy. For $X, Y \subseteq \mathcal{X}$ and every node $v$ in $T$,

$$H^{k_0}(X)_v = \omega(v)\,|X_v|, \qquad K^{k_0}(X, Y) = \sum_{v \in V(T)} \min\{|X_v|, |Y_v|\}\,\omega(v)$$

where $X_v$ denotes the parts of $X$ that lie in the subtree rooted at $v$, and $\omega(v)$ is the additive weight at node $v$ (Kriege et al., 2016, Kriege, 2019).

This formulation enables $O(|X| + |Y| + |V(T)|)$ kernel computation per pair, effectively linear in practice given the sparsity of $T$. The same machinery underlies scalable assignment kernels such as the Weisfeiler–Lehman Optimal Assignment (WL-OA) and Pyramid Match graph kernels.
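A minimal sketch of this evaluation, assuming a hypothetical encoding of the hierarchy as a `parent` map over node identifiers (leaves being the parts themselves, the root having no entry) and a dict `omega` of additive node weights:

```python
from collections import Counter

def hierarchy_kernel(X, Y, parent, omega):
    """Assignment kernel for a strong base kernel given as a tree hierarchy,
    evaluated by weighted histogram intersection over the tree nodes."""
    def counts(parts):
        c = Counter()
        for p in parts:              # walk each leaf up to the root
            v = p
            while v is not None:
                c[v] += 1
                v = parent.get(v)    # root has no parent entry
        return c
    cx, cy = counts(X), counts(Y)
    # K = sum over nodes v of omega(v) * min(|X_v|, |Y_v|)
    return sum(omega[v] * min(cx[v], cy[v]) for v in cx.keys() & cy.keys())
```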

3. Principal Assignment Kernels: WL-OA, Deep Assignment, Pyramid Match, and Wasserstein Variants

3.1 Weisfeiler–Lehman Optimal Assignment Kernel

The Weisfeiler–Lehman (WL) assignment kernel uses a base kernel reflecting the number of shared colors assigned to vertices during up to $h$ iterations of WL color refinement:

$$k_0(u, v) = \sum_{i=0}^{h} \mathbb{1}[\ell_i(u) = \ell_i(v)]$$

A hierarchy is built on color classes across rounds, with $\omega(v) = 1$ for each class (Kriege et al., 2016, Nikolentzos et al., 2019). This strong kernel admits linear-time evaluation via histogram intersection and outperforms convolutional WL kernels in discriminating fine-grained structural variations (see the table below).
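The sketch below implements the refinement and the resulting kernel under stated simplifications: hash-based relabeling stands in for the injective relabeling of production implementations, and initial labels are assumed sortable.

```python
from collections import Counter

def wl_histograms(adj, labels, h):
    """Run h rounds of WL color refinement and return the per-round color
    histograms. adj: {v: [neighbors]}; labels: {v: initial label}.
    Hashes are consistent within one process, so two graphs refined in the
    same run receive comparable colors."""
    colors = dict(labels)
    hists = [Counter(colors.values())]
    for _ in range(h):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
        hists.append(Counter(colors.values()))
    return hists

def wl_oa(adj_g, lab_g, adj_h, lab_h, h=3):
    """WL-OA value: per-round histogram intersections, summed over rounds."""
    return sum(
        sum((cg & ch).values())  # Counter & Counter = elementwise min
        for cg, ch in zip(wl_histograms(adj_g, lab_g, h),
                          wl_histograms(adj_h, lab_h, h))
    )
```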

3.2 Deep Assignment Kernels via Multiple Kernel Learning

Weights $\omega(v)$ in the WL-induced hierarchy can be learned discriminatively using multiple kernel learning (MKL), yielding the Deep Weisfeiler–Lehman Assignment (DWL-OA) kernel. The kernel decomposes as

$$K(X, Y) = \sum_{v \in V(T)} \omega(v) \min\{|X_v|, |Y_v|\}$$

where the weights $\omega(v)$ are optimized subject to $\omega(v) \geq 0$. Sparse solutions are routinely obtained, and nontrivial accuracy improvements over the fixed-weight WL-OA have been reported (Kriege, 2019).
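The reference learns these weights with MKL; as a lighter-weight stand-in, the sketch below sets non-negative weights by two-stage centered kernel-target alignment (a deliberately swapped-in technique, not the procedure of Kriege, 2019):

```python
import numpy as np

def alignment_weights(K_stack, y):
    """K_stack: (V, n, n) base kernels, K_stack[v][i, j] = min(|X_i,v|, |X_j,v|);
    y: labels in {-1, +1}. Returns simplex-normalized non-negative weights."""
    n = K_stack.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Ky = H @ np.outer(y, y) @ H                  # centered target kernel
    # Alignment of each centered base kernel with the target, clipped at zero
    w = np.array([max(0.0, np.sum((H @ K @ H) * Ky)) for K in K_stack])
    return w / (w.sum() + 1e-12)

# Combined kernel: sum_v w[v] * K_stack[v], plugged into any kernel machine.
```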

3.3 Pyramid Match Graph Kernel

The Pyramid Match Graph kernel aligns point clouds (e.g., spectral vertex embeddings) using a hierarchy of spatial grids. For each quantization level $l$, histograms are constructed, and incremental matches are counted via intersection. The final kernel aggregates matches at multiple resolutions:

$$k_{\mathrm{PM}}(G, G') = I(H_G^L, H_{G'}^L) + \sum_{l=0}^{L-1} w_l I_l$$

with $I$ denoting histogram intersection and $w_l$ the level weight. This approach supports many-to-many alignments and is PSD by construction (Nikolentzos et al., 2019).
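A minimal sketch under two assumptions we make explicit: node embeddings are pre-scaled to $[0,1]^d$, and histograms are taken per dimension rather than over the full $d$-dimensional grid (a common simplification when $d$ is large):

```python
import numpy as np

def pyramid_match(emb_g, emb_h, L=4):
    """Pyramid match over L+1 grid levels: level l splits [0, 1] into 2^l
    bins per dimension; new matches at coarser levels are discounted."""
    d = emb_g.shape[1]
    I = []                                       # intersections per level
    for l in range(L + 1):
        bins = 2 ** l
        hg, hh = np.zeros((d, bins)), np.zeros((d, bins))
        for j in range(d):
            np.add.at(hg[j], np.minimum((emb_g[:, j] * bins).astype(int), bins - 1), 1)
            np.add.at(hh[j], np.minimum((emb_h[:, j] * bins).astype(int), bins - 1), 1)
        I.append(np.minimum(hg, hh).sum())
    # Finest-level matches, plus discounted new matches at each coarser level
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))
```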

3.4 Assignment via Optimal Transport: Wasserstein WL Kernel

The Wasserstein Weisfeiler–Lehman (WWL) kernel generalizes assignment by formulating graph comparison as an optimal transport (OT) problem on the empirical node-feature distributions. For node embeddings $x_i, y_j$ with weights $a_i, b_j$:

$$W_1(\mu_G, \mu_H) = \min_{T \in \mathbb{R}_+^{n \times m}} \sum_{i, j} T_{ij}\, d(x_i, y_j)$$

subject to prescribed marginals. The kernel form is $K(G, H) = \exp(-\lambda W_1(\mu_G, \mu_H))$ (Togninalli et al., 2019). In the categorical case with Hamming cost, WWL is PD due to the conditional negative-definiteness of $W_1$; the continuous case is not guaranteed PD, though empirical Gram matrices are often nearly so.
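A sketch of this construction using the POT library (`pip install pot`) with uniform node weights; the function name `wwl_kernel` and the Euclidean ground cost are our illustrative choices:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def wwl_kernel(emb_g, emb_h, lam=0.1):
    """Laplacian kernel on the exact 1-Wasserstein distance between uniform
    distributions over node embeddings (rows of emb_g and emb_h)."""
    a = np.full(len(emb_g), 1.0 / len(emb_g))      # uniform marginals
    b = np.full(len(emb_h), 1.0 / len(emb_h))
    M = ot.dist(emb_g, emb_h, metric='euclidean')  # ground cost d(x_i, y_j)
    w1 = ot.emd2(a, b, M)                          # exact OT (network simplex)
    return np.exp(-lam * w1)
```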

4. Expressivity, Consistency, and Structural Advantages

Assignment kernels provide greater expressivity than convolution kernels, as they align substructures one-to-one rather than summing over all pairs, capturing global correspondences and finer combinatorial variations (Nikolentzos et al., 2019, Kriege et al., 2016). For instance, graphs with identical substructure frequencies but differing arrangements are readily distinguished.

The WL-OA kernel displays monotonicity and asymptotic order consistency: as the height $h$ of the WL refinement increases, the induced similarity ranking between pairs of graphs stabilizes, a property not shared by the WL-subtree kernel (Liu et al., 2024). This stability further motivates analogs for deep learning architectures, such as a layer-wise consistency loss for GNNs inspired by the WL-OA's ordering behavior.

Optimal transport–based assignment (e.g., WWL) allows for partial (mass-splitting) and continuous attribute alignment, integrating weighted and attributed graphs within a unified framework (Togninalli et al., 2019).

5. Computational Aspects and Scalability

The naive assignment kernel involves solving a bipartite matching (Hungarian algorithm) with $O(n^3)$ complexity for graphs of $n$ nodes. However, for strong base kernels, histogram-intersection based formulations reduce the complexity to effectively $O(n)$, or $O(h|E|)$ for the WL-OA kernel and its deep and pyramid variants (Kriege et al., 2016, Nikolentzos et al., 2019, Kriege, 2019). The Pyramid Match kernel admits $O(dnL)$ time for $d$-dimensional embeddings at $L$ levels.

For OT-based kernels (WWL), exact network-simplex solvers run in $O(n^3 \log n)$, but entropic regularization via Sinkhorn iteration yields near-linear or $O(n^2/\epsilon^2)$ evaluation (Togninalli et al., 2019). Approximations such as node subsampling or feature quantization further accelerate large-scale computation.
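For reference, a textbook Sinkhorn iteration for the entropic-regularized cost; this is a generic sketch, not the exact solver configuration of Togninalli et al. (2019):

```python
import numpy as np

def sinkhorn_cost(a, b, M, reg=0.05, n_iter=200):
    """Entropic-regularized OT cost via Sinkhorn scaling.
    a, b: marginals; M: (n, m) ground-cost matrix; reg: entropic strength."""
    K = np.exp(-M / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # enforce column marginals
        u = a / (K @ v)                  # enforce row marginals
    T = u[:, None] * K * v[None, :]      # transport plan
    return float((T * M).sum())          # (unregularized) transport cost
```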

6. Empirical Performance and Comparative Evaluation

Assignment kernels have been empirically validated on several graph classification benchmarks spanning chemical, biological, and social network domains. The WL-OA kernel consistently matches or outperforms the standard WL subtree (convolution) kernel, with pronounced gains on datasets emphasizing structural neighborhood alignment (Kriege et al., 2016, Nikolentzos et al., 2019). Deep WL-OA kernels with MKL-learned weights sometimes further enhance accuracy (Kriege, 2019).

The Pyramid Match kernel ranks among the top assignment-based methods for both labeled and unlabeled datasets (Nikolentzos et al., 2019). OT-based WWL kernels surpass prior state-of-the-art on several attributed graph problems, particularly when capturing distributional differences is key (Togninalli et al., 2019).

Extensive benchmarking establishes assignment kernels as a high-performing, scalable alternative to standard convolutional graph kernels. Notably, on node-attributed or real-valued feature graphs, GNN models (e.g., GIN, DiffPool) may outperform assignment kernels, but the latter remain strong baselines.

7. Extensions, Limitations, and Directions for Research

Assignment kernels admit various extensions:

  • Learnable weighting via MKL (hierarchy-structured DWL-OA kernels) enables model sparsity and task-focused discriminative power (Kriege, 2019).
  • WWL allows hybridization with GNN representations for node (sub-)embeddings (Togninalli et al., 2019).
  • Incorporation of cross-layer similarity consistency into GNN training objectives produces measurable improvements in classification performance and stability (Liu et al., 2024).

Noted limitations include:

  • Indefiniteness for non-strong or continuous base kernels, mandating either specialized SVMs (Kreĭn space solvers) or empirical PD corrections.
  • High computational cost for general OT assignment (especially without the histogram shortcut).
  • On heavily attributed graphs, assignment kernels may lag behind learned neural architectures.

Active directions include more efficient OT solvers (e.g. Sinkhorn, sliced Wasserstein), development of explicit feature maps for OT-based distances, and extensions to accommodate edge and mixed-type attributes within the assignment framework (Togninalli et al., 2019). Efforts continue to bridge assignment kernel theory with neural message-passing, leveraging their structural stability properties.


Table: Computational and Theoretical Properties of Key Assignment Kernels

| Kernel | Positive Definite (PD) Guarantee | Complexity (per pair) |
|---|---|---|
| WL-OA | Yes, when the base kernel is strong | $O(h\lvert E\rvert)$ |
| Deep WL-OA (MKL) | Yes, if weights non-negative (hierarchy-induced only) | $O(\lvert X\rvert + \lvert Y\rvert + \lvert V(T)\rvert)$ |
| Pyramid Match | Yes, via weighted histogram intersection | $O(dnL)$ |
| WWL (OT-based) | Discrete: yes; continuous: generally indefinite | $O(n^3)$ (OT), $O(n^2/\epsilon^2)$ (Sinkhorn) |

References: Kriege et al. (2016); Nikolentzos et al. (2019); Togninalli et al. (2019); Kriege (2019); Liu et al. (2024).
