Assignment Kernels for Structured Data
- Assignment kernels are similarity measures that optimally align substructures in composite objects through bijections or optimal transport, capturing both quality and arrangement.
- They leverage hierarchical representations and histogram intersections to ensure positive-definite forms and near-linear computation in practical scenarios.
- These kernels outperform standard convolution methods by preserving global correspondences and supporting extensions such as MKL and deep graph integrations.
Assignment kernels are a prominent family of structured data kernels designed to measure similarity between composite objects—such as graphs, sets, or sequences—by optimally aligning their constituent parts. In contrast to R-convolution kernels, which aggregate similarities over all pairs of parts, assignment kernels maximize the overall correspondence by finding an optimal bijection or, more generally, an optimal transport plan between the parts. This one-to-one or mass-splitting alignment provides a notion of similarity that is sensitive to both the quality and the arrangement of substructures. Assignment kernels include classical bijective formulations based on combinatorial optimization, as well as modern extensions via optimal transport (OT) that interpolate between rigid matching and soft mass redistribution; both admit efficient positive-definite forms under appropriate conditions.
1. Mathematical Formulation and Theoretical Foundations
Let Ω denote a domain of atomic parts (e.g., vertices, subtrees). Given two multisets X, Y ⊆ Ω with |X| = |Y|, assignment kernels are defined by choosing a bijection between X and Y that maximizes the total similarity according to a base kernel k : Ω × Ω → ℝ:

K_A(X, Y) = max_{B ∈ B(X, Y)} Σ_{(x, y) ∈ B} k(x, y),

where B(X, Y) denotes the set of all bijections between X and Y (Nikolentzos et al., 2019, Kriege et al., 2016).
A key property is positive-definiteness (PD), which is not guaranteed for arbitrary base kernels k. A necessary and sufficient condition is that k be a strong kernel, characterized by the condition

k(x, y) ≥ min(k(x, z), k(z, y))  for all x, y, z ∈ Ω.

Strong kernels can be equivalently realized via hierarchies: there exists a rooted tree T on Ω (elements as leaves, with non-negative node weights that are non-decreasing from the root toward the leaves) such that k(x, y) = w(LCA(x, y)) (Kriege et al., 2016). With a strong kernel, the assignment kernel is guaranteed PD and admits linear-time computation by histogram intersection.
For unequal set sizes, dummy elements with zero similarity can be padded to smaller sets (Nikolentzos et al., 2019).
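The definition above can be sketched directly with a generic bipartite-matching solver. This is a minimal illustration, not the histogram-intersection algorithm discussed later: the function names and the toy Dirac base kernel are hypothetical, and smaller multisets are padded with zero-similarity dummies as described.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_kernel(X, Y, base_kernel):
    """K_A(X, Y) = max over bijections B of sum of base_kernel(x, y).

    The smaller multiset is implicitly padded with dummy parts whose
    similarity to everything is zero (rows/columns of the matrix left at 0).
    """
    n = max(len(X), len(Y))
    S = np.zeros((n, n))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            S[i, j] = base_kernel(x, y)
    rows, cols = linear_sum_assignment(S, maximize=True)  # Hungarian matching
    return S[rows, cols].sum()

# Toy Dirac base kernel on integer labels: 1 if equal, else 0.
k_delta = lambda x, y: float(x == y)
print(assignment_kernel([1, 1, 2], [1, 2, 3], k_delta))  # → 2.0
```

Note that this brute-force route costs cubic time in the number of parts; it serves only to make the optimization in the definition concrete.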
2. Efficient Computation via Hierarchies and Histogram Intersection
Given the equivalence between strong base kernels and tree-induced hierarchies, the assignment kernel's optimal bijection can be computed by a weighted histogram intersection over the hierarchy. For multisets X, Y and every node v in T,

K_A(X, Y) = Σ_{v ∈ V(T)} ω(v) · min(|X_v|, |Y_v|),

where X_v is the multiset of parts in X mapped to leaves below v, and ω(v) = w(v) − w(parent(v)) is the additive weight at node v (Kriege et al., 2016, Kriege, 2019).
This formulation enables kernel computation in linear time per pair after a linear preprocessing pass, effectively linear in practice given the sparsity of the histograms. The same machinery underlies scalable assignment kernels such as the Weisfeiler–Lehman Optimal Assignment (WL-OA) and Pyramid Match Graph kernels.
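The histogram-intersection formula can be sketched for a tiny hand-built hierarchy. The dictionary layout and names here are hypothetical; each node stores its parent and its additive weight ω(v), and |X_v| is obtained by counting, for each part, all ancestors on its leaf-to-root path.

```python
from collections import Counter

# Hypothetical hierarchy: node -> (parent, additive weight omega).
# Leaves "a", "b", "c" are the atomic parts; "bc" groups b and c.
HIERARCHY = {
    "root": (None, 1.0),
    "a":    ("root", 1.0),
    "bc":   ("root", 1.0),
    "b":    ("bc", 1.0),
    "c":    ("bc", 1.0),
}

def ancestors(v):
    """Path from node v up to the root, inclusive."""
    path = []
    while v is not None:
        path.append(v)
        v = HIERARCHY[v][0]
    return path

def hist(parts):
    """Histogram |X_v| for every node v: parts mapped to leaves below v."""
    h = Counter()
    for p in parts:
        for v in ancestors(p):
            h[v] += 1
    return h

def hierarchy_kernel(X, Y):
    """K_A(X, Y) = sum_v omega(v) * min(|X_v|, |Y_v|)."""
    hx, hy = hist(X), hist(Y)
    return sum(HIERARCHY[v][1] * min(hx[v], hy[v]) for v in hx.keys() & hy.keys())

print(hierarchy_kernel(["a", "b"], ["b", "c"]))  # → 4.0
```

Here the induced base kernel is k(x, y) = w(LCA(x, y)), so the optimal bijection matches b↔b (similarity 3) and a↔c (similarity 1), and the histogram intersection recovers the same value without solving any matching problem.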
3. Principal Assignment Kernels: WL-OA, Deep Assignment, Pyramid Match, and Wasserstein Variants
3.1 Weisfeiler–Lehman Optimal Assignment Kernel
The Weisfeiler–Lehman (WL) assignment kernel uses a base kernel counting the number of shared colors assigned to vertices u and v during up to h iterations of WL color refinement:

k(u, v) = Σ_{i=0}^{h} k_δ(τ_i(u), τ_i(v)),

where τ_i(u) is the color of u after iteration i and k_δ is the Dirac kernel. A hierarchy is built on the color classes across rounds, with ω(v) = 1 for each class (Kriege et al., 2016, Nikolentzos et al., 2019). This strong kernel admits linear-time evaluation via histogram intersection, and outperforms convolutional WL kernels in discriminating fine-grained structural variations (see Table 1 below).
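The WL-OA value can be sketched end to end for small graphs: run 1-WL refinement, then intersect the per-iteration color histograms with unit weights. This is a minimal sketch, not an optimized implementation; `adj` is assumed to be an adjacency-list dict and `labels` a dict of initial vertex colors (names hypothetical).

```python
from collections import Counter

def wl_color_histories(adj, labels, h):
    """1-WL refinement: list of color maps for iterations 0..h.

    A refined color is the pair (own color, sorted multiset of neighbor
    colors); nested tuples serve as canonical color identifiers."""
    colors = dict(labels)
    history = [dict(colors)]
    for _ in range(h):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
        history.append(dict(colors))
    return history

def wl_oa(adj1, lab1, adj2, lab2, h=2):
    """WL-OA via histogram intersection: unit weight per (iteration, color)."""
    k = 0
    for H1, H2 in zip(wl_color_histories(adj1, lab1, h),
                      wl_color_histories(adj2, lab2, h)):
        c1, c2 = Counter(H1.values()), Counter(H2.values())
        k += sum(min(c1[c], c2[c]) for c in c1.keys() & c2.keys())
    return k

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path     = {0: [1], 1: [0, 2], 2: [1]}
labels   = {0: 0, 1: 0, 2: 0}
print(wl_oa(triangle, labels, path, labels))  # → 4
```

On this pair the kernel counts 3 matches at iteration 0 (all colors equal), 1 at iteration 1 (only the degree-2 path vertex matches the triangle vertices), and 0 at iteration 2, illustrating how refinement depth sharpens the alignment.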
3.2 Deep Assignment Kernels via Multiple Kernel Learning
Weights in the WL-induced hierarchy can be learned discriminatively using multiple kernel learning (MKL), yielding the Deep Weisfeiler–Lehman Assignment (DWL-OA) kernel. The kernel decomposes as

K(X, Y) = Σ_{v ∈ V(T)} ω(v) · min(|X_v|, |Y_v|),

where the weights ω(v) ≥ 0 are optimized jointly with the classifier subject to a norm constraint. Sparse solutions are routinely obtained, and nontrivial accuracy improvements over fixed-weight WL-OA were reported (Kriege, 2019).
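The decomposition implies that each hierarchy node (or refinement level) contributes its own Gram matrix, and any non-negative weighting of these components remains PSD. The sketch below shows only that combination step under fixed weights; the actual MKL optimization of the weights (as in Kriege, 2019) is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def combined_gram(component_grams, beta):
    """K = sum_j beta_j * K_j for hierarchy-component Gram matrices K_j.

    Non-negative beta keeps the combination PSD; MKL would learn beta
    jointly with an SVM instead of fixing it as done here."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0), "negative weights would break the PD guarantee"
    return sum(b * K for b, K in zip(beta, component_grams))
```

For example, down-weighting deep refinement levels emulates a prior that coarse structural agreement matters more than exact neighborhood identity.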
3.3 Pyramid Match Graph Kernel
The Pyramid Match Graph kernel aligns point clouds (e.g., spectral vertex embeddings) using a hierarchy of spatial grids. For each quantization level l = 0, …, L, histograms H_X^l are constructed, and incremental matches are counted via intersection. The final kernel aggregates matches at multiple resolutions:

K_Δ(X, Y) = I(H_X^L, H_Y^L) + Σ_{l=0}^{L−1} (1 / 2^{L−l}) · (I(H_X^l, H_Y^l) − I(H_X^{l+1}, H_Y^{l+1})),

with I denoting histogram intersection and 1/2^{L−l} the level weight. This approach supports many-to-many alignments and is PSD by construction (Nikolentzos et al., 2019).
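The multi-resolution formula above can be sketched for points normalized to the unit hypercube; level l uses 2^l cells per axis, and matches already found at finer levels are discounted at coarser ones. This is a minimal sketch under that normalization assumption, not the full graph-kernel pipeline (which would first embed vertices, e.g. spectrally).

```python
import numpy as np

def pyramid_match(X, Y, L=3):
    """Pyramid match between point sets in [0, 1)^d.

    I[l] is the histogram intersection at level l (2^l cells per axis);
    new matches at level l get weight 1 / 2^(L - l)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    d = X.shape[1]

    def intersection(l):
        hx = np.histogramdd(X, bins=[2 ** l] * d, range=[(0, 1)] * d)[0]
        hy = np.histogramdd(Y, bins=[2 ** l] * d, range=[(0, 1)] * d)[0]
        return np.minimum(hx, hy).sum()

    I = [intersection(l) for l in range(L + 1)]
    k = I[L]  # matches at the finest level carry full weight
    for l in range(L):
        k += (1.0 / 2 ** (L - l)) * (I[l] - I[l + 1])
    return k
```

Because the dyadic cells nest, I[l] is non-increasing in l, so every increment I[l] − I[l+1] is non-negative and counts matches first appearing at resolution l.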
3.4 Assignment via Optimal Transport: Wasserstein WL Kernel
The Wasserstein Weisfeiler–Lehman (WWL) kernel generalizes assignment by formulating graph comparison as an optimal transport (OT) problem on the empirical node-feature distributions. For node embeddings X = {x_1, …, x_n} and Y = {y_1, …, y_m} with weights a and b:

W(X, Y) = min_{P ∈ Γ(a, b)} Σ_{i,j} P_{ij} d(x_i, y_j),

subject to the prescribed marginals P1 = a and Pᵀ1 = b. The kernel form is k(G, G') = exp(−λ W(X, Y)) (Togninalli et al., 2019). In the categorical case with Hamming ground cost, WWL is PD due to the conditional negative definiteness of W; the continuous case is not guaranteed PD, though empirical Gram matrices are often nearly so.
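For uniform weights over equally sized node sets, the OT problem above reduces to an optimal assignment (the Birkhoff polytope has permutation matrices as extreme points), so the exact Wasserstein distance can be sketched with a Hungarian solver. Here each node is represented by its sequence of WL labels and the ground cost is normalized Hamming distance; the function name and the uniform-weight restriction are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wwl_kernel(F1, F2, lam=1.0):
    """WWL-style kernel exp(-lam * W) for two graphs with the same number
    of nodes and uniform node weights.

    F1, F2: (n, h) arrays of per-node WL labels over h iterations.
    Ground cost: fraction of iterations where the labels differ."""
    F1, F2 = np.asarray(F1), np.asarray(F2)
    cost = (F1[:, None, :] != F2[None, :, :]).mean(axis=2)
    rows, cols = linear_sum_assignment(cost)       # exact OT for uniform masses
    W = cost[rows, cols].sum() / F1.shape[0]       # average transport cost
    return float(np.exp(-lam * W))
```

For unequal sizes or non-uniform weights a general OT solver (or the Sinkhorn approximation discussed below in Section 5) would replace the matching step.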
4. Expressivity, Consistency, and Structural Advantages
Assignment kernels provide greater expressivity than convolution kernels, as they align substructures one-to-one rather than summing over all pairs, capturing global correspondences and finer combinatorial variations (Nikolentzos et al., 2019, Kriege et al., 2016). For instance, graphs with identical substructure frequencies but differing arrangements are readily distinguished.
The WL-OA kernel displays monotonicity and asymptotic order consistency: as the number of WL refinement iterations increases, the induced similarity ranking between pairs of graphs stabilizes, a property not shared by the WL-subtree kernel (Liu et al., 2024). This stability further motivates analogs for deep learning architectures, such as a layer-wise consistency loss for GNNs inspired by the WL-OA ordering behavior.
Optimal transport–based assignment (e.g., WWL) allows for partial (mass-splitting) and continuous attribute alignment, integrating weighted and attributed graphs within a unified framework (Togninalli et al., 2019).
5. Computational Aspects and Scalability
The naive assignment kernel involves solving a bipartite matching (Hungarian algorithm) with complexity O(n³) for graphs of n nodes. However, for strong base kernels, histogram-intersection based formulations reduce the complexity to effectively O(n) per pair after linear preprocessing, or O(hn) for the WL-OA kernel with h refinement iterations and its deep and pyramid variants (Kriege et al., 2016, Nikolentzos et al., 2019, Kriege, 2019). The Pyramid Match kernel admits O(dnL) time for d-dimensional embeddings at L levels.
For OT-based kernels (WWL), exact network-simplex solvers run in O(n³ log n), but entropic regularization via Sinkhorn iteration yields near-linear O(n²)-per-iteration evaluation (Togninalli et al., 2019). Approximations such as node subsampling or feature quantization further accelerate large-scale computation.
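The Sinkhorn route mentioned above can be sketched in a few lines: alternately rescale the rows and columns of the Gibbs kernel e^{−C/ε} until both marginals hold. This is a plain, unstabilized sketch (no log-domain tricks), and the regularization strength and iteration count are illustrative choices.

```python
import numpy as np

def sinkhorn_cost(C, a, b, eps=0.05, iters=500):
    """Entropic-regularized OT cost <P, C> with marginals a and b.

    Alternating Sinkhorn scalings; for small eps the transported cost
    approaches the unregularized optimum."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)         # enforce column marginals
        u = a / (K @ v)           # enforce row marginals
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum())
```

Each iteration costs two matrix-vector products, i.e. O(n²), which is the source of the near-quadratic scaling quoted above; very small eps may require log-domain stabilization in practice.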
6. Empirical Performance and Comparative Evaluation
Assignment kernels have been empirically validated on several graph classification benchmarks spanning chemical, biological, and social network domains. The WL-OA kernel consistently matches or outperforms the standard WL subtree (convolution) kernel, with pronounced gains on datasets emphasizing structural neighborhood alignment (Kriege et al., 2016, Nikolentzos et al., 2019). Deep WL-OA kernels with MKL-learned weights sometimes further enhance accuracy (Kriege, 2019).
The Pyramid Match kernel ranks among the top assignment-based methods for both labeled and unlabeled datasets (Nikolentzos et al., 2019). OT-based WWL kernels surpass prior state-of-the-art on several attributed graph problems, particularly when capturing distributional differences is key (Togninalli et al., 2019).
Extensive benchmarking establishes assignment kernels as a high-performing, scalable alternative to standard convolutional graph kernels. Notably, on node-attributed or real-valued feature graphs, GNN models (e.g., GIN, DiffPool) may outperform assignment kernels, but the latter remain strong baselines.
7. Extensions, Limitations, and Directions for Research
Assignment kernels admit various extensions:
- Learnable weighting via MKL (hierarchy-structured DWL-OA kernels) enables model sparsity and task-focused discriminative power (Kriege, 2019).
- WWL allows hybridization with GNN representations for node (sub-)embeddings (Togninalli et al., 2019).
- Incorporation of cross-layer similarity consistency into GNN training objectives produces measurable improvements in classification performance and stability (Liu et al., 2024).
Noted limitations include:
- Indefiniteness for non-strong or continuous base kernels, mandating either specialized SVMs (Kreĭn space solvers) or empirical PD corrections.
- High computational cost for general OT assignment (especially without histogram shortcut).
- On heavily attributed graphs, assignment kernels may lag behind learned neural architectures.
Active directions include more efficient OT solvers (e.g. Sinkhorn, sliced Wasserstein), development of explicit feature maps for OT-based distances, and extensions to accommodate edge and mixed-type attributes within the assignment framework (Togninalli et al., 2019). Efforts continue to bridge assignment kernel theory with neural message-passing, leveraging their structural stability properties.
Table 1: Computational and Theoretical Properties of Key Assignment Kernels
| Kernel | Positive Definite (PD) Guarantee | Complexity (per pair) |
|---|---|---|
| WL-OA | Yes, when base kernel is strong | O(hn) |
| Deep WL-OA (MKL) | Yes, if weights non-negative (hierarchy-induced only) | O(hn), plus MKL training |
| Pyramid Match | Yes, via weighted histogram intersection | O(dnL) |
| WWL (OT-based) | Discrete: Yes; Continuous: generally indefinite | O(n³ log n) (exact OT), O(n²) per Sinkhorn iteration |
References: (Kriege et al., 2016, Nikolentzos et al., 2019, Togninalli et al., 2019, Kriege, 2019, Liu et al., 2024)