Neural Assignment Matrix Prediction

Updated 5 April 2026

Neural assignment matrix prediction is a paradigm that leverages differentiable Sinkhorn layers and graph neural networks to learn and infer discrete assignment structures.
It transforms real-valued score matrices into soft doubly stochastic matrices through iterative normalization, approximating binary permutation matrices.
This approach achieves high accuracy in multi-object tracking, wireless resource allocation, and graph matching under diverse learning paradigms.

Neural assignment matrix prediction is a machine learning paradigm for learning and inferring discrete assignment structures—represented by permutation or assignment matrices—in combinatorial optimization and data association contexts. Rather than relying on hand-crafted costs or explicit combinatorial solvers, these frameworks use neural architectures, often incorporating differentiable approximations like Sinkhorn normalization and graph neural networks (GNNs), to predict assignment matrices directly or as distributions within a broader generative or decision-theoretic model. Applications include multi-object tracking, resource allocation in wireless networks, and graph/hypergraph matching for both classical linear and more general quadratic assignment problems.

1. Mathematical Formulation of Neural Assignment Matrix Prediction

Assignment prediction problems formalize the association between two sets, typically requiring a bijection or a partial bijection subject to constraints. For an assignment between two sets of $N$ elements, a binary permutation matrix $A \in \{0,1\}^{N \times N}$ encodes the assignment, and row and column sum constraints (every row and every column sums to one) ensure one-to-one correspondences. The set of all such matrices is denoted $\Pi_N$. In multi-object tracking, given latent states $x_k \in \mathbb{R}^{N\times d}$ and observations $z_k \in \mathbb{R}^{N\times m}$, the learning objective is to recover latent assignment sequences $A_{1:K}$ that best explain the observed data $Z$ under a Markovian state evolution and measurement model. Relaxations are often employed, using doubly stochastic matrices $P \in \mathcal{D}_N$ (the Birkhoff polytope), whose vertices are exactly the permutation matrices.
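To make these definitions concrete, here is a small Python/NumPy sketch (helper names are illustrative, not taken from any cited work). It enumerates the matrices in $\Pi_3$, checks their row and column sums, and confirms that a random convex combination of them is doubly stochastic, i.e. a point of $\mathcal{D}_3$, without being a permutation matrix itself.

```python
import itertools

import numpy as np


def perm_matrix(sigma):
    """Binary matrix A with A[i, sigma[i]] = 1: row i is assigned to column sigma[i]."""
    n = len(sigma)
    A = np.zeros((n, n), dtype=int)
    A[np.arange(n), np.asarray(sigma)] = 1
    return A


def is_permutation_matrix(A):
    """Entries in {0, 1} and every row and column sums to exactly 1."""
    A = np.asarray(A)
    binary = np.isin(A, (0, 1)).all()
    return bool(binary and (A.sum(axis=0) == 1).all() and (A.sum(axis=1) == 1).all())


def is_doubly_stochastic(P, tol=1e-9):
    """Nonnegative entries whose rows and columns each sum to 1 (membership in D_N)."""
    P = np.asarray(P, dtype=float)
    return bool((P >= -tol).all()
                and np.allclose(P.sum(axis=0), 1.0, atol=tol)
                and np.allclose(P.sum(axis=1), 1.0, atol=tol))


N = 3
perms = [perm_matrix(s) for s in itertools.permutations(range(N))]  # the N! vertices of D_N

# A random convex combination of permutation matrices is a point of the Birkhoff polytope.
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(len(perms)))
P = sum(w * A for w, A in zip(weights, perms))

print(len(perms))                                          # 6
print(is_doubly_stochastic(P), is_permutation_matrix(P))   # True False
```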

Assignment formulations can also address more general settings, such as the Lawler quadratic assignment problem (QAP), where one seeks $X \in \{0,1\}^{n_1\times n_2}$ to maximize a quadratic objective $\mathrm{vec}(X)^\top K \mathrm{vec}(X)$ under row and column constraints, or ranked assignment problems, which seek not just the optimal assignment but a sequence of the $k$ best feasible assignments, sorted by cost (Wang et al., 2019, Burke et al., 2021, Dehler et al., 2 Apr 2026).
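For tiny instances both problems can be solved by enumeration, which gives a useful reference when testing a learned predictor. The NumPy sketch below is brute force only (practical for roughly $n \le 8$) and assumes square instances with $n_1 = n_2 = n$; function names are illustrative. It evaluates the Lawler objective with column-stacking $\mathrm{vec}$, checks that a diagonal $K = \mathrm{diag}(\mathrm{vec}(S))$ reduces it to the linear objective $\sum_{ij} S_{ij} X_{ij}$, and lists the $k$ lowest-cost assignments of a cost matrix.

```python
import itertools

import numpy as np


def vec(X):
    """Column-stacking vectorization, the usual convention for vec(X)."""
    return X.reshape(-1, order="F")


def perm_matrix(sigma):
    n = len(sigma)
    X = np.zeros((n, n))
    X[np.arange(n), np.asarray(sigma)] = 1.0
    return X


def lawler_qap_value(X, K):
    """Quadratic objective vec(X)^T K vec(X)."""
    v = vec(X)
    return float(v @ K @ v)


def brute_force_qap(K, n):
    """Maximize the Lawler objective over all n x n permutation matrices (tiny n only)."""
    best = max(itertools.permutations(range(n)),
               key=lambda s: lawler_qap_value(perm_matrix(s), K))
    return perm_matrix(best)


def k_best_assignments(C, k):
    """Ranked assignment by enumeration: the k lowest-cost permutations, sorted by cost."""
    n = C.shape[0]
    scored = [(float(C[np.arange(n), list(s)].sum()), s)
              for s in itertools.permutations(range(n))]
    scored.sort()
    return scored[:k]


rng = np.random.default_rng(1)
n = 4
S = rng.normal(size=(n, n))                 # node-to-node affinities

# With a diagonal affinity matrix the QAP objective collapses to the linear one.
K_lin = np.diag(vec(S))
X = perm_matrix((1, 2, 0, 3))
print(np.isclose(lawler_qap_value(X, K_lin), (S * X).sum()))  # True

X_best = brute_force_qap(K_lin, n)          # maximizer of sum_ij S_ij X_ij
for cost, sigma in k_best_assignments(-S, k=3):
    print(sigma, round(cost, 3))            # rank 1 corresponds to X_best
```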

2. Sinkhorn Network Approaches and Differentiable Assignment Prediction

A central contribution to neural assignment matrix prediction is the application of differentiable Sinkhorn layers. Feedforward or graph-based neural networks output real-valued score matrices $S = f_\theta(\cdot) \in \mathbb{R}^{N\times N}$, which are transformed via iterative row and column normalization (Sinkhorn-Knopp algorithm) into (soft) doubly stochastic matrices: $P = \mathcal{S}(S/\tau) = \lim_{l\to\infty} \mathcal{T}_c\big(\mathcal{T}_r(\cdots \mathcal{T}_c(\mathcal{T}_r(\exp(S/\tau))))\big)$, where $\mathcal{T}_r(X) = X \oslash (X\mathbf{1}\mathbf{1}^\top)$ and $\mathcal{T}_c(X) = X \oslash (\mathbf{1}\mathbf{1}^\top X)$ enforce row and column normalization ($\oslash$ denotes elementwise division) and $\tau > 0$ is a temperature. For sufficiently low temperature and enough iterations, $P$ approaches a permutation matrix, namely the solution of the linear assignment problem $\max_{A \in \Pi_N} \langle A, S \rangle$ when that solution is unique.
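A minimal NumPy sketch of the Sinkhorn operator, assuming the row-then-column order above and a fixed number of iterations in place of the limit. It works in log space, subtracting a log-sum-exp instead of dividing by sums, so that small temperatures do not overflow `exp`; `sinkhorn` is an illustrative name, not a library function.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along an axis, keeping dimensions."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(S, tau=1.0, n_iters=50):
    """Approximate S(S / tau) with n_iters alternating row/column normalizations."""
    log_P = np.asarray(S, dtype=float) / tau
    for _ in range(n_iters):
        log_P = log_P - logsumexp(log_P, axis=1)  # T_r: every row sums to 1
        log_P = log_P - logsumexp(log_P, axis=0)  # T_c: every column sums to 1
    return np.exp(log_P)


# Scores favouring the assignment 0->2, 1->0, 2->3, 3->1, plus small noise.
rng = np.random.default_rng(0)
sigma = np.array([2, 0, 3, 1])
S = 0.1 * rng.normal(size=(4, 4))
S[np.arange(4), sigma] += 2.0

P_soft = sinkhorn(S, tau=10.0)  # high temperature: close to uniform
P_hard = sinkhorn(S, tau=0.1)   # low temperature: close to the permutation matrix for sigma
print(P_soft.round(2))
print(P_hard.round(2))
```

Working in log space costs nothing extra per iteration and keeps the same fixed point as the multiplicative Sinkhorn-Knopp updates.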

This structure allows end-to-end learning: gradients propagate through the unrolled Sinkhorn iterations, enabling backpropagation from assignment-based objectives to the parameters of the underlying neural scoring network. The mapping $f_\theta$ can ingest raw measurements, features, or affinity matrices, and its parameters $\theta$ are optimized by maximizing (regularized) likelihoods or by minimizing custom assignment losses (Burke et al., 2021, Kim et al., 2021, Wang et al., 2019).
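To illustrate how an assignment loss reaches the scoring parameters through the unrolled normalization, the sketch below assumes a hypothetical scorer $S_{ij} = -\sum_d \theta_d (x_{id} - y_{jd})^2$ with one weight per feature, a cross-entropy loss against a known permutation, and central finite differences standing in for the automatic differentiation a deep learning framework would provide. Feature 0 identifies the match and feature 1 is noise, so training should raise $\theta_0$ and lower the loss.

```python
import numpy as np


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def log_sinkhorn(S, n_iters=30):
    """Unrolled Sinkhorn in log space; returns log P (temperature folded into the scores)."""
    log_P = S
    for _ in range(n_iters):
        log_P = log_P - logsumexp(log_P, axis=1)
        log_P = log_P - logsumexp(log_P, axis=0)
    return log_P


def scores(theta, X, Y):
    """Hypothetical scorer f_theta: S_ij = -sum_d theta_d * (X_id - Y_jd)^2."""
    D = (X[:, None, :] - Y[None, :, :]) ** 2  # (N, N, d) squared feature differences
    return -(D * theta).sum(axis=-1)


def loss(theta, X, Y, A):
    """Cross-entropy -sum_ij A_ij log P_ij between target permutation A and Sinkhorn output."""
    return -float((A * log_sinkhorn(scores(theta, X, Y))).sum())


def numerical_grad(f, theta, eps=1e-5):
    """Central differences, standing in for a framework's autodiff."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
N = 5
sigma = rng.permutation(N)                 # ground truth: row i matches column sigma[i]
X = np.column_stack([np.arange(N, dtype=float), rng.normal(size=N)])
Y = np.empty_like(X)
Y[sigma, 0] = X[:, 0]                      # feature 0 identifies the match
Y[:, 1] = rng.normal(size=N)               # feature 1 is pure noise
A = np.zeros((N, N))
A[np.arange(N), sigma] = 1.0


def objective(t):
    return loss(t, X, Y, A)


theta = np.array([0.1, 0.1])
initial_loss = objective(theta)
for _ in range(200):
    theta = theta - 0.005 * numerical_grad(objective, theta)  # plain gradient descent
print(round(initial_loss, 3), round(objective(theta), 3), theta.round(2))
```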

3. Learning Paradigms: EM, Unsupervised, and End-to-End Strategies

Various learning paradigms are applied to neural assignment prediction, often reflecting the supervision (or lack thereof) and structure of the task:

Expectation-Maximization (EM)–Sinkhorn Models: By coupling a neural Sinkhorn predictor with an EM optimization, models maximize the marginal likelihood of observed sequences. In the E-step, the posterior $p(A_{1:K} \mid Z)$ over assignments is updated, typically using Kalman smoothers over the latent states; in the M-step, network parameters maximize the expected complete-data likelihood, e.g., by minimizing $-\mathbb{E}_{p(A_{1:K} \mid Z)}\left[\log p(Z, A_{1:K} \mid \theta)\right]$. Because the Sinkhorn layer is differentiable, this M-step can be carried out by gradient-based optimization (Burke et al., 2021).
Unsupervised Direct Loss Minimization: Where ground-truth assignments are unavailable, models are trained to directly minimize the original assignment cost (e.g., linear sum assignment or utility function), plus, when using Sinkhorn, an implicit entropy regularization. The objective is

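$$\min_\theta \; \mathbb{E}_{C}\Big[\sum_{i,j} C_{ij}\, P_\theta(C)_{ij}\Big],$$

where $C$ is the cost matrix of a problem instance and $P_\theta(C)$ is the Sinkhorn-normalized network output for that instance; the objective is optimized by stochastic gradient descent (SGD) (Kim et al., 2021).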

End-to-End, Supervised Approaches: For structured matching problems (e.g., QAP), the architecture predicts assignment probabilities, enforcing row/column constraints via Sinkhorn, and optimizes via standard cross-entropy with respect to known matching labels (Wang et al., 2019). These approaches also extend to multiple-graph and hypergraph matching by introducing tensorized affinity structures and appropriately generalized message-passing and Sinkhorn operators.
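The unsupervised and supervised objectives above can be compared on a single toy instance. In this NumPy sketch (assumptions: the scorer simply negates the cost matrix and the temperature is fixed rather than learned), `expected_assignment_cost` computes $\sum_{ij} C_{ij} P_{ij}$ and `assignment_cross_entropy` computes $-\sum_{ij} T_{ij} \log P_{ij}$ for a target permutation $T$. Feeding a soft posterior in place of $T$ would give an EM-flavoured variant; that is an assumption of this sketch, not a description of the M-step of Burke et al.

```python
import numpy as np


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(S, tau=1.0, n_iters=200):
    log_P = np.asarray(S, dtype=float) / tau
    for _ in range(n_iters):
        log_P = log_P - logsumexp(log_P, axis=1)  # row normalization
        log_P = log_P - logsumexp(log_P, axis=0)  # column normalization
    return np.exp(log_P)


def expected_assignment_cost(C, P):
    """Unsupervised loss <C, P>: the relaxed linear assignment cost; needs no labels."""
    return float((C * P).sum())


def assignment_cross_entropy(P, T, eps=1e-12):
    """Supervised loss -sum_ij T_ij log P_ij against a one-hot permutation target T.

    Passing a soft target T (e.g. a posterior over matches) gives an EM-style variant;
    that use is an assumption of this sketch.
    """
    return float(-(T * np.log(P + eps)).sum())


# Toy instance: cheap entries on sigma, expensive entries elsewhere.
rng = np.random.default_rng(0)
n = 4
sigma = np.array([1, 3, 0, 2])
C = rng.uniform(0.5, 1.0, size=(n, n))
C[np.arange(n), sigma] = rng.uniform(0.0, 0.1, size=n)
T = np.zeros((n, n))
T[np.arange(n), sigma] = 1.0

# Hypothetical scorer: scores are negated costs, so cheaper pairs score higher.
for tau in (10.0, 0.1):
    P = sinkhorn(-C, tau=tau)
    print(tau, round(expected_assignment_cost(C, P), 3), round(assignment_cross_entropy(P, T), 3))
```

Since $P$ lies (up to convergence error) in the Birkhoff polytope, $\langle C, P \rangle$ cannot fall below the optimal assignment cost, and at convergence it does not increase as $\tau$ decreases, so the unsupervised loss is an upper bound that tightens at low temperature.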

4. Graph Neural Network Extensions and Hypergraph Matching

Sinkhorn architectures are further enhanced by graph neural network modules that encode relational structures among the elements to be matched. In neural graph matching networks, QAP is recast as vertex classification in an association graph, where each vertex represents a potential $(i, a)$ correspondence between node $i$ of the first graph and node $a$ of the second, and edges/affinities encode pairwise similarities. Message-passing layers (GCN/GAT) aggregate neighborhood information. The initial and intermediate embeddings are shaped by the soft assignments output by a Sinkhorn layer after every message-passing layer, injecting global one-to-one assignment structure into the representation. Dummy node strategies accommodate graphs of unequal sizes.
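The association-graph view can be sketched in a few lines of NumPy. This is not the neural graph matching network itself: the sketch replaces learned GCN/GAT layers with parameter-free aggregation $h \leftarrow Kh / \lVert Kh \rVert$ (essentially the power iteration of spectral matching) and applies Sinkhorn once at the end rather than after every layer. The Gaussian edge kernel, `edge_scale`, and all function names are assumptions.

```python
import numpy as np


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(S, tau=1.0, n_iters=50):
    log_P = np.asarray(S, dtype=float) / tau
    for _ in range(n_iters):
        log_P = log_P - logsumexp(log_P, axis=1)
        log_P = log_P - logsumexp(log_P, axis=0)
    return np.exp(log_P)


def association_affinity(W1, W2, edge_scale=0.3):
    """Affinity matrix K of the association graph for two weighted graphs.

    Vertex (i, a) pairs node i of G1 with node a of G2 and gets flat index i * n2 + a.
    K[(i, a), (j, b)] is a Gaussian kernel on W1[i, j] - W2[a, b] (a hypothetical choice),
    kept only when both (i, j) and (a, b) are edges.
    """
    n1, n2 = len(W1), len(W2)
    diff = W1[:, None, :, None] - W2[None, :, None, :]  # [i, a, j, b] = W1[i, j] - W2[a, b]
    edges = (W1 > 0)[:, None, :, None] & (W2 > 0)[None, :, None, :]
    K = np.exp(-(diff / edge_scale) ** 2) * edges
    return K.reshape(n1 * n2, n1 * n2)


def match(W1, W2, n_layers=30, tau=0.05):
    """Parameter-free message passing on the association graph, then Sinkhorn."""
    K = association_affinity(W1, W2)
    h = np.ones(K.shape[0])            # initial scalar embedding per association vertex
    for _ in range(n_layers):
        h = K @ h                      # aggregate messages from neighbouring vertices
        h = h / np.linalg.norm(h)      # normalize (power-iteration step)
    H = h.reshape(len(W1), len(W2))    # back to an n1 x n2 score matrix
    return sinkhorn(H, tau=tau)


# G1: complete graph on 4 nodes with distinct edge weights; G2: the same graph, relabelled.
W1 = np.zeros((4, 4))
W1[np.triu_indices(4, k=1)] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
W1 = W1 + W1.T
pi = np.array([2, 0, 3, 1])            # ground truth: node i of G1 is node pi[i] of G2
inv = np.argsort(pi)
W2 = W1[np.ix_(inv, inv)]              # W2[pi[i], pi[j]] == W1[i, j]

P = match(W1, W2)
print(P.argmax(axis=1))                # expected to recover pi
```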

These GNN approaches generalize to hypergraphs by using higher-order affinity tensors and appropriately generalized message-passing over association hypergraphs, with final predictions still normalized via generalized Sinkhorn procedures (Wang et al., 2019).

5. Practical Implementations in Multi-Object Tracking and Wireless Assignment

Neural assignment matrix prediction is particularly useful in online and real-time settings where fast, differentiable assignment is required:

Multi-Object Tracking: EM–Sinkhorn modules integrate as drop-in data association subroutines. At each frame, the neural predictor maps features/measurements to a soft assignment matrix, from which a hard assignment can be extracted via the Hungarian algorithm, or which can be further processed by gating and by probabilistic data association (e.g., JPDAF) or multi-hypothesis tracking methods. Empirical results show that EM–Sinkhorn-based learners attain near-optimal root mean squared error (RMSE) and identity accuracy, even in unsupervised settings (Burke et al., 2021).
Wireless Resource Assignment: Sinkhorn network architectures solve balanced assignment and joint association/power control problems in wireless networks. Kim et al. (2021) report the cost degradation relative to the optimal Hungarian solution together with a per-sample computational complexity below the $O(N^3)$ of the Hungarian algorithm.
Ranked Assignment in MOT: The RAPNet architecture (GNN+LSTM) operates on bipartite assignment graphs to predict not only the optimal assignment but the top-$k$ solution sequence, matching the functional requirements of $\delta$-GLMB filters. When batched on a GPU, RAPNet inference takes about 0.02 ms per graph at the reported problem size, approaching the accuracy of Murty's algorithm with significant speedups and achieving higher accuracy than Gibbs sampling for low to moderate $N$ and $k$ (Dehler et al., 2 Apr 2026).
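RAPNet itself is not reproduced here. The sketch below shows the classical pipeline that a learned ranked-assignment predictor is measured against: negative log-probabilities of a soft assignment matrix as costs, SciPy's `linear_sum_assignment` for the best hard assignment, and Murty's algorithm for the ranked list. The random stand-in soft matrix, the large finite `BIG` cost used to forbid pairs, and all function names are assumptions.

```python
import heapq
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e9  # large finite cost standing in for a forbidden pair


def solve_constrained(C, forced, forbidden):
    """Min-cost assignment with some (row, col) pairs forced and others forbidden.

    Returns (cost, sigma) with sigma[i] the column of row i, or None if infeasible.
    """
    M = np.array(C, dtype=float)
    for r, c in forbidden:
        M[r, c] = BIG
    for r, c in forced:
        keep = M[r, c]
        M[r, :] = BIG
        M[:, c] = BIG
        M[r, c] = keep
    rows, cols = linear_sum_assignment(M)
    cost = float(M[rows, cols].sum())
    if cost >= BIG:
        return None
    return cost, tuple(int(c) for c in cols)


def murty_k_best(C, k):
    """Murty's algorithm: the k lowest-cost assignments, in order of increasing cost."""
    n = C.shape[0]
    first = solve_constrained(C, (), ())
    heap = [(first[0], 0, first[1], (), ())]
    counter = itertools.count(1)  # tie-breaker so the heap never compares constraint tuples
    ranked = []
    while heap and len(ranked) < k:
        cost, _, sigma, forced, forbidden = heapq.heappop(heap)
        ranked.append((cost, sigma))
        forced_rows = {r for r, _ in forced}
        fixed = list(forced)
        # Partition the rest of this node's solution space into disjoint children.
        for r in range(n):
            if r in forced_rows:
                continue
            child_forbidden = forbidden + ((r, sigma[r]),)
            child = solve_constrained(C, tuple(fixed), child_forbidden)
            if child is not None:
                heapq.heappush(heap, (child[0], next(counter), child[1],
                                      tuple(fixed), child_forbidden))
            fixed.append((r, sigma[r]))
    return ranked


def brute_force_k_best(C, k):
    """Enumeration reference for tiny n: the k smallest assignment costs."""
    n = C.shape[0]
    costs = sorted(float(C[np.arange(n), list(s)].sum())
                   for s in itertools.permutations(range(n)))
    return costs[:k]


# Stand-in for a network's soft assignment matrix (rows sum to 1).
rng = np.random.default_rng(0)
P_soft = rng.dirichlet(np.ones(5), size=5)
C = -np.log(P_soft + 1e-12)             # costs = negative log-probabilities

ranked = murty_k_best(C, k=4)
rows, cols = linear_sum_assignment(C)   # best hard assignment (rank 1)
print(ranked[0][1] == tuple(int(c) for c in cols))                   # True
print(np.allclose([c for c, _ in ranked], brute_force_k_best(C, 4)))  # True
```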

Method	Downstream Domain	Key Metric/Result
EM–Sinkhorn (Burke et al., 2021)	Vision-based MOT	99.7% ID accuracy (unsupervised, N=4)
Sinkhorn NN (Kim et al., 2021)	Wireless assignment	Cost overhead vs. Hungarian; lower complexity than Hungarian's $O(N^3)$
RAPNet (Dehler et al., 2 Apr 2026)	Ranked MOT assignment (MOT/$\delta$-GLMB)	Acc. 0.99 (rank 1), 0.95 (rank 2); 0.02 ms/graph (batched)

6. Architectural and Algorithmic Limitations

Despite demonstrated effectiveness, these neural assignment predictors possess inherent limitations:

Soft Assignment vs. Hard Projection: While Sinkhorn normalization approaches permutation matrices asymptotically, finite temperature and iterations yield only approximately integral solutions, which sometimes necessitate post-processing (Hungarian or greedy repair) to extract binary assignments (Burke et al., 2021, Kim et al., 2021).
Batch Size and Scalability: RAPNet, for example, achieves significant inference speedups on GPUs when batched, but its GNN layers add overhead for small graphs; for small problem sizes, hand-coded combinatorial solvers may be faster (Dehler et al., 2 Apr 2026).
Ranked Assignment Limitations: RAPNet is trained for a fixed $k$, and a larger $k$ requires retraining or additional, potentially less accurate, post-processing (Dehler et al., 2 Apr 2026).
Lack of Explicit Dynamics or Non-Gaussian Features: Most architectures consume cost matrices or summary statistics of measurement-to-track affinity, omitting richer dynamical or contextual features unless explicitly appended (Dehler et al., 2 Apr 2026).
Extension to High-Dimensional Matching (Multi-Frame/Multi-Sensor): Current methods mainly address 2D assignment. Generalizations to multi-dimensional association, e.g., 3D or 4D tensor cost assignments, remain an unresolved research frontier (Dehler et al., 2 Apr 2026).
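The first limitation is easy to reproduce. In the NumPy/SciPy sketch below (names illustrative), row-wise argmax on a doubly stochastic matrix assigns two rows to the same column, whereas a Hungarian repair via SciPy's `linear_sum_assignment` on $-\log P$ returns a valid permutation; the loop then shows that lowering $\tau$ shrinks the entrywise distance between the Sinkhorn output and its rounded permutation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(S, tau=1.0, n_iters=100):
    log_P = np.asarray(S, dtype=float) / tau
    for _ in range(n_iters):
        log_P = log_P - logsumexp(log_P, axis=1)
        log_P = log_P - logsumexp(log_P, axis=0)
    return np.exp(log_P)


def greedy_rows(P):
    """Naive rounding: each row takes its largest entry; columns may collide."""
    return P.argmax(axis=1)


def hungarian_rows(P, eps=1e-12):
    """Repair: the permutation maximizing sum_i log P[i, sigma[i]]."""
    _, cols = linear_sum_assignment(-np.log(P + eps))
    return cols


def integrality_gap(P):
    """Largest entrywise distance between P and its Hungarian-rounded permutation matrix."""
    hard = np.zeros_like(P)
    hard[np.arange(len(P)), hungarian_rows(P)] = 1.0
    return float(np.abs(P - hard).max())


# A doubly stochastic matrix on which row-wise argmax is not a valid assignment.
P = np.array([[0.50, 0.40, 0.10],
              [0.45, 0.35, 0.20],
              [0.05, 0.25, 0.70]])
print(greedy_rows(P))     # [0 0 2]: rows 0 and 1 both claim column 0
print(hungarian_rows(P))  # [1 0 2]: a valid permutation

# Lower temperature shrinks the gap to the nearest hard assignment.
rng = np.random.default_rng(0)
sigma = np.array([1, 2, 0, 3])
S = 0.1 * rng.normal(size=(4, 4))
S[np.arange(4), sigma] += 2.0
for tau in (1.0, 0.3, 0.1):
    print(tau, round(integrality_gap(sinkhorn(S, tau=tau)), 4))
```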

7. Extensions, Outlook, and Research Directions

Neural assignment matrix prediction provides a modular, differentiable alternative to combinatorial solvers, naturally integrating into end-to-end learning pipelines. Ongoing research includes:

Integration of Advanced Feature Extraction: Incorporation of raw physical state, appearance, or sensor data.
Generalization to Many-to-Many and Multi-Frame Assignments: Extension of current architectures to support multi-sensor, multi-frame, or higher-order matching with tensorized cost structures.
Rank-Diversity Modeling: Training losses or architectural mechanisms for greater solution diversity within ranked predictions.
Complexity and Post-Processing Optimization: Lightweight and adaptive GNN architectures, pruning, or learning-augmented combinatorial solvers.
Applications to Hypergraph and Multi-Graph Matching: Deep learning representations for combinatorial generalizations beyond pairwise assignment with cycle-consistency and hyperedge constraints (Wang et al., 2019).

The empirical successes in diverse domains, including unsupervised settings that reach supervised-level performance, highlight the practical relevance of these methods. Continued advances in scalable, structured neural architectures for assignment prediction are likely to extend their reach into large-scale, real-time, and combinatorially complex environments.