Partial Soft-Matching Distance in Optimal Transport

Updated 4 July 2026

Partial soft-matching distance is a method that relaxes strict one-to-one matching constraints by using soft correspondences and allowing some elements to remain unmatched.
It is applied across neural representation comparison, image–text retrieval, subgraph matching, and shape matching to handle noise and partial overlaps.
The approach bridges continuous optimal transport techniques with discrete matching models, providing insights into algorithm design and robustness against outliers.

Partial soft-matching distance denotes a family of relaxed correspondence objectives in which matching is soft because correspondences are represented by nonnegative couplings, soft correspondence matrices, or soft masks, and partial because some mass, fragments, nodes, units, or pairwise constraints may remain unmatched. The phrase is used explicitly for neural representational comparison in a partial optimal transport formulation (Kapoor et al., 22 Feb 2026), while image–text retrieval, subgraph matching, and partial shape matching use closely related constructions under different names, including entropic balanced OT over an augmented space with dustbins, partial fused Gromov–Wasserstein, and soft-masked geodesic consistency losses (Pan et al., 15 Mar 2026, Pan et al., 2024, Bracha et al., 2024). A complementary baseline is the hard injective partial-matching RMS model under translation, which fixes the opposite extreme of discrete, injective, unweighted correspondences and thereby clarifies what is changed when one passes to softer or more permissive formulations (Ben-Avraham et al., 2014).

1. Conceptual scope and recurrent formulations

Across the literature, there is no single canonical object called partial soft-matching distance. Instead, the term groups several mathematically distinct mechanisms that all relax exact one-to-one or full-mass matching. In neural representational comparison, the central object is a partial optimal transport distance over tuning curves. In image–text retrieval, the practical alignment object is a soft partial transport plan, but the final retrieval quantity is a similarity score induced by that plan, not a raw distance. In subgraph matching, the objective is a partial fused Gromov–Wasserstein discrepancy. In partial shape matching, the optimized object is often a soft correspondence matrix or a soft mask inside a loss rather than a metric in the strict sense (Khosla et al., 2023).

Setting	Soft component	Partial component
Neural representations	transportation polytope $T(N_x,N_y)$ or partial coupling $T$	only total mass $s$ is transported
Image–text retrieval	entropic OT / Sinkhorn coupling $\Gamma$	global “dustbin” nodes absorb irrelevant fragments
Subgraph matching	relaxed transport plan $\boldsymbol T$	partial transport set $\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$, dummy-node slack
Partial shape matching	soft correspondence matrix $\boldsymbol P$, soft mask $\boldsymbol M^s$	only guaranteed or softly weighted pairs contribute

A useful terminological boundary is that “soft” does not always mean entropic OT. In some papers, softness is the use of a transport coupling spread across many pairs; in others, it is a soft correspondence matrix, a soft mask, or a class-probability-based adjustment of a transport cost matrix. Likewise, “partial” may refer to fixed transported mass, relaxed marginal constraints, augmented dustbins, class reweighting, or selective pair inclusion. This suggests that the phrase is best treated as a comparative umbrella rather than a single standardized metric.

An older and distinct line of work in soft set theory critiques matrix-based matching-function similarity and replaces it with set-operations-based distances that explicitly compare parameter-set overlap $A\cap B$ , symmetric difference $A\Delta B$ , and value-set mismatch $T$ 0. That literature is not OT-based, but it is relevant because it also treats partial overlap as intrinsic to the distance definition rather than as missing data to be totalized (Kharal, 2010).

2. Hard injective partial matching as the opposite extreme

The clearest hard baseline is the partial-matching RMS model under translation. Let

$$A=\{a_1,\dots,a_n\}\subset\mathbb{R}^2,\qquad B=\{b_1,\dots,b_m\}\subset\mathbb{R}^2,$$

with $m\le n$. A partial matching is a maximum-cardinality matching of $B$ into $A$, that is, an injective assignment $\pi:\{1,\dots,m\}\to\{1,\dots,n\}$. For fixed point locations, the minimum partial-matching RMS objective is

$$\mathrm{RMS}(B,A)=\min_{\pi}\sum_{i=1}^{m}\|b_i-a_{\pi(i)}\|^2,$$

and under translation

$$\mathrm{RMS}_T(B,A)=\min_{t\in\mathbb{R}^2}\,\min_{\pi}\sum_{i=1}^{m}\|b_i+t-a_{\pi(i)}\|^2.$$

The model is explicitly a strict partial matching, not a soft one: every point of $B$ must be assigned to exactly one distinct point of $A$, each point of $A$ can be used at most once, and there are no fractional correspondences, no probabilistic weights, no entropy or regularization terms, and no many-to-one assignments (Ben-Avraham et al., 2014).

For fixed $\pi$, the translation dependence is

$$f_\pi(t)=\sum_{i=1}^{m}\|b_i+t-a_{\pi(i)}\|^2=c_\pi+\langle t,d_\pi\rangle+m\|t\|^2,$$

with

$$c_\pi=\sum_{i=1}^{m}\|b_i-a_{\pi(i)}\|^2,\qquad d_\pi=2\sum_{i=1}^{m}\bigl(b_i-a_{\pi(i)}\bigr).$$

Hence the full objective $F(t)=\min_\pi f_\pi(t)$ is the lower envelope of finitely many quadratics, equivalently of affine functions after subtracting the common term $m\|t\|^2$. The induced subdivision of translation space is convex; its faces are convex polygons; and multiple distinct matchings can be simultaneously optimal on an open set only if they match the same subset of $A$. This geometric structure is a direct consequence of hard injectivity.
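As a minimal sketch of this hard baseline, the following evaluates the injective partial-matching objective at one fixed translation and checks the quadratic expansion in the translation numerically. The names `partial_matching_cost` and `quadratic_coefficients` are hypothetical helpers, not taken from the cited paper, and the sketch assumes the objective is the sum of squared distances with the smaller set matched injectively into the larger one. It uses SciPy's rectangular assignment solver for a single translation only; it does not construct the subdivision of translation space.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def partial_matching_cost(B, A, t):
    """Min over injective maps pi of sum_i ||b_i + t - a_pi(i)||^2 (needs len(B) <= len(A))."""
    diff = (B + t)[:, None, :] - A[None, :, :]   # shape (m, n, 2)
    cost = np.sum(diff ** 2, axis=2)             # squared distances, shape (m, n)
    rows, cols = linear_sum_assignment(cost)     # every row gets a distinct column
    return cost[rows, cols].sum(), cols


def quadratic_coefficients(B, A, pi):
    """Coefficients (c, d) of f_pi(t) = c + <t, d> + m * ||t||^2 for a fixed matching pi."""
    resid = B - A[pi]
    return np.sum(resid ** 2), 2.0 * resid.sum(axis=0)


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))                  # larger point set
true_shift = np.array([0.3, -0.2])
B = A[[4, 1, 3]] - true_shift                # three points of A, displaced

# At the correct translation the optimal injective matching recovers the subset.
value, pi = partial_matching_cost(B, A, true_shift)

# For that fixed matching, the expanded quadratic agrees with the direct cost.
c, d = quadratic_coefficients(B, A, pi)
m = len(B)
expanded = c + true_shift @ d + m * true_shift @ true_shift
```

Because the term `m * ||t||^2` is shared by every matching, comparing matchings at a given translation only requires the affine parts `c + <t, d>`, which is what makes the lower-envelope description possible.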

Several structural bounds make this baseline important. A line intersects the interior of at most $s$ 8 regions of $s$ 9. Every edge of $\Gamma$ 0 has a normal vector of the form $\Gamma$ 1. The number of unbounded regions is at most $\Gamma$ 2, every region has at most $\Gamma$ 3 edges, every vertex has degree at most $\Gamma$ 4, and any convex path intersects at most $\Gamma$ 5 regions. The global combinatorial complexity remains open, but the paper proves

$\Gamma$ 6

as an upper bound and

$\Gamma$ 7

as a lower bound on the number of regions.

Algorithmically, the same paper gives a polynomial-time algorithm for a local minimum,

$\Gamma$ 8

and an exact global algorithm obtained by traversing the full subdivision,

$\Gamma$ 9

with the same order again for computing the global minimum once the subdivision is constructed. This establishes the hard injective model as a rigorous comparison point: a soft model may reduce to it in a zero-temperature or hard-assignment limit, but it does not inherit its exact convex-subdivision structure.

3. From balanced soft matching to partial optimal transport

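The modern OT-based notion of soft matching begins with the soft matching distance between neural representations. For activation matrices

$$X\in\mathbb{R}^{M\times N_x},\qquad Y\in\mathbb{R}^{M\times N_y},$$

with columns $x_i,\,y_j\in\mathbb{R}^{M}$, the transportation polytope is

$$T(N_x,N_y)=\Bigl\{P\in\mathbb{R}_{\ge 0}^{N_x\times N_y}:\ \sum_{j}P_{ij}=\tfrac{1}{N_x},\ \sum_{i}P_{ij}=\tfrac{1}{N_y}\Bigr\},$$

and the soft matching distance is

$$d_T(X,Y)=\sqrt{\min_{P\in T(N_x,N_y)}\sum_{i,j}P_{ij}\,\|x_i-y_j\|^2}.$$

This is balanced OT between the empirical measures

$$\mu_X=\frac{1}{N_x}\sum_{i=1}^{N_x}\delta_{x_i},\qquad \mu_Y=\frac{1}{N_y}\sum_{j=1}^{N_y}\delta_{y_j},$$

so it is a full soft matching: all neuronal mass is matched. The same work identifies it with a $2$-Wasserstein distance between empirical distributions and states that it is symmetric and satisfies the triangle inequality (Khosla et al., 2023).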
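The balanced case can be illustrated with a small linear program. The sketch below is an assumption-laden illustration rather than the authors' implementation: it takes columns of the activation matrices as tuning curves, uses squared Euclidean cost with uniform marginals, and returns the square root of the optimal cost. It solves the LP with SciPy's general-purpose `linprog` (HiGHS) instead of a dedicated network simplex solver, and `soft_matching_distance` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def soft_matching_distance(X, Y):
    """X: (M, Nx), Y: (M, Ny); columns are tuning curves. Returns (distance, plan)."""
    nx, ny = X.shape[1], Y.shape[1]
    # Squared Euclidean cost between every pair of columns, shape (nx, ny).
    cost = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    # Marginal constraints on the row-major flattened plan.
    row_sum = np.kron(np.eye(nx), np.ones((1, ny)))   # sum_j P_ij = 1 / nx
    col_sum = np.kron(np.ones((1, nx)), np.eye(ny))   # sum_i P_ij = 1 / ny
    A_eq = np.vstack([row_sum, col_sum])
    b_eq = np.concatenate([np.full(nx, 1.0 / nx), np.full(ny, 1.0 / ny)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    plan = res.x.reshape(nx, ny)
    return float(np.sqrt(max(res.fun, 0.0))), plan


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y_perm = X[:, [3, 0, 4, 1, 2]]       # same units in a different order
Z = rng.normal(size=(20, 7))         # a representation with a different width

d_same, _ = soft_matching_distance(X, Y_perm)
d_xz, plan = soft_matching_distance(X, Z)
d_zx, _ = soft_matching_distance(Z, X)
```

A permuted copy is at distance zero, and unequal widths are handled because the plan may split each unit's mass across several partners.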

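Partial soft-matching distance in the explicit sense extends this balanced formulation by allowing only a prescribed fraction $s\in(0,1]$ of total mass to be transported. The admissible partial couplings are

$$\Bigl\{T\in\mathbb{R}_{\ge 0}^{N_x\times N_y}:\ T\mathbf{1}\le\tfrac{1}{N_x}\mathbf{1},\ T^\top\mathbf{1}\le\tfrac{1}{N_y}\mathbf{1},\ \mathbf{1}^\top T\,\mathbf{1}=s\Bigr\},$$

and the partial soft-matching distance is

$$\min_{T}\ \sum_{i,j}T_{ij}\,c(x_i,y_j),$$

where the minimum runs over the admissible partial couplings, with transport cost $c(x_i,y_j)$. In practice, the paper states, “In our formulation, we use pairwise cosine distance as the cost function.” After mean-centering and unit-normalizing tuning curves, the cosine cost is $1-x_i^\top y_j$ and the transported mass is fixed at $s$, so the objective equals $s-\sum_{i,j}T_{ij}\,x_i^\top y_j$ and the same optimization can be written as maximizing matched correlation,

$$\max_{T}\ \sum_{i,j}T_{ij}\,x_i^\top y_j.$$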

The row sums and column sums of the optimal plan quantify participation in the match, so near-zero sums identify effectively unmatched units (Kapoor et al., 22 Feb 2026).
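To make the role of the row and column sums concrete, here is a hedged sketch of partial transport with a cosine cost. It assumes uniform upper bounds on the marginals and a fixed transported mass `s`, and solves the resulting LP with SciPy's `linprog`. The function name `partial_soft_matching` and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import linprog


def standardize(Z):
    """Mean-center and unit-normalize each column (tuning curve)."""
    Z = Z - Z.mean(axis=0, keepdims=True)
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)


def partial_soft_matching(X, Y, s):
    """Partial OT between columns of X (M, Nx) and Y (M, Ny) with total mass s."""
    corr = standardize(X).T @ standardize(Y)          # pairwise correlations
    cost = 1.0 - corr                                 # cosine distance
    nx, ny = cost.shape
    row_sum = np.kron(np.eye(nx), np.ones((1, ny)))
    col_sum = np.kron(np.ones((1, nx)), np.eye(ny))
    res = linprog(
        cost.ravel(),
        A_ub=np.vstack([row_sum, col_sum]),           # marginals are upper bounds only
        b_ub=np.concatenate([np.full(nx, 1.0 / nx), np.full(ny, 1.0 / ny)]),
        A_eq=np.ones((1, nx * ny)), b_eq=[s],         # exactly s units of mass move
        bounds=(0, None), method="highs",
    )
    return res.fun, res.x.reshape(nx, ny), corr


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                          # unit 3 of X has no partner in Y
noise = rng.normal(size=30)                           # an outlier unit in Y
Y = np.column_stack([X[:, 2], X[:, 0], noise, X[:, 1]])

s = 0.75
value, plan, corr = partial_soft_matching(X, Y, s)
row_mass = plan.sum(axis=1)                           # participation of each X unit
col_mass = plan.sum(axis=0)                           # participation of each Y unit
matched_correlation = (plan * corr).sum()
```

In this toy example the three shared units absorb all of the transported mass, while the unmatched unit in `X` and the outlier in `Y` end up with near-zero row and column mass, which is the ranking signal the text describes.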

A structurally analogous construction appears in subgraph matching via partial fused Gromov–Wasserstein. With masses $\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 1, cost matrix $\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 2, structure matrices $\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 3, and structural loss tensor

$\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 4

the partial transport set is

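$$\mathcal{T}_s(\boldsymbol p,\boldsymbol q)=\bigl\{\boldsymbol T\ge 0:\ \boldsymbol T\mathbf{1}\le\boldsymbol p,\ \boldsymbol T^\top\mathbf{1}\le\boldsymbol q,\ \mathbf{1}^\top\boldsymbol T\,\mathbf{1}=s\bigr\}$$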

and the partial FGW objective for subgraph matching is

$\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 6

Here the matching is soft because $\boldsymbol T$ is a fractional coupling rather than a discrete one-to-one assignment, and partial because only total mass $s$ is transported (Pan et al., 2024).
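The following sketch evaluates a fused objective of this general kind on a toy query and target graph. Everything named here is an assumption for illustration: `M` is a node-feature cost matrix, `C1` and `C2` are adjacency matrices used as structure matrices, the structural loss is taken to be squared difference, and `alpha` is a hypothetical trade-off weight. The cited paper's exact loss and weighting may differ.

```python
import numpy as np


def in_partial_transport_set(T, p, q, s, tol=1e-9):
    """Check T >= 0, T 1 <= p, T^T 1 <= q and total mass equal to s."""
    return bool(
        (T >= -tol).all()
        and (T.sum(axis=1) <= p + tol).all()
        and (T.sum(axis=0) <= q + tol).all()
        and abs(T.sum() - s) <= tol
    )


def partial_fgw_objective(T, M, C1, C2, alpha):
    """(1 - alpha) * <M, T> + alpha * sum_ijkl (C1_ik - C2_jl)^2 T_ij T_kl."""
    L = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2   # L[i, j, k, l]
    structure = np.einsum("ijkl,ij,kl->", L, T, T)
    return (1.0 - alpha) * np.sum(M * T) + alpha * structure


# Query: a single edge. Target: a path on four nodes (0-1-2-3).
C1 = np.array([[0.0, 1.0], [1.0, 0.0]])
C2 = np.zeros((4, 4))
for i in range(3):
    C2[i, i + 1] = C2[i + 1, i] = 1.0
M = np.zeros((2, 4))                        # no feature information in this toy case
p, q, s = np.full(2, 0.5), np.full(4, 0.25), 0.5

T_good = np.zeros((2, 4)); T_good[0, 1] = T_good[1, 2] = 0.25   # edge -> edge (1, 2)
T_bad = np.zeros((2, 4)); T_bad[0, 0] = T_bad[1, 3] = 0.25      # edge -> non-adjacent pair

good = partial_fgw_objective(T_good, M, C1, C2, alpha=0.5)
bad = partial_fgw_objective(T_bad, M, C1, C2, alpha=0.5)
```

Both plans are feasible and move the same mass, but only the one that sends the query edge onto an actual target edge has zero structural penalty.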

4. Dustbins, masks, and other realizations of partiality

A second major family realizes partiality not through explicit inequality marginals but through augmented state spaces, masks, or confidence mechanisms. In image–text retrieval, cross-modal matching is formulated as OT between local visual embeddings

$\mathcal{T}_s(\boldsymbol p,\boldsymbol q)$ 9

and local textual embeddings

$\boldsymbol P$ 0

with default uniform marginals $\boldsymbol P$ 1, $\boldsymbol P$ 2, cost matrix $\boldsymbol P$ 3 defined by

$\boldsymbol P$ 4

and entropic Sinkhorn objective

$\boldsymbol P$ 5

The partial mechanism is to add one global visual embedding $\boldsymbol P$ 6 and one global textual embedding $\boldsymbol P$ 7 as auxiliary “dustbins,” extend the transport problem from $\boldsymbol P$ 8 to $\boldsymbol P$ 9, and compute the final score only from the local-local block $\boldsymbol M^s$ 0, discarding dustbin assignments. The paper is explicit that the result is a soft partial transport plan, whereas the final retrieval quantity

$\boldsymbol M^s$ 1

is a similarity score induced by that plan, not a raw metric distance (Pan et al., 15 Mar 2026).
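A rough sketch of the dustbin idea, under several stated assumptions: local embeddings are rows of `V` and `W`, the global embeddings are stand-ins obtained by mean pooling, the cost is one minus cosine similarity, the augmented marginals are uniform, and the score is the unnormalized plan-weighted cosine similarity over the local-local block. None of these choices is confirmed by the source beyond the overall mechanism, and `dustbin_similarity` is a hypothetical name.

```python
import numpy as np


def l2_normalize(Z):
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


def sinkhorn(cost, a, b, eps=0.1, n_iter=50):
    """Entropic OT plan with marginals a, b via Sinkhorn scaling."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def dustbin_similarity(V, W, v_glob, w_glob, eps=0.1, n_iter=50):
    """V: (m, d) local visual, W: (n, d) local textual; global vectors act as dustbins."""
    Va = l2_normalize(np.vstack([V, v_glob]))      # (m + 1, d), last row is the dustbin
    Wa = l2_normalize(np.vstack([W, w_glob]))      # (n + 1, d), last row is the dustbin
    sim = Va @ Wa.T                                # cosine similarities
    m1, n1 = sim.shape
    plan = sinkhorn(1.0 - sim, np.full(m1, 1.0 / m1), np.full(n1, 1.0 / n1), eps, n_iter)
    local = plan[:-1, :-1]                         # discard dustbin row and column
    return float((local * sim[:-1, :-1]).sum()), plan


rng = np.random.default_rng(0)
V = rng.normal(size=(5, 8))                        # five image fragments
W = rng.normal(size=(4, 8))                        # four caption tokens
score, plan = dustbin_similarity(V, W, V.mean(axis=0), W.mean(axis=0))
local_mass = plan[:-1, :-1].sum()                  # mass kept by local-local pairs
```

Mass routed to the extra row or column is simply ignored when scoring, so fragments that align better with the global summary than with any local partner contribute little to the final similarity.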

In partial domain adaptation, the central object is a soft-masked semi-dual OT distance between a target domain and a reweighted source domain. The source class mixture is reweighted by class-prior ratios

$\boldsymbol M^s$ 2

so that source-only classes are downweighted or ideally removed, and the pairwise transport cost is adjusted by a soft mask matrix

$\boldsymbol M^s$ 3

The matching is partial because transport is focused on the shared-label part of the source distribution, and soft because class compatibility is encoded through probability vectors rather than hard class assignments (Zhai et al., 3 May 2025).

In partial-to-partial 3D shape registration, the proposed Confidence Guided Distance

$\boldsymbol M^s$ 4

combines a soft feature similarity matrix $\boldsymbol M^s$ 5, nearest-neighbor Euclidean distances after a candidate rigid transform, and a confidence mechanism derived from column-normalized $\boldsymbol M^s$ 6,

$\boldsymbol M^s$ 7

This is not OT; rather, it is a feature-weighted Chamfer-style consensus score plus confidence-guided overlap sampling. Partiality is handled implicitly because non-overlapping points tend to have poor feature similarity and because transform hypotheses are generated from high-confidence points likely to lie in overlap (Ginzburg et al., 2022).

In partial shape matching on manifolds, the optimized object is again not a metric but a masked geodesic distance-preservation loss. The wormhole criterion defines a threshold matrix

$\boldsymbol M^s$ 8

a binary mask

$\boldsymbol M^s$ 9

and a soft relaxation

$A\cap B$ 0

The resulting Wormhole Loss uses a stochastic soft correspondence matrix $A\cap B$ 1 and weights only guaranteed-consistent or softly weighted pairs in

$A\cap B$ 2

Here partiality is pair selection, and softness comes both from $A\cap B$ 3 and from $A\cap B$ 4 (Bracha et al., 2024).

5. Metric status, optimization, and algorithmic structure

A central distinction is that “distance” is used nonuniformly across the literature. Balanced soft matching is a Wasserstein distance between empirical distributions and is stated to be symmetric and to satisfy the triangle inequality. Partial soft-matching distance in the partial OT sense is symmetric but, as the paper explicitly notes, partial OT distances do not satisfy the triangle inequality and therefore are not proper metrics. In subgraph matching, the partial FGW objective is safest to interpret as a partial graph discrepancy or relaxed matching objective rather than as a true metric, because the paper does not prove metric axioms for the partial subgraph version (Khosla et al., 2023, Kapoor et al., 22 Feb 2026).

Other constructions are even further from strict metric status. The cross-modal transport model in image–text retrieval originates from a transport cost, but the operational quantity for ranking and training is the transport-plan-weighted average cosine similarity. Confidence Guided Distance is a consensus score for rigid-registration hypotheses. Wormhole Loss is a masked geodesic-preservation energy. Soft-masked semi-dual OT in partial domain adaptation is a reweighted distance embedded inside an alternating end-to-end objective. A recurrent misconception is therefore to treat all partial soft-matching constructions as metrics; the literature supports a broader taxonomy that includes distances, discrepancies, similarities, and losses.

Optimization methods reflect this diversity. Hard injective partial matching under translation uses combinatorial geometry and minimum-cost injective bipartite matching. Balanced neural soft matching is a discrete OT linear program, solved in practice via the network simplex algorithm with stated complexity $A\cap B$ 5 when both representations have $A\cap B$ 6 units. The explicit partial neural extension uses partial OT, dummy or virtual points assigned a large transportation cost, and extracts row and column masses from a single solution to rank units, reducing subset-selection cost from brute-force $A\cap B$ 7 to a single $A\cap B$ 8 solve at a chosen transported mass $A\cap B$ 9 (Kapoor et al., 22 Feb 2026).

Graph-based partial FGW is nonconvex because of its quadratic structural term and is optimized by Frank–Wolfe on an augmented problem with a dummy node; the expensive tensor–matrix product is reduced from naive $A\Delta B$ 0 to $A\Delta B$ 1 in favorable cases. Image–text partial transport uses entropy-regularized OT and Sinkhorn/Bregman iterations, explicitly with a very small number of iterations $A\Delta B$ 2 in the reported implementation. Partial domain adaptation replaces full Sinkhorn with a semi-dual formulation optimized by gradient-based algorithms and a neural network approximation of the Kantorovich potential. These differences show that there is no single algorithmic signature of partial soft matching.
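The Frank–Wolfe scheme with a dummy node can be sketched as follows. This is an illustrative reconstruction under assumptions, not the cited implementation: the structural loss is squared difference, the structure matrices are symmetric, the trade-off weight `alpha` and all function names are hypothetical, and the linear subproblem is solved as a balanced LP with one extra row and column via SciPy's `linprog`. The helper `structure_product` uses the standard factorization for squared loss, which avoids forming the four-way tensor.

```python
import numpy as np
from scipy.optimize import linprog


def structure_product(C1, C2, T):
    """Entry (i, j) is sum_kl (C1_ik - C2_jl)^2 T_kl, computed without the 4-way tensor."""
    r, c = T.sum(axis=1), T.sum(axis=0)
    return ((C1 ** 2) @ r)[:, None] + ((C2 ** 2) @ c)[None, :] - 2.0 * C1 @ T @ C2.T


def objective(T, M, C1, C2, alpha):
    return (1 - alpha) * np.sum(M * T) + alpha * np.sum(structure_product(C1, C2, T) * T)


def lmo_with_dummy(G, p, q, s):
    """Minimize <G, T> over the partial set via balanced OT with one dummy row/column."""
    n, m = G.shape
    G_aug = np.zeros((n + 1, m + 1))
    G_aug[:n, :m] = G
    G_aug[n, m] = 1.0 + 2.0 * np.abs(G).max()        # discourage dummy-to-dummy mass
    p_aug = np.append(p, q.sum() - s)                # dummy row absorbs unmatched target mass
    q_aug = np.append(q, p.sum() - s)                # dummy column absorbs unmatched query mass
    A_eq = np.vstack([np.kron(np.eye(n + 1), np.ones((1, m + 1))),
                      np.kron(np.ones((1, n + 1)), np.eye(m + 1))])
    res = linprog(G_aug.ravel(), A_eq=A_eq, b_eq=np.concatenate([p_aug, q_aug]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n + 1, m + 1)[:n, :m]


def partial_fgw_frank_wolfe(M, C1, C2, p, q, s, alpha=0.5, n_iter=50):
    T = s * np.outer(p, q) / (p.sum() * q.sum())     # feasible start with mass s
    history = [objective(T, M, C1, C2, alpha)]
    for _ in range(n_iter):
        grad = (1 - alpha) * M + 2 * alpha * structure_product(C1, C2, T)
        D = lmo_with_dummy(grad, p, q, s) - T        # Frank-Wolfe direction
        b = np.sum(grad * D)                         # directional derivative
        if b >= -1e-12:
            break                                    # no descent direction left
        a = alpha * np.sum(structure_product(C1, C2, D) * D)   # curvature along D
        gamma = 1.0 if a <= 0 else min(1.0, -b / (2 * a))      # exact line search on [0, 1]
        T = T + gamma * D
        history.append(objective(T, M, C1, C2, alpha))
    return T, history


rng = np.random.default_rng(0)
C2 = np.zeros((6, 6))
for i in range(6):
    C2[i, (i + 1) % 6] = C2[(i + 1) % 6, i] = 1.0    # target: 6-cycle
C1 = C2[:3, :3].copy()                               # query: 3-node path
feat_q, feat_t = rng.normal(size=(3, 2)), rng.normal(size=(6, 2))
M = ((feat_q[:, None, :] - feat_t[None, :, :]) ** 2).sum(axis=2)
p, q = np.full(3, 0.2), np.full(6, 1 / 6)
T, history = partial_fgw_frank_wolfe(M, C1, C2, p, q, s=0.5)
```

Because the objective is nonconvex, this procedure only guarantees a monotonically non-increasing objective and a feasible plan; the point it reaches depends on the initialization.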

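A further source of terminological ambiguity comes from neighboring areas. In 2-parameter persistence, the standard matching distance between persistence modules $M$ and $N$ is

$$d_{\mathrm{match}}(M,N)=\sup_{\ell}\ w(\ell)\,d_B\bigl(M|_{\ell},\,N|_{\ell}\bigr),$$

a supremum of weighted bottleneck distances over slices, where $\ell$ ranges over lines of positive slope, $w(\ell)$ is a slope-dependent weight, and $d_B$ is the bottleneck distance between the persistence diagrams of the modules restricted to $\ell$. The cited paper gives efficient approximation algorithms for the standard matching distance, but it does not define a partial soft variant; the only “partial matching” present there is the standard bottleneck partial matching between persistence diagrams (Kerber et al., 2019). This contrast is useful because it isolates a different sense of “matching distance” from the partial soft-matching literature.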

6. Empirical roles and interpretive significance

The principal empirical role of partial soft matching is robust comparison under outliers, partial overlap, or heterogeneous relevance. In neural representational comparison, partial soft-matching preserves correct matches under added outliers, correctly selects the better model in noise-corrupted identification tasks, automatically excludes low-noise-ceiling voxels in fMRI, improves precision of cross-subject voxel alignment across visual areas, and yields unit rankings by alignment quality from a single transport solve. In deep networks, highly matched units exhibit similar maximally exciting images, while unmatched units show divergent patterns. Random orthogonal rotation reduces alignment even within the best-matched subpopulation, which the paper interprets as evidence for privileged axes (Kapoor et al., 22 Feb 2026).

In image–text retrieval, the practical motivation is redundant alignment: not every image fragment corresponds to some caption token, and vice versa. The reported ablations isolate the benefit of the partial mechanism itself: adding dustbins through local-to-global similarities improves over naïve OT, whereas simply appending global features does not. The paper further reports that analogous partial matching helps CAM, supporting the broader claim that redundant local alignments are a genuine issue in image–text matching (Pan et al., 15 Mar 2026).

In graph matching, partial transport is valuable because subgraph search should not force a full graph-to-graph match. The reported experiments show robustness to noisy query features and favorable query times, especially when the sliding-subgraph strategy SSOT restricts optimization to local candidate subgraphs. In 3D registration, the motivating difficulty is that some points in one cloud have no corresponding point in the other; the CGD formulation addresses this with learned feature compatibility, confidence-guided sampling, and a bidirectional nearest-neighbor score that outperforms Chamfer as a consensus metric under severe partiality, outliers, internal symmetries, and large rotations (Pan et al., 2024, Ginzburg et al., 2022).

Partial shape matching on manifolds uses the same broad idea at the level of trustworthy pairwise constraints. The wormhole criterion certifies many more consistent pairs than a prior boundary criterion, and the non-binary mask improves over binary masking. The reported effect is largest on datasets with more holes and stronger topological changes, where forcing all pairwise geodesic constraints to contribute is most harmful. In partial domain adaptation, soft-masked OT suppresses source outlier classes through class reweighting and class-probability-based masking; ablations identify the importance weights and mask mechanisms as the main contributors to reducing negative transfer (Bracha et al., 2024, Zhai et al., 3 May 2025).

Taken together, these works support a stable comparative picture. Hard injective partial matching, balanced soft matching, and partial soft matching are not interchangeable names for one method but distinct regimes of correspondence modeling. Hard injective models enforce discrete global consistency; balanced soft models allow fractional correspondences but still transport all mass; partial soft models additionally allow some mass, fragments, nodes, units, or pairwise constraints to remain unused. This suggests that the enduring significance of partial soft-matching distance lies less in any single formula than in a recurring design principle: robust comparison is often obtained by keeping correspondences soft while refusing to force universal participation.