Instance Affinity (IA) Score
- Instance Affinity (IA) score is a computed measure that quantifies the probability that two entities belong to the same instance within a scene.
- It is applied across 2D, 3D, and LiDAR segmentation tasks to enable proposal-free grouping and evaluate segmentation accuracy using learned similarity or probability outputs.
- The methodology involves various loss functions and clustering techniques that facilitate instance-level modeling in domain adaptation and image segmentation benchmarks.
An Instance Affinity (IA) score is a learned or computed quantity that quantifies the likelihood that two entities (pixels, voxels, "pillar" columns, embeddings, or region proposals) belong to the same instance within a scene. IA scores are central to a wide array of instance segmentation, proposal-free grouping, and domain adaptation architectures across 2D, 3D, and LiDAR settings. The precise definition, prediction, and usage of IA scores vary by context, but all share the common principle of facilitating fine-grained, sample- or region-level association for reliable instance-level modeling.
1. Core Definitions and Mathematical Formulations
Several mathematical formalizations for the Instance Affinity score exist, depending on the setting:
- Similarity-based (domain adaptation): For unsupervised domain adaptation, the IA score between feature embeddings is often defined as an unnormalized similarity:
  $$\mathrm{IA}(x_i, x_j) = f(x_i)^{\top} f(x_j),$$
  where $f(x_i)$ and $f(x_j)$ are deep feature vectors and $\mathrm{IA}(x_i, x_j)$ encodes how close two samples (source or target) are in feature space (Sharma et al., 2021).
- Pixel/voxel/pillar affinity (segmentation): In grid-based segmentation, for two spatial locations $i, j$ (pixels, voxels, or LiDAR pillars), the IA score is typically a learned probability:
  $$a_{ij} = \sigma\big(g_\theta(i, j)\big) \in (0, 1),$$
  where $\sigma$ is the sigmoid or softmax function applied to a neural network output $g_\theta(i, j)$ (Liu et al., 2019, Gao et al., 2019, Chen et al., 2022, Liu et al., 2018).
- Instance-level recall (metric evaluation): In medical image segmentation, the IA score is used for evaluation, defined as the fraction of ground-truth instances matched (up to IoU threshold $\tau$) by at least one prediction:
  $$\mathrm{IA}_\tau = \frac{1}{|G|} \sum_{g \in G} \mathbb{1}\Big[\max_{p \in P} \mathrm{IoU}(g, p) \ge \tau\Big],$$
  where $G$ are the ground-truth instances, $P$ are the predictions, and $\mathbb{1}[\cdot]$ is the indicator function (Wang et al., 28 Nov 2025).
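For concreteness, the metric-style IA score can be computed directly from per-instance boolean masks. The following is a minimal sketch, not the reference implementation of (Wang et al., 28 Nov 2025); the function name, the mask representation, and the convention for an empty ground-truth set are assumptions.

```python
import numpy as np

def instance_affinity_score(gt_masks, pred_masks, tau=0.5):
    """Instance-level recall: fraction of ground-truth instances matched by
    at least one prediction with IoU >= tau.

    gt_masks, pred_masks: lists of boolean arrays of identical shape,
    one array per instance.
    """
    if len(gt_masks) == 0:
        return 1.0  # convention for an empty ground-truth set (assumption)
    matched = 0
    for g in gt_masks:
        best_iou = 0.0
        for p in pred_masks:
            inter = np.logical_and(g, p).sum()
            union = np.logical_or(g, p).sum()
            iou = inter / union if union > 0 else 0.0
            best_iou = max(best_iou, iou)
        if best_iou >= tau:
            matched += 1
    return matched / len(gt_masks)
```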
This table summarizes main IA paradigms:
| Context | Definition (Summary) | Usage |
|---|---|---|
| Domain Adaptation (Sharma et al., 2021) | Feature-space similarity (inner product of deep features) | Drives contrastive domain alignment |
| Proposal-free Segmentation (Liu et al., 2019, Gao et al., 2019, Liu et al., 2018) | Pixel/voxel/pillar pairwise probability | Graph structuring, clustering |
| Instance Segmentation Metric (Wang et al., 28 Nov 2025) | Fraction of GT instances recalled (IoU-based) | Evaluation of detection/segmentation |
2. Architectures and Methodologies Leveraging IA Scores
Domain Adaptation via Instance Affinity
In unsupervised domain adaptation, ILA-DA (Sharma et al., 2021) deploys an IA score as the central linkage between source and target domains. For each instance, the affinity is measured in deep feature space, enabling:
- Pseudo-labeling of target samples via k-nearest neighbors in feature space, ranked by $\mathrm{IA}(x_i, x_j)$ (a minimal sketch follows this list).
- Construction of an affinity matrix encoding “same” (+1) or “different” (–1) label relation for source–target pairs.
- Filtering low-confidence associations by a ratio test on affinity-weighted neighborhoods.
- Optimization through a multi-sample contrastive loss that encourages intra-class feature cohesion and inter-class separation.
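A minimal sketch of this affinity-driven pseudo-labeling and affinity-matrix construction is given below, assuming L2-normalized deep features, inner-product IA scores, and a vote-based ratio test; the function name, the neighborhood size `k`, and the filtering rule are illustrative simplifications rather than the exact ILA-DA procedure (Sharma et al., 2021).

```python
import numpy as np

def build_affinity_matrix(src_feats, src_labels, tgt_feats, k=5, ratio=1.2):
    """Assign pseudo-labels to target samples by k-NN over IA scores and
    build a +1 / -1 "same" / "different" matrix for source-target pairs.

    src_feats: (Ns, D) and tgt_feats: (Nt, D) deep features (assumed L2-normalized).
    Returns (pseudo_labels, keep_mask, M) where M is (Ns, Nt) with entries in {+1, -1}.
    """
    ia = src_feats @ tgt_feats.T                      # (Ns, Nt) inner-product IA scores
    pseudo_labels = np.empty(tgt_feats.shape[0], dtype=src_labels.dtype)
    keep = np.zeros(tgt_feats.shape[0], dtype=bool)
    for j in range(tgt_feats.shape[0]):
        nn = np.argsort(-ia[:, j])[:k]                # k most affine source samples
        votes, counts = np.unique(src_labels[nn], return_counts=True)
        order = np.argsort(-counts)
        pseudo_labels[j] = votes[order[0]]
        # ratio test on the neighborhood votes: keep only confident associations
        top = counts[order[0]]
        second = counts[order[1]] if len(counts) > 1 else 0
        keep[j] = (second == 0) or (top / max(second, 1) >= ratio)
    M = np.where(src_labels[:, None] == pseudo_labels[None, :], 1, -1)
    return pseudo_labels, keep, M
```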
Proposal-free Instance Segmentation and Affinity Clustering
Several proposal-free segmentation methods rely on per-pixel, per-voxel, or per-pillar IA predictions (a schematic affinity head is sketched after this list):
- SSAP (Gao et al., 2019): Utilizes an affinity pyramid at multiple U-Net decoder scales; each pixel predicts a probability in $(0, 1)$ that it belongs to the same instance as its neighbors. These affinities are symmetrized and used in a three-stage, pyramid-graph minimum-cost multicut to yield instance clusters.
- 3D Sparse Convolution (MASC) (Liu et al., 2019): Predicts voxel-level affinities for six nearest neighbors across multiple resolutions; these are averaged across nodes and scales to define graph-based node affinities for iterative clustering, with instance merging governed by affinity thresholding.
- Pixel Affinity Graph Merge (Liu et al., 2018): Each pixel estimates affinity to 56 spatially distributed neighbors; instance masks are produced by greedy threshold-based graph merging, modulated by semantic similarity.
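The per-location affinity predictions above can be pictured as a small convolutional head that outputs, for every pixel, a sigmoid probability of sharing an instance with each of a fixed set of neighbor offsets. The PyTorch sketch below is schematic; the layer sizes, number of neighbors, and class name are assumptions, not the SSAP or pixel-affinity-graph architectures.

```python
import torch
import torch.nn as nn

class AffinityHead(nn.Module):
    """Predicts, for each pixel, affinities to K fixed neighbor offsets."""
    def __init__(self, in_channels=256, num_neighbors=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_neighbors, kernel_size=1),
        )

    def forward(self, feats):
        # feats: (B, C, H, W) backbone features
        # returns (B, K, H, W) affinities in (0, 1)
        return torch.sigmoid(self.head(feats))

# Usage: affinities[b, k, y, x] is the probability that pixel (y, x)
# and its k-th neighbor belong to the same instance.
head = AffinityHead(in_channels=256, num_neighbors=8)
affinities = head(torch.randn(1, 256, 64, 64))
```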
LiDAR Panoptic Segmentation
Proposal-free LiDAR panoptic segmentation (Chen et al., 2022) derives pillar-level IA as the softmax probability that a pillar continues a prior instance versus starts a new one. Clustering occurs through a local, memory-constrained propagation along scanlines, stitching "thing"-class pillars into instance masks.
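A heavily simplified version of such scanline propagation is sketched below: pillars are visited in traversal order, and a thresholded IA probability decides whether the current pillar continues the previous instance or starts a new one. The threshold, the one-dimensional traversal, and the single-element memory are simplifications of the memory-constrained scheme in (Chen et al., 2022).

```python
def propagate_instance_ids(ia_probs, threshold=0.5):
    """ia_probs[i]: probability that pillar i belongs to the same instance
    as pillar i-1 along the scanline (the value at i=0 is ignored).
    Returns an instance ID per pillar."""
    ids = []
    current_id = 0
    for i, p in enumerate(ia_probs):
        if i == 0 or p < threshold:
            current_id += 1        # start a new instance
        ids.append(current_id)     # otherwise continue the previous instance
    return ids

# Example: high affinity links consecutive pillars into one instance.
print(propagate_instance_ids([0.9, 0.8, 0.2, 0.95, 0.1]))  # [1, 1, 2, 2, 3]
```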
3. Supervision and Loss Functions for IA Score Learning
Supervision of IA predictions is framed as binary (or multi-class) classification, with losses applied at either the pixel pair, voxel pair, or region pair level:
- Binary Cross-Entropy: Each affinity prediction $a_{ij}$ is trained with respect to a binary label $y_{ij}$ (same/different instance), via
  $$\mathcal{L}_{\mathrm{BCE}} = -\big[y_{ij}\log a_{ij} + (1 - y_{ij})\log(1 - a_{ij})\big].$$
- Squared Loss (SSAP): An L2 loss is applied per pixel and per pyramid level:
  $$\mathcal{L}_{\mathrm{L2}} = \sum_{i,j} \big(a_{ij} - y_{ij}\big)^2,$$
  with imbalance handling via sample dropping and up-weighting of mixed-label windows (a minimal sketch of such a loss follows this list).
- Multi-Sample Contrastive Loss (ILA-DA): ILA-DA’s MSC loss encourages high affinity between features of the same class and low affinity otherwise, using positive/negative sampling as guided by the affinity matrix (Sharma et al., 2021).
- Lovász Loss: Used in LiDAR segmentation (Chen et al., 2022), the Lovász-softmax surrogate directly optimizes Jaccard-like metrics for IA scores.
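A minimal PyTorch sketch of a pairwise affinity loss with the kind of imbalance handling described above is shown here; the binary cross-entropy form, the up-weighting rule, and the weight value are illustrative stand-ins rather than the exact SSAP or MASC losses.

```python
import torch
import torch.nn.functional as F

def affinity_bce_loss(pred_affinity, target_affinity, mixed_weight=2.0):
    """Binary cross-entropy over pairwise affinities.

    pred_affinity:   (B, K, H, W) predicted probabilities in (0, 1)
    target_affinity: (B, K, H, W) float binary labels (1 = same instance)
    Locations whose K-neighbor window mixes same- and different-instance
    labels are up-weighted, a stand-in for imbalance handling.
    """
    weights = torch.ones_like(target_affinity)
    # a location is "mixed" if its neighbor labels are not all identical
    mixed = (target_affinity.amin(dim=1, keepdim=True)
             != target_affinity.amax(dim=1, keepdim=True))
    weights = torch.where(mixed.expand_as(weights),
                          torch.full_like(weights, mixed_weight), weights)
    return F.binary_cross_entropy(pred_affinity, target_affinity, weight=weights)
```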
4. Clustering, Partitioning, and Use of IA in Inference
The usage of IA scores during inference typically follows a clustering or partitioning phase:
- Graph Partitioning: Affinities become edge weights in a graph; graph partitioning methods (minimum-cost multicut (Gao et al., 2019), greedy merging (Liu et al., 2018, Liu et al., 2019)) then segment the space into discrete instances, with hierarchy applied for efficiency and granularity (Gao et al., 2019); a union-find sketch of greedy merging follows this list.
- Sequential Propagation: In pillar-based LiDAR segmentation (Chen et al., 2022), sequential traversal and a local memory structure propagate instance IDs, using thresholded IA probabilities to decide instance continuation or new instance initiation.
- Semantic Modulation: Affinities can be refined with semantic similarity measures (e.g., via Jensen–Shannon divergence (Gao et al., 2019), or inner-product over semantic class probabilities (Liu et al., 2018)) to suppress unlikely associations and improve cluster purity.
- Instance-level Metrics: For evaluation, the IA score is computed as the instance-level recall (fraction of ground-truth instances matched by a prediction with IoU ≥ $\tau$), directly reflecting model effectiveness at finding all relevant objects (Wang et al., 28 Nov 2025).
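The union-find sketch below illustrates greedy, threshold-based affinity merging with optional semantic gating; it is a simplification of the multicut and greedy schemes cited above, and the argmax-agreement gate is an assumption rather than the published semantic modulation.

```python
import numpy as np

def greedy_affinity_merge(edges, affinities, num_nodes,
                          threshold=0.7, semantic=None):
    """Merge nodes (pixels / voxels / pillars) whose pairwise IA score exceeds
    a threshold, processing edges in decreasing order of affinity.

    edges:      (E, 2) integer array of node index pairs
    affinities: (E,)   IA scores in (0, 1)
    semantic:   optional (N, C) class-probability array; if given, an edge is
                merged only when both endpoints agree on the argmax class.
    Returns an instance label per node.
    """
    parent = np.arange(num_nodes)

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = np.argsort(-affinities)
    for e in order:
        if affinities[e] < threshold:
            break                      # remaining edges are weaker still
        i, j = edges[e]
        if semantic is not None and semantic[i].argmax() != semantic[j].argmax():
            continue                   # semantic gating: suppress unlikely merges
        parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(num_nodes)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```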
5. Interplay with Other Instance Segmentation Metrics
Instance Affinity is both a learned association signal and a metric. Comparison with other evaluation measures highlights its specificity:
- IA score (segmentation metric) (Wang et al., 28 Nov 2025): Directly measures instance-level recall under IoU thresholding; over-prediction (splitting) and merged predictions affect it only insofar as they lower the best per-instance match, and it is insensitive to shape refinement once the IoU threshold is passed.
- Dice / IoU (standard): Sensitive to precise mask shape and boundary accuracy, but can obscure error modes such as merged/fragmented instances, which IA score detects.
- Average Precision (AP): Incorporates both detection and precision, but not always directly interpretable as the capacity to find all relevant instances in clinical or structural contexts.
These distinctions drive the adoption of IA as a primary metric in medical image instance segmentation challenges, especially where the correct separation of adjacent or overlapping structures is essential.
6. Empirical Effects and Results
Integration of IA scores into training and inference yields quantifiable improvements:
- Domain Adaptation: Adding the MSC loss driven by IA to DANN increases digit adaptation accuracy from 89.3% to 93.8%; for Office-31, from 93.1% to 94.9%; Birds-31, from 82.2% to 86.0% (Sharma et al., 2021).
- Segmentation Benchmarks: Instance-level architectures based on IA prediction outperform proposal-based baselines. SSAP achieves a 9% relative AP gain using its affinity pyramid and cascaded partitioning (Gao et al., 2019). MASC achieves AP@0.5 of 0.447 versus 0.382 for 3D-SIS (Liu et al., 2019).
- Medical Imaging: Top semi-supervised methods at STS 2024 increased IA score on OPG from ~44% (nnU-Net) to ~88.5% by incorporating detection-first pipelines, SAM-based refinements, and robust pseudo-labeling (Wang et al., 28 Nov 2025).
- Ablations: Removal or restriction of IA mechanisms leads to dramatic declines in instance grouping performance and slower inference (Gao et al., 2019, Liu et al., 2018).
7. Limitations, Considerations, and Evaluation
Key strengths of IA-based approaches include robust handling of instance separation (e.g., in presence of mergers and splits) and architectural simplicity (direct, proposal-free affinity learning). However, binary or hard-thresholded IA metrics are insensitive to incremental mask improvements once past the IoU cutoff and may not reflect precise shape quality (Wang et al., 28 Nov 2025). Moreover, final instance quality often depends on the integration of affinity, semantic, and global constraints; bottlenecks can shift from affinity prediction to semantic understanding as backbone architectures mature (Chen et al., 2022).
A plausible implication is that future methods may integrate soft/hierarchical affinity modeling with continuous instance quality measures and class-aware association signals, to further enhance both separability and discriminative segmentation quality.