AffinityNet: Weak Supervision & Few-shot Learning
- The paper demonstrates that learning semantic pixel affinities using a modified ResNet-38 and random-walk propagation significantly boosts segmentation performance on benchmarks like PASCAL VOC and DeepGlobe.
- It introduces a methodology that leverages affinity structures and kNN attention pooling to enhance label propagation and sample efficiency in high-dimensional, few-shot genomic applications.
- Practical implementation details, including hyperparameters such as gamma=5, beta=8, and 256 iterations, are provided to ensure reproducibility and guide adaptation across diverse domains.
AffinityNet refers to a family of neural network architectures designed to model affinities, or pairwise similarities, between instances in various domains. The dominant instantiations of AffinityNet fall into two principal categories: (1) models that predict semantic pixel affinities for weakly supervised semantic segmentation in images, and (2) architectures for semi-supervised few-shot learning, particularly in genomic and structured data settings. Both lines of work leverage learned affinity structures to enable label propagation, regularization, and improved sample efficiency, but differ in network design, application, and theoretical motivation.
1. AffinityNet for Weakly Supervised Semantic Segmentation
The seminal AffinityNet, introduced by Ahn and Kwak for weakly supervised semantic segmentation, addresses the scarcity of pixel-level annotations by learning to predict class-agnostic semantic affinities between adjacent image locations using only image-level class labels as supervision (Ahn et al., 2018). The key innovation lies in leveraging these affinities to propagate sparse discriminative object responses across object regions via random walk, thereby generating pseudo-labels suitable for full segmentation network training.
Model Architecture
- Backbone Feature Extractor: A modified ResNet-38 (pretrained on ImageNet) serves as the feature pipeline. The final three levels of residual blocks, originally with stride 2, are converted to atrous convolutions with dilation rates 1, 2, and 4, resulting in a feature map with overall stride 8.
- Multi-level Feature Aggregation: Feature maps from the last three backbone stages are reduced in dimension (to 128, 256, and 512 channels, respectively) with 1x1 convolutions, concatenated, and passed through an additional 1x1 convolution to yield a fused feature map tailored to the affinity task.
- Affinity Computation: For any two spatial locations i and j within Euclidean radius gamma, the semantic affinity is computed as W_ij = exp(-||f_i^aff - f_j^aff||_1), where f_i^aff denotes the fused affinity feature at location i. This forms a sparse, symmetric affinity matrix W with W_ij in (0, 1] and W_ii = 1.
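As a concrete illustration, the affinity computation above can be sketched in NumPy. The dense double loop and shapes here are illustrative only; a practical implementation stores just the sparse within-radius pairs.

```python
import numpy as np

def affinity_matrix(feats, radius=5):
    """Sketch of the pairwise semantic affinity computation.

    feats: (H, W, C) fused feature map from the backbone.
    Returns an (H*W, H*W) matrix whose entries for location pairs within
    `radius` hold exp(-||f_i - f_j||_1); all other entries stay zero.
    """
    H, W, C = feats.shape
    n = H * W
    flat = feats.reshape(n, C)
    # Grid coordinates of each flattened location, for the radius test.
    coords = np.stack(
        np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1
    ).reshape(n, 2)
    A = np.zeros((n, n))
    for i in range(n):
        # Only pairs within the Euclidean radius receive an affinity.
        d = np.linalg.norm(coords - coords[i], axis=1)
        nbrs = np.where(d <= radius)[0]
        l1 = np.abs(flat[nbrs] - flat[i]).sum(axis=1)
        A[i, nbrs] = np.exp(-l1)
    return A
```

Because the L1 distance is symmetric and zero for i = j, the resulting matrix is symmetric with unit diagonal, matching the properties stated above.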
2. Random Walk Propagation and Training Pipeline
To address incomplete object localization in seed maps generated from class activation maps (CAMs), AffinityNet employs random-walk propagation with the learned affinities:
- Transition Matrix Construction: The random-walk transition matrix is T = D^{-1} W^{beta}, where W^{beta} (Hadamard power) amplifies strong affinities and D_ii = sum_j W_ij^beta.
- Iterative Propagation: Given a vectorized CAM vec(M), the propagation proceeds via vec(M) <- T . vec(M), iterated t times. The resulting vec(M*) constitutes the refined CAM.
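A minimal sketch of the transition-matrix construction and iterative propagation, assuming a dense affinity matrix W:

```python
import numpy as np

def random_walk_refine(cam, W, beta=8, t=256):
    """Sketch of random-walk CAM refinement.

    cam: (n,) vectorized CAM scores; W: (n, n) affinity matrix.
    Builds T = D^{-1} W^beta with D_ii = sum_j W_ij^beta, then applies
    T to the CAM for t iterations.
    """
    Wb = W ** beta                           # Hadamard power: sharpens strong affinities
    T = Wb / Wb.sum(axis=1, keepdims=True)   # row-normalize, i.e. multiply by D^{-1}
    m = cam.copy()
    for _ in range(t):
        m = T @ m
    return m
```

Since T is row-stochastic, each iteration replaces every location's score with a convex combination of its neighbors' scores, which is what diffuses sparse CAM seeds across object regions.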
Supervision for Affinity Prediction is derived without pixel-level labels:
- CAMs for ground-truth classes are produced, normalized, and refined with dense CRF.
- "Confident foreground," "confident background," and "neutral" regions are determined by varying thresholds ().
- Pixel pairs within confident regions are labeled as positive (same label), negative (different labels), or ignored (involving neutral regions).
- The cross-entropy loss over positive and negative pairs enforces boundary sensitivity.
Training and Deployment: After AffinityNet is trained, the propagated pseudo-labels serve as synthetic ground-truth for standard segmentation network training.
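The region partitioning and pair labeling described above can be sketched as follows. The two fixed thresholds here are illustrative stand-ins for the paper's background-scoring scheme, not its exact values:

```python
import numpy as np

def affinity_pair_labels(cam_fg, bg_lo=0.05, bg_hi=0.3):
    """Hypothetical sketch of deriving affinity supervision from a CAM.

    cam_fg: (H, W) normalized foreground activation for one class.
    Pixels above `bg_hi` become confident foreground, below `bg_lo`
    confident background; everything between is neutral.
    Returns a label map: 1 = fg, 0 = bg, -1 = neutral (ignored in the loss).
    """
    labels = np.full(cam_fg.shape, -1, dtype=int)
    labels[cam_fg >= bg_hi] = 1
    labels[cam_fg <= bg_lo] = 0
    return labels

def pair_label(labels, i, j):
    """Positive pair (1) if both pixels are confident and agree, negative (0)
    if both are confident and disagree, None if either pixel is neutral."""
    a, b = labels[i], labels[j]
    if a == -1 or b == -1:
        return None
    return 1 if a == b else 0
```

Only the positive and negative pairs enter the cross-entropy loss; neutral pixels are simply excluded, which is how the method avoids training on uncertain regions.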
3. AffinityNet for Semi-supervised Few-shot Learning
A parallel instantiation, introduced by Ma and Zhang (Ma et al., 2018), targets "big p, small N" regimes in high-dimensional data such as cancer genomics. Here, AffinityNet is characterized by k-Nearest-Neighbor (kNN) attention pooling layers, which can act as plug-in modules in arbitrary neural architectures.
Model Structure
- Input: X in R^{N x p} (N samples, p features).
- Feature Attention Layer: Weighted re-scaling selects informative features, implemented as x -> w . x with learned, non-negative weights w in R^p normalized to sum to one.
- Stacked kNN Attention Pooling: At each layer l, for every sample i, h_i^{(l+1)} = sum_{j in N_k(i)} a_ij . f(h_j^{(l)}), where N_k(i) are the k-nearest neighbors of i, a_ij is the normalized attention weight (softmax over similarity scores), and f comprises an affine transform and nonlinearity.
- Affinity Computation and Label Propagation: The kNN attention acts as an implicit regularizer by enforcing neighborhood consistency in the learned representation.
Training: Supervised cross-entropy is applied to labeled data, while unlabeled samples influence representations via neighbor pooling, enabling semi-supervised and few-shot learning. No explicit regularizer is used.
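A minimal sketch of one kNN attention pooling step with a cosine-similarity kernel; the layer's affine transform and nonlinearity are omitted for brevity:

```python
import numpy as np

def knn_attention_pool(H, k=3):
    """Sketch of kNN attention pooling over sample representations.

    H: (N, d) representations. Each sample's new representation is an
    attention-weighted average of its k most similar samples (itself
    included), with weights softmaxed over cosine similarities.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-8, None)
    sim = Hn @ Hn.T                          # pairwise cosine similarities
    out = np.zeros_like(H)
    for i in range(len(H)):
        nbrs = np.argsort(-sim[i])[:k]       # k most similar samples (includes i)
        w = np.exp(sim[i, nbrs])
        w /= w.sum()                         # softmax over neighbor scores
        out[i] = w @ H[nbrs]
    return out
```

Because unlabeled samples also participate as neighbors, their representations shape those of labeled samples (and vice versa), which is the mechanism behind the semi-supervised behavior described above.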
4. Experimental Results and Quantitative Performance
Semantic Segmentation
- On PASCAL VOC 2012:
- CAM alone: 48.0% mIoU (train set)
- CAM + AffinityNet random-walk: 58.1%
- + dCRF: 59.7%
- Final segmentation, ResNet-38 backbone: 61.7% (val) / 63.7% (test), surpassing all prior image-level supervised methods (previous best ≈ 55%) (Ahn et al., 2018).
- On DeepGlobe Land Cover (satellite images) (Nivaggioli et al., 2019):
- Weakly supervised AffinityNet + random walk: mIoU 45.90% (no background), within 7–8 points of fully supervised top entries (53.58%).
- Off-the-shelf segmentation nets, trained on AffinityNet labels, exhibited negligible degradation relative to full supervision.
Few-shot and Genomics
- Synthetic 4-cluster data (p=42, 1% labels): AffinityNet accuracy 98.2% vs. 46.9% for plain neural net.
- TCGA kidney cancer, 1% labeled: AffinityNet AMI 0.84 vs. 0.70 (NN/SVM).
- Survival analysis (Cox model): c-index ~0.69–0.73 for AffinityNet features (Ma et al., 2018).
5. Implementation Details and Hyperparameters
Semantic Segmentation (Image Domain)
- Backbone: ResNet-38/ResNet-74 with atrous convolutions and feature aggregation.
- Affinity radius: gamma = 5 (patch-wise for satellite imagery, up to 10).
- Random-walk: beta = 8, t = 256 iterations (T^t can be computed with repeated squaring for efficiency).
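The repeated-squaring trick reduces the t = 256 propagation steps to log2(t) = 8 matrix squarings; a sketch, assuming t is a power of two:

```python
import numpy as np

def matrix_power_by_squaring(T, t):
    """Compute T^t with O(log t) matrix multiplications instead of t.
    For t = 256 this is 8 squarings. Sketch handles powers of two only."""
    assert t > 0 and (t & (t - 1)) == 0, "t must be a power of two"
    P = T
    while t > 1:
        P = P @ P   # square: P holds T^2, T^4, ... until T^t
        t //= 2
    return P
```

Note that squaring densifies a sparse T, so for large images applying T to the CAM vector t times (or a hybrid) may be preferable despite the extra iterations.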
- Loss: Weighted sum of cross-entropies over foreground, background, and negative pairs; in remote sensing, the background-affinity term is weighted more heavily.
- Training: Adam optimizer; typical convergence: ~30 epochs (CAM), ~7 (AffinityNet), ~32 (segmentation net).
- Data Augmentation: Horizontal flips, random crops, color jitter; random scaling for segmentation, not AffinityNet.
kNN Attention (Tabular Domain)
- Attention kernel: Cosine similarity (for cancer data), alternatives supported.
- k: 2–3 in genomics; more generally, on the order of 2–5% of the number of samples N.
- Hidden dimensions and layer counts: dataset-specific (different settings were used for the kidney and uterus cancer datasets).
- Batch size: All or mini-batches; neighbor sets constructed per batch if needed.
6. Strengths, Limitations, and Generalizations
Strengths
- Enables propagation from sparse seed activations to dense, high-quality pseudo-labels using only image-level supervision; achieves segmentation results near those of full supervision (Ahn et al., 2018, Nivaggioli et al., 2019).
- Affinity structures encode boundary sensitivity and improve label consistency without direct pixelwise annotations.
- kNN attention pooling in tabular AffinityNet provides regularization and sample efficiency, facilitating few-shot learning and semi-supervised clustering (Ma et al., 2018).
- Plug-and-play kNN modules are flexible, generalize beyond graph data, and can replace normalization/pooling in arbitrary pipelines.
Limitations
- Naive affinity/random-walk computation is quadratic in the number of spatial locations (an n x n matrix for n pixels). Efficient sparse or approximate implementations are needed for large or dense segmentation maps.
- Patch-wise prediction in large images can yield boundary artifacts.
- The hyperparameters (gamma, beta, t, k) and attention kernel must be domain-tuned; oversmoothing is possible if class structure is unclear.
- For weakly supervised segmentation, minor/rare classes may be under-represented in CAMs, limiting affinity label diversity.
Generalizations
- AffinityNet's affinity prediction paradigm can be directly extended to multi-spectral or SAR remote sensing data by adapting the backbone for additional channels (Nivaggioli et al., 2019).
- Multi-scale affinity computation and graph-cut regularization are viable enhancements.
- Combination with weak forms of supervision (scribbles, points) could further densify and regularize pseudo-label propagation.
7. Relation to Prior Work and Impact
AffinityNet's core contributions include an end-to-end learnable pixel-affinity predictor and a random-walk label propagation framework, together providing a principled and empirically validated approach to weakly supervised segmentation (Ahn et al., 2018). The extension to kNN attention for few-shot learning connects to and generalizes the Graph Attention Model (GAM) (Ma et al., 2018), removing the requirement of a fixed graph and increasing applicability across domains. AffinityNet has demonstrated marked gains in low-label, high-dimensional regimes (cancer genomics), providing a template for future affinity-based regularization and label propagation models.