Affinity Network Fusion (ANF)

Updated 12 April 2026

ANF is a framework that combines heterogeneous multi-omic data into a unified patient similarity representation to address clustering challenges in complex diseases.
It constructs per-view affinity networks using locally-scaled Gaussian kernels, fuses them via weighted averaging and random-walk operations, and uses the fused network for spectral clustering and few-shot classification.
Empirical results show that ANF offers faster computation, improved clustering accuracy, and enhanced interpretability compared to Similarity Network Fusion in cancer subtype discovery.

Affinity Network Fusion (ANF) is a principled framework for integrating heterogeneous multi-omic data into a unified patient similarity representation. ANF was developed to address the challenges of clustering and subtype discovery in complex diseases such as cancer, where each data modality (e.g., gene expression, miRNA, methylation) provides a distinct and noisy view of the underlying biological heterogeneity. The method constructs per-view affinity networks, fuses these into a single row-stochastic affinity matrix via random-walk-based operations and weighted averaging, and enables both unsupervised spectral clustering and few-shot semi-supervised classification. ANF directly generalizes and improves upon Similarity Network Fusion (SNF), offering faster computation, interpretability, and support for per-view weighting while matching or improving clustering accuracy (Ma et al., 2018, Ma et al., 2017).

1. Mathematical Formulation of ANF

Given $n$ omic views over $N$ samples, view $v$ is represented by a feature matrix $X^{(v)} \in \mathbb{R}^{N \times p_v}$. ANF maps these heterogeneous inputs into a unified manifold in several steps:

a. Construction of Per-View Affinity Networks

For each view $v$, compute the pairwise distance matrix $\Delta^{(v)}$ with entries $\delta_{ij}^{(v)}$ (typically Euclidean or correlation distances).
Define the local scale of each sample $i$ as its mean distance to its nearest neighbors:

$\mu_i = \frac{1}{k} \sum_{l \in \mathcal{N}_k(i)} \delta_{il}^{(v)}$

where $\mathcal{N}_k(i)$ is the $k$-nearest neighborhood of sample $i$.

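Compute the local variance (kernel width) for each pair of samples, with non-negative kernel weights $\alpha$ and $\beta$:

$\sigma_{ij} = \alpha\,(\mu_i + \mu_j) + \beta\,\delta_{ij}^{(v)}$

Construct the affinity matrix using a locally-scaled Gaussian kernel:

$W_{ij}^{(v)} = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\left(-\frac{\big(\delta_{ij}^{(v)}\big)^2}{2\sigma_{ij}^2}\right)$

Normalize rows to produce a transition (row-stochastic) matrix:

$P_{ij}^{(v)} = \frac{W_{ij}^{(v)}}{\sum_{l=1}^{N} W_{il}^{(v)}}$

Apply $k$-nearest neighbor truncation with sparsification parameter $\epsilon$ to yield $\tilde{W}^{(v)}$ as follows:

$\tilde{W}_{ij}^{(v)} = \begin{cases} \dfrac{P_{ij}^{(v)}}{\sum_{l \in \mathcal{N}_k(i)} P_{il}^{(v)}}, & j \in \mathcal{N}_k(i) \\ \epsilon, & \text{otherwise} \end{cases}$

Typically, $\epsilon = 0$, in which case each row of $\tilde{W}^{(v)}$ sums to one.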
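The per-view construction above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the ANF package's implementation. It uses Euclidean distances and excludes each sample from its own neighborhood. The values $\alpha = \beta = 1/6$ serve only as illustrative defaults. The function names `pairwise_euclidean`, `knn_indices`, and `affinity_network` are hypothetical.

```python
import numpy as np

def pairwise_euclidean(X):
    """Pairwise Euclidean distance matrix (N x N) between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(D2, 0.0, out=D2)          # clip tiny negatives caused by round-off
    np.fill_diagonal(D2, 0.0)
    return np.sqrt(D2)

def knn_indices(D, k):
    """Indices of the k nearest neighbors of each sample (the sample itself is excluded; an assumption)."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def affinity_network(X, k=20, alpha=1/6, beta=1/6, eps=0.0):
    """Hypothetical sketch: locally-scaled Gaussian affinity, row normalization, kNN truncation."""
    D = pairwise_euclidean(X)
    nbrs = knn_indices(D, k)
    rows = np.arange(D.shape[0])[:, None]
    mu = D[rows, nbrs].mean(axis=1)                              # local scale mu_i
    sigma = alpha * (mu[:, None] + mu[None, :]) + beta * D       # kernel width sigma_ij
    sigma = np.maximum(sigma, 1e-12)                             # guard against zero width
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    P = W / W.sum(axis=1, keepdims=True)                         # row-stochastic P^(v)
    Wt = np.full_like(P, eps)                                    # non-neighbors get eps
    nbr_vals = P[rows, nbrs]
    Wt[rows, nbrs] = nbr_vals / nbr_vals.sum(axis=1, keepdims=True)  # renormalize over N_k(i)
    return Wt
```

With `eps=0`, every row keeps exactly `k` nonzero entries that sum to one. This matches the typical setting described above.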

b. Affinity Network Fusion

Let $\tilde{W}^{(1)}, \dots, \tilde{W}^{(n)}$ be the per-view row-stochastic affinity matrices; choose non-negative weights $w_1, \dots, w_n$ such that $\sum_{v=1}^{n} w_v = 1$.
The simplest fusion computes

$W = \sum_{v=1}^{n} w_v \tilde{W}^{(v)}$

Optionally, perform a $t$-step random walk on $W$: $W^{t}$ ($t = 1$ or $t = 2$); larger $t$ ($t \geq 3$) degrades clustering structure.
Alternatively, perform a single cross-view smoothing step:

$W = \sum_{v=1}^{n} w_v \tilde{W}^{(v)} \bar{W}^{(-v)}, \qquad \bar{W}^{(-v)} = \frac{\sum_{u \neq v} w_u \tilde{W}^{(u)}}{\sum_{u \neq v} w_u}$

where $\bar{W}^{(-v)}$ is the weighted average of all views other than $v$.
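The three fusion variants can be sketched as below. The sketch assumes the per-view matrices are dense NumPy arrays that are already row-stochastic; a practical implementation would more likely use sparse matrices. The function names are hypothetical, and cross-view smoothing needs at least two views.

```python
import numpy as np

def _normalized_weights(n, weights):
    """Uniform weights by default; always rescaled so that sum_v w_v = 1."""
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    return w / w.sum()

def fuse_average(Ws, weights=None):
    """Weighted average: W = sum_v w_v * W^(v)."""
    w = _normalized_weights(len(Ws), weights)
    return sum(wv * Wv for wv, Wv in zip(w, Ws))

def random_walk(W, t=1):
    """t-step random walk on the fused transition matrix (t = 1 or 2 per the text)."""
    return np.linalg.matrix_power(W, t)

def fuse_cross_view(Ws, weights=None):
    """Cross-view smoothing: sum_v w_v * W^(v) @ Wbar^(-v)."""
    n = len(Ws)
    if n < 2:
        raise ValueError("cross-view smoothing needs at least two views")
    w = _normalized_weights(n, weights)
    fused = np.zeros_like(Ws[0], dtype=float)
    for v in range(n):
        others = [u for u in range(n) if u != v]
        # weighted average of every view except v; still row-stochastic
        W_bar = sum(w[u] * Ws[u] for u in others) / w[others].sum()
        fused += w[v] * (Ws[v] @ W_bar)
    return fused
```

Every variant returns a row-stochastic matrix. Convex combinations and products of row-stochastic matrices are themselves row-stochastic.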

c. Spectral Clustering

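Because $W$ is row-stochastic but generally not symmetric (and its row sums are all one), it is first symmetrized, a common choice being $W_s = \tfrac{1}{2}(W + W^\top)$. Then construct the symmetric normalized graph Laplacian:

$L = I - D^{-1/2} W_s D^{-1/2}$

where $D = \operatorname{diag}(d_1, \dots, d_N)$ with $d_i = \sum_{j} (W_s)_{ij}$.

Solve the relaxed normalized cut problem:

$\min_{U \in \mathbb{R}^{N \times K}} \operatorname{tr}\big(U^\top L U\big) \quad \text{subject to} \quad U^\top U = I$

The optimal $U$ contains the $K$ eigenvectors of $L$ corresponding to the $K$ smallest eigenvalues.

Apply $K$-means clustering to the rows of $U$. The eigengap heuristic (the largest gap between consecutive eigenvalues of $L$) determines a suitable $K$.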
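The spectral step can be sketched as below. Several choices here are illustrative assumptions rather than a published API:

- the symmetrization $W_s = \tfrac{1}{2}(W + W^\top)$;
- the eigengap rule, which picks $K$ at the largest gap among the first `K_max` eigenvalues;
- a small deterministic $K$-means with farthest-point initialization.

`spectral_clustering` and `kmeans` are hypothetical names.

```python
import numpy as np

def kmeans(Y, K, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, K):
        # squared distance from each point to its nearest chosen center
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(W, K=None, K_max=10):
    """Normalized spectral clustering of a fused transition matrix W."""
    Ws = 0.5 * (W + W.T)                          # symmetrize
    d = Ws.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * Ws * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    if K is None:
        gaps = np.diff(evals[:K_max + 1])         # gaps[j] = lambda_{j+1} - lambda_j
        K = int(np.argmax(gaps)) + 1              # eigengap heuristic
    U = evecs[:, :K]                              # eigenvectors of the K smallest eigenvalues
    return kmeans(U, K), K
```

Calling `labels, K = spectral_clustering(W)` returns both the cluster labels and the $K$ chosen by the eigengap.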

2. Comparison to Similarity Network Fusion (SNF)

ANF generalizes and simplifies the iterative similarity diffusion in SNF (Wang et al., 2014). SNF updates each symmetric similarity matrix through repeated cross-diffusion with all other views until convergence (typically on the order of 20 iterations) and employs an ad-hoc diagonal fix. ANF achieves comparable or better results with a single random walk or mixing step (at most two), operating directly on row-stochastic transition matrices (Ma et al., 2018, Ma et al., 2017). ANF also supports arbitrary (nonuniform) view weights, avoids repeated iterative updates, and does not need SNF's symmetry-enforcing heuristics. In empirical studies, ANF takes at most half the computation time of a single SNF iteration and requires no iterative convergence.

3. Algorithmic Workflow and Computational Considerations

The main steps of ANF are as follows (Ma et al., 2017):

Select and transform features for each omic view, using differential expression or variance filtering and transformations such as log, PCA, or variance-stabilizing transforms.
For each view:
- Compute the pairwise distance matrix $\Delta^{(v)}$.
- Build a locally-scaled Gaussian affinity $W^{(v)}$ and normalize its rows.
- Prune weak edges to obtain a $k$-NN sparse transition matrix $\tilde{W}^{(v)}$.
Fuse the per-view networks using weighted averaging and (optionally) one- or two-step random walks.
Perform spectral clustering on the fused network as described above.
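The first workflow step, feature selection and transformation, might look like the following for a count-like view. This sketch substitutes simple variance filtering for differential-expression selection. Approaches such as DESeq2 would need class labels and a dedicated statistical model. `preprocess_view`, `n_top`, and `n_pcs` are hypothetical names, and the $\log_2(x+1)$ transform assumes non-negative inputs.

```python
import numpy as np

def preprocess_view(X, n_top=2000, log=True, n_pcs=None):
    """Hypothetical per-view preprocessing: log transform, variance filter, optional PCA."""
    X = np.asarray(X, dtype=float)
    if log:
        X = np.log2(X + 1.0)                    # assumes non-negative, count-like values
    var = X.var(axis=0)
    keep = np.argsort(var)[::-1][:n_top]        # keep the n_top most variable features
    X = X[:, keep]
    X = X - X.mean(axis=0)                      # center each retained feature
    if n_pcs is not None:
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        X = U[:, :n_pcs] * S[:n_pcs]            # principal-component scores
    return X
```

The output can be passed directly to a distance or affinity computation for that view.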

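Computational complexity is dominated by three costs:

- pairwise distance calculations, $O(N^2 p_v)$ per view;
- sparse matrix multiplications for fusion, roughly $O(N k^2)$ per product of two $k$-NN matrices;
- the Laplacian eigen-decomposition, $O(N^3)$ for exact computation, which remains practical for $N$ in the low thousands.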

4. Semi-supervised Extension: Few-shot Learning via ANF

The fusion framework facilitates downstream semi-supervised classification by enabling few-shot neural classification over the fused representation. For each patient, the network input is either the concatenated rows of the per-view matrices $\tilde{W}^{(v)}$ or the corresponding row of the fused $W$. This input feeds a compact feed-forward network with ReLU activation and softmax output. For patient $i$, the network model is:

$x_i = \big[\tilde{W}_{i,:}^{(1)}, \dots, \tilde{W}_{i,:}^{(n)}\big]$

$h_i = \operatorname{ReLU}(A_1 x_i + b_1)$

$z_i = A_2 h_i + b_2$

$\hat{y}_i = \operatorname{softmax}(z_i) \in \mathbb{R}^{K}$

Here $A_1, b_1, A_2, b_2$ are learned parameters and $K$ is the number of clusters/classes. Optimization minimizes the cross-entropy loss over a small, possibly noisy set of labeled examples, often with Adam for up to 100 epochs. Because the kNN-based network imparts strong cluster structure, labels for only a small fraction of samples can suffice to achieve high test accuracy in some settings (Ma et al., 2018).
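A minimal few-shot classifier in the spirit of this model can be written in plain NumPy. To stay self-contained, the sketch makes these assumptions:

- full-batch gradient descent replaces Adam;
- there is a single hidden layer;
- the input is one fused matrix $W$, rescaled so that a typical nonzero entry is of order one;
- training uses only the rows listed in `labeled_idx`.

The function name, hyperparameters, and rescaling are illustrative choices.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)         # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_fewshot(W, y, labeled_idx, K, hidden=32, lr=0.1, epochs=300, seed=0):
    """Hypothetical sketch: one-hidden-layer ReLU/softmax net trained on the labeled rows of W."""
    rng = np.random.default_rng(seed)
    scale = np.count_nonzero(W) / W.shape[0]     # mean row support (assumed rescaling)
    X = W * scale
    d = X.shape[1]
    # row-vector convention: A1 and A2 are stored transposed relative to the equations
    A1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    A2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, K))
    b2 = np.zeros(K)
    idx = np.asarray(labeled_idx)
    Xl = X[idx]
    Y = np.eye(K)[np.asarray(y)[idx]]            # one-hot targets
    m = len(idx)
    for _ in range(epochs):
        H_pre = Xl @ A1 + b1
        H = np.maximum(H_pre, 0.0)               # ReLU
        P = softmax(H @ A2 + b2)
        G = (P - Y) / m                          # gradient of mean cross-entropy w.r.t. logits
        gA2, gb2 = H.T @ G, G.sum(axis=0)
        GH = (G @ A2.T) * (H_pre > 0)            # back-propagate through the ReLU
        gA1, gb1 = Xl.T @ GH, GH.sum(axis=0)
        A1 -= lr * gA1
        b1 -= lr * gb1
        A2 -= lr * gA2
        b2 -= lr * gb2

    def predict(W_new):
        H_new = np.maximum((W_new * scale) @ A1 + b1, 0.0)
        return np.argmax(H_new @ A2 + b2, axis=1)

    return predict
```

Because every patient already has a row in the fused network, prediction here is transductive: the model is applied to the rows of the same $W$ that were left unlabeled.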

5. Experimental Validation in Cancer Patient Clustering

ANF has been applied to a harmonized GDC/TCGA dataset of 2,193 patients across four primary tumor types (adrenal, lung, kidney, uterus) and nine disease-type subgroups (Ma et al., 2018, Ma et al., 2017). Each patient is characterized by three data views: RNA-seq (FPKM), miRNA counts, and Illumina 450K methylation β-values.

Key results include:

| Cancer | Number of subtypes | NMI | ARI |
|---|---|---|---|
| Adrenal | 2 | 0.96 | 0.98 |
| Lung | 2 | 0.75 | 0.83 |
| Kidney | 3 | 0.84 | 0.91 |
| Uterus | 2 | 0.61 | 0.78 |

Integrating at least two omic views with ANF consistently outperforms single-view clustering in both NMI and ARI. Eigengap analysis on the fused network $W$ reliably recovers the correct number of clusters. In subtype discovery, ANF split one major subtype (PCPG in adrenal) into two further subgroups, indicating potential for subtype refinement.

In semi-supervised mode, training on as few as 2–10 clean labels per cancer type (a small fraction of the data) yields high test NMI and accuracy. Fine-tuning on subtypes that are not represented in the initial labels quickly recovers all classes.
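The NMI and ARI values above compare predicted clusters with the known disease types. The following small NumPy sketch computes both metrics. Several NMI variants exist and the source does not say which one was used, so normalizing by the arithmetic mean of the two entropies is an assumption. The helper names are hypothetical.

```python
import numpy as np

def contingency(a, b):
    """Contingency table between two labelings (rows: a, columns: b)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    C = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(C, (ai, bi), 1)
    return C

def nmi(a, b):
    """Normalized mutual information (arithmetic-mean normalization; an assumption)."""
    P = contingency(a, b)
    P = P / P.sum()
    pa, pb = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    denom = 0.5 * (ha + hb)
    return 1.0 if denom == 0 else mi / denom

def ari(a, b):
    """Adjusted Rand index (Hubert and Arabie form)."""
    C = contingency(a, b)
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(C).sum()
    sum_a, sum_b = comb2(C.sum(axis=1)).sum(), comb2(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(C.sum())
    max_index = 0.5 * (sum_a + sum_b)
    return 1.0 if max_index == expected else (sum_ij - expected) / (max_index - expected)
```

Both metrics are invariant to relabeling of clusters. A perfect clustering therefore scores 1 even when its cluster IDs differ from the reference labels.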

6. Parameter Selection and Feature Engineering

Performance is modulated by several hyperparameters:

- Neighborhood size $k$, which sets both the local scale $\mu_i$ and the sparsity of $\tilde{W}^{(v)}$.
- Kernel weights $\alpha$ and $\beta$ in $\sigma_{ij}$, with fixed default values used in some applications.
- Sparsification parameter $\epsilon$, usually zero.
- Fusion weights $w_v$, either uniform or reflecting per-view clustering quality.
- Mixing parameters for the one- or two-step walk.

Extensive feature selection (e.g., differential expression by DESeq2) and log/variance-stabilizing transformations are empirically critical for optimal clustering accuracy, with substantial improvements observed (Ma et al., 2017).

7. Software and Implementation

The ANF method is available as a Bioconductor package (https://bioconductor.org/packages/ANF), with code and detailed results at https://github.com/BeautyOfWeb/Clustering-TCGAFiveCancerTypes. The framework supports direct interfacing with omic data matrices, automated network fusion, spectral graph clustering, and integration into few-shot neural classification pipelines (Ma et al., 2018, Ma et al., 2017). ANF thus constitutes an efficient and interpretable paradigm for multi-view patient stratification, directly supporting advances in precision medicine and integrative genomics.

References

Ma et al. (2018). Affinity Network Fusion and Semi-supervised Learning for Cancer Patient Clustering.

Ma et al. (2017). Integrate Multi-omic Data Using Affinity Network Fusion (ANF) for Cancer Patient Clustering.
