
Hierarchical VQ-GAE: Discrete Graph Autoencoding

Updated 17 December 2025
  • The paper presents a hierarchical vector quantized graph autoencoder that uses a two-layer codebook and annealing-based selection to robustly encode both node features and graph topology.
  • The methodology combines a graph neural encoder with a hierarchical VQ module and dual decoders to simultaneously reconstruct node attributes and predict link probabilities.
  • Empirical results demonstrate superior performance in link prediction and node classification across benchmark datasets, outperforming 16 state-of-the-art self-supervised models.

Hierarchical Vector Quantized Graph Autoencoder (HQA-GAE) is a neural framework that integrates hierarchical vector quantization and annealed code selection into graph autoencoders to address critical limitations in prior self-supervised graph representation learning. It combines a graph neural encoder, hierarchical two-layer vector quantization, and a dual-decoder structure, yielding discrete latent codes that robustly capture both node features and graph topology. HQA-GAE specifically resolves challenges in codebook underutilization and codebook space sparsity, outperforming a broad array of state-of-the-art baselines in link prediction and node classification on benchmark graph datasets (Zeng et al., 17 Apr 2025).

1. Architectural Overview

HQA-GAE extends standard graph autoencoders by inserting a vector quantization module, structured as a two-layer hierarchical codebook, between the encoder and decoder, and by using a temperature-annealed stochastic code-selection strategy. The encoder can be any graph neural network (GNN), such as GCN, GAT, or GraphSAGE, and maps each node's input feature $\mathbf{x}_i\in\mathbb{R}^D$ to a continuous latent vector $\mathbf{h}_i=E(\mathbf{x}_i)\in\mathbb{R}^d$. The vector quantization module contains:

  • First-layer codebook: $\{\mathbf{e}_{1,j}\}_{j=1}^{M}$
  • Second-layer codebook: $\{\mathbf{e}_{2,k}\}_{k=1}^{C}$, with $C < M$

For each node, $\mathbf{h}_i$ is assigned to its nearest first-layer code $\mathbf{e}_{1,i}$; then $\mathbf{e}_{1,i}$ is mapped to its nearest center in the second-layer codebook, $\mathbf{e}_{2,i}$. The node-feature decoder $D_{\rm node}$ (a shallow GAT) reconstructs node features $\hat{\mathbf{x}}_i$ from $\mathbf{e}_{1,i}$, while the edge decoder $D_{\rm edge}(\mathbf{h}_i,\mathbf{h}_j)=\sigma(\mathrm{MLP}(\mathbf{h}_i\circ\mathbf{h}_j))$ predicts link probabilities.
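The composition of these components can be sketched in a few lines of PyTorch Geometric. This is a minimal illustration, not the authors' implementation; the module choices, hidden sizes, and codebook sizes `M` and `C` are assumptions for exposition.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv

class HQAGAESketch(nn.Module):
    def __init__(self, in_dim, hid_dim, M=1024, C=64):
        super().__init__()
        self.encoder = GCNConv(in_dim, hid_dim)        # any GNN: GCN / GAT / GraphSAGE
        self.codebook1 = nn.Embedding(M, hid_dim)      # first-layer codes e_{1,j}
        self.codebook2 = nn.Embedding(C, hid_dim)      # second-layer centers e_{2,k}, C < M
        self.node_dec = GATConv(hid_dim, in_dim)       # shallow GAT feature decoder
        self.edge_dec = nn.Sequential(                 # MLP link predictor
            nn.Linear(hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1))

    def forward(self, x, edge_index, edge_pairs):
        h = self.encoder(x, edge_index)                              # h_i = E(x_i)
        idx1 = torch.cdist(h, self.codebook1.weight).argmin(dim=1)   # nearest first-layer code
        e1 = self.codebook1(idx1)
        idx2 = torch.cdist(e1, self.codebook2.weight).argmin(dim=1)  # nearest second-layer center
        e2 = self.codebook2(idx2)
        x_hat = self.node_dec(e1, edge_index)                        # reconstruct features from e_{1,i}
        src, dst = edge_pairs
        link_prob = torch.sigmoid(self.edge_dec(h[src] * h[dst])).squeeze(-1)
        return h, e1, e2, x_hat, link_prob
```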

2. Vector Quantization Formalism

Letting $\mathbf{h}_i=E(\mathbf{x}_i)\in\mathbb{R}^d$, the two-level vector quantization is defined over:

  • First-layer codebook: $\mathcal{C}_1=\{\mathbf{e}_{1,j}\in\mathbb{R}^d\}_{j=1}^{M}$
  • Second-layer codebook: $\mathcal{C}_2=\{\mathbf{e}_{2,k}\in\mathbb{R}^d\}_{k=1}^{C}$

Quantization uses the squared Euclidean distance $d(\mathbf{z},\mathbf{e})=\|\mathbf{z}-\mathbf{e}\|_2^2$:

$$\mathbf{e}_{1,i}=\mathop{\arg\min}_{\mathbf{e}\in\mathcal{C}_1}\|\mathbf{h}_i-\mathbf{e}\|_2^2, \qquad \mathbf{e}_{2,i}=\mathop{\arg\min}_{\mathbf{e}\in\mathcal{C}_2}\|\mathbf{e}_{1,i}-\mathbf{e}\|_2^2$$

The node reconstruction is $\hat{\mathbf{x}}_i = D_{\rm node}(\mathbf{e}_{1,i})$. This quantization enforces discrete partitioning and compression of node embeddings, encouraging structured and interpretable latent clustering.
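A minimal tensor-level sketch of this two-level lookup follows; the helper name and toy sizes are illustrative, not taken from the paper.

```python
import torch

def quantize(z, codebook):
    """Assign each row of z to its nearest code under squared Euclidean distance."""
    d = torch.cdist(z, codebook, p=2) ** 2     # d(z, e) = ||z - e||_2^2 for all pairs
    idx = d.argmin(dim=1)                      # argmin over the codebook
    return idx, codebook[idx]

# Two-level lookup: h_i -> e_{1,i} -> e_{2,i}
h = torch.randn(5, 16)                                 # toy node embeddings
cb1, cb2 = torch.randn(32, 16), torch.randn(8, 16)     # first- and second-layer codebooks
idx1, e1 = quantize(h, cb1)
idx2, e2 = quantize(e1, cb2)
```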

3. Annealing-Based Code-Selection Mechanism

Standard VQ assignment can incur "winner-take-all" pathologies in which only a small subset of codes is ever utilized, impairing codebook diversity. HQA-GAE addresses this with a temperature-controlled softmax selection over $\mathcal{C}_1$, defined by

$$s_{i,j} = -\|\mathbf{h}_i - \mathbf{e}_{1,j}\|_2^2$$

$$p_{i,j}(t) = \frac{\exp\!\left(s_{i,j}/\tau(t)\right)}{\sum_{k=1}^{M} \exp\!\left(s_{i,k}/\tau(t)\right)}$$

with initial temperature $\tau(0)=\tau_0$ and decay schedule $\tau(t)=\max(\gamma\,\tau(t-1),\,\epsilon)$, where $\gamma$ is a decay factor and $\epsilon>0$ prevents premature sharpening.

Initially, $\tau$ is large and code assignment is nearly uniform, promoting codebook exploration. As $\tau\to 0$, assignment sharpens toward the nearest code (the argmin), focusing capacity on the most salient codes. This adaptive annealing, implemented without extra loss functions, encourages broad codebook utilization in early epochs and specialization later.
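A sketch of this schedule is shown below, assuming the soft assignment is realized by sampling one code per node from the categorical distribution $p_{i,j}(t)$; the values of $\tau_0$, $\gamma$, and $\epsilon$ are illustrative, not the paper's settings.

```python
import torch

def anneal(tau, gamma=0.99, eps=0.1):
    # tau(t) = max(gamma * tau(t-1), eps)
    return max(gamma * tau, eps)

def select_codes(h, codebook, tau):
    s = -torch.cdist(h, codebook, p=2) ** 2          # s_{i,j} = -||h_i - e_{1,j}||^2
    p = torch.softmax(s / tau, dim=1)                # p_{i,j}(t)
    return torch.multinomial(p, num_samples=1).squeeze(1)   # one sampled code per node

h, cb1 = torch.randn(5, 16), torch.randn(32, 16)
tau = 1.0                                            # large tau: near-uniform assignment
for epoch in range(100):
    idx = select_codes(h, cb1, tau)                  # stochastic code indices
    tau = anneal(tau)                                # sharpens toward the argmin over time
```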

4. Hierarchical Two-Layer Codebook

To alleviate codebook sparsity and encourage structured organization of the latent space, HQA-GAE introduces a two-layer codebook: the first layer contains a large set of codes, and the second layer clusters these codes into $C$ centers. For each second-layer center $\mathbf{e}_{2,k}$, a subset $S_k$ of the first-layer codes is assigned by maximizing

$$\max_{\mathcal{C}_2,\,S_1,\dots,S_C}\; \sum_{k=1}^{C} \sum_{j\in S_k} -\|\mathbf{e}_{1,j} - \mathbf{e}_{2,k}\|_2^2$$

This enforces that similar first-layer codes share a second-layer ancestor, implicitly via a second-level VQ loss. The approach sharpens the clustering of latent embeddings and encourages structural regularities, so that nodes with shared attributes or topology yield proximate discrete representations.
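Maximizing this objective is equivalent to assigning each first-layer code to its nearest second-layer center and minimizing their squared distances. A rough sketch of that second-level term follows; it illustrates the stated objective, not the authors' exact update rule.

```python
import torch
import torch.nn.functional as F

def second_level_term(cb1, cb2):
    # S_k: each first-layer code e_{1,j} is assigned to its nearest center e_{2,k}
    d = torch.cdist(cb1, cb2, p=2) ** 2
    nearest = cb2[d.argmin(dim=1)]
    # Minimizing this mean squared distance maximizes the clustering objective above
    return F.mse_loss(cb1, nearest)
```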

5. Joint Loss Function and Optimization

The total loss combines reconstruction, edge prediction, and vector-quantization penalties:

$$\mathcal{L} = \mathcal{L}_{\rm NodeRec} + \mathcal{L}_{\rm EdgeRec} + \alpha\,\mathcal{L}_{\rm vq1} + \beta\,\mathcal{L}_{\rm vq2}$$

where

  • Node-feature loss ($\mathcal{L}_{\rm NodeRec}$): scaled cosine error, penalizing angular deviations between original and reconstructed features.
  • Edge loss ($\mathcal{L}_{\rm EdgeRec}$): link-prediction loss over observed edges and negatively sampled non-edges, scored by the MLP link predictor.
  • VQ losses ($\mathcal{L}_{\rm vq1}$, $\mathcal{L}_{\rm vq2}$): enforce commitment of encoder outputs to the selected codes and update the codebook, with the stop-gradient operator ensuring valid optimization despite the discrete lookup.

All parameters, including encoder, decoders, and codebooks, are learned via gradient descent, employing straight-through gradients for non-differentiable assignments.
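A hedged sketch of how these terms might be composed, using standard VQ-VAE-style codebook and commitment terms with stop-gradients; the scaled-cosine exponent, the loss weights, and the exact placement of the straight-through estimator are assumptions rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def vq_terms(z, e, commit_weight=0.25):
    codebook_term = F.mse_loss(e, z.detach())          # pull codes toward encoder outputs
    commit_term = F.mse_loss(z, e.detach())            # commit encoder outputs to their codes
    return codebook_term + commit_weight * commit_term

def total_loss(x, x_hat, edge_logits, edge_labels, h, e1, e2, alpha=1.0, beta=1.0):
    node_rec = (1 - F.cosine_similarity(x_hat, x, dim=-1)).pow(2).mean()     # scaled cosine error
    edge_rec = F.binary_cross_entropy_with_logits(edge_logits, edge_labels)  # pos + negative samples
    return node_rec + edge_rec + alpha * vq_terms(h, e1) + beta * vq_terms(e1, e2)

# Straight-through estimator: the decoder sees the code while gradients flow to the encoder.
# e1_st = h + (e1 - h).detach()
```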

6. Experimental Framework

Evaluation considers eight standard undirected, unweighted graphs drawn from citation, co-purchase, and co-author networks, as well as the OGB benchmark. Tasks include:

  • Link prediction: measured by AUC and AP on held-out edges, with dot-product probes.
  • Node classification: using a linear SVM classifier on the learned node embeddings, validated with 5-fold cross-validation.

A total of 16 self-supervised baselines are compared, including contrastive methods (DGI, GIC, GRACE, etc.) and autoencoding/masked models (GAE, VGAE, ARGA, Bandana, etc.). Experiments are conducted using PyTorch Geometric on NVIDIA A800 hardware with CUDA 12.1.

| Dataset Type | Example Datasets       | Task(s)                              |
|--------------|------------------------|--------------------------------------|
| Citation     | Cora, CiteSeer, PubMed | Link prediction, Node classification |
| Co-purchase  | Photo, Computers       | Link prediction, Node classification |
| Co-author    | CS, Physics            | Link prediction, Node classification |
| OGB          | ogbn-arxiv             | Link prediction, Node classification |
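A minimal sketch of the two evaluation probes (linear-SVM node classification with 5-fold cross-validation, and dot-product link prediction scored by AUC/AP); the embedding matrix `Z`, labels `y`, and edge splits are assumed to be given.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score, average_precision_score

def node_classification_acc(Z, y):
    # linear SVM probe on frozen embeddings, 5-fold cross-validation accuracy
    return cross_val_score(LinearSVC(max_iter=5000), Z, y, cv=5).mean()

def link_prediction_auc_ap(Z, pos_edges, neg_edges):
    # dot-product probe on held-out positive and sampled negative edges
    def score(edges):
        return np.sum(Z[edges[:, 0]] * Z[edges[:, 1]], axis=1)
    scores = np.concatenate([score(pos_edges), score(neg_edges)])
    labels = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    return roc_auc_score(labels, scores), average_precision_score(labels, scores)
```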

7. Performance and Empirical Insights

HQA-GAE demonstrates leading performance across all major datasets and metrics. In link prediction (AUC ± SD on Cora), HQA-GAE achieves $96.02 \pm 0.11$ with an average rank of 1.00, surpassing Bandana and MaskGAE. On the Photo and Computers graphs, the model exceeds the next-best AP by approximately 20 percentage points. In node classification, HQA-GAE ranks best on six of eight datasets (average rank 1.25), with, for example, 88.78 on Cora and 88.49 on PubMed, compared to Bandana's 88.59 and 88.16.

The architecture’s empirical strengths can be ascribed to:

  • Discrete compression via VQ: Facilitates encoding of salient structural graph signals rather than noise.
  • Annealing-based assignment: Mitigates codebook collapse, ensuring broader representation and improved generalization.
  • Hierarchical codebook clustering: Reduces code sparsity and creates more coherent, clusterable representations.
  • Dual reconstruction targets: By reconstructing both node features and graph links, the model jointly leverages topological and attribute information, in contrast to methods relying solely on perturbation-based contrastive objectives.

These structural elements yield robust, well-clustered embeddings and consistent improvements on self-supervised graph learning tasks (Zeng et al., 17 Apr 2025).
