Hierarchical VQ-GAE: Discrete Graph Autoencoding
- The paper presents a hierarchical vector quantized graph autoencoder that uses a two-layer codebook and annealing-based selection to robustly encode both node features and graph topology.
- The methodology combines a graph neural encoder with a hierarchical VQ module and dual decoders to simultaneously reconstruct node attributes and predict link probabilities.
- Empirical results demonstrate superior performance in link prediction and node classification across benchmark datasets, outperforming 16 state-of-the-art self-supervised models.
Hierarchical Vector Quantized Graph Autoencoder (HQA-GAE) is a neural framework that integrates hierarchical vector quantization and annealed code selection into graph autoencoders to address critical limitations in prior self-supervised graph representation learning. It combines a graph neural encoder, hierarchical two-layer vector quantization, and a dual-decoder structure, yielding discrete latent codes that robustly capture both node features and graph topology. HQA-GAE specifically resolves challenges in codebook underutilization and codebook space sparsity, outperforming a broad array of state-of-the-art baselines in link prediction and node classification on benchmark graph datasets (Zeng et al., 17 Apr 2025).
1. Architectural Overview
HQA-GAE extends standard graph autoencoders by embedding a vector quantization module between the encoder and decoder, structured as a two-layer hierarchical codebook, and uses a temperature-annealed stochastic code selection strategy. The encoder uses any graph neural network (GNN), such as GCN, GAT, or GraphSAGE, to map each node’s input feature $x_i$ to a continuous latent vector $z_i$. The vector quantization module contains:
- First-layer codebook: $\mathcal{C}^{(1)} = \{e_1, \dots, e_{K_1}\}$
- Second-layer codebook: $\mathcal{C}^{(2)} = \{c_1, \dots, c_{K_2}\}$, with $K_2 < K_1$
For each node, $z_i$ is assigned to its nearest first-layer code $e_{k_i}$; then, $e_{k_i}$ is mapped to its nearest center $c_{m_i}$ in the second-layer codebook $\mathcal{C}^{(2)}$. The node-feature decoder (a shallow GAT) reconstructs node features from the quantized code, while the edge decoder predicts link probabilities.
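A minimal PyTorch Geometric sketch of this pipeline is given below. The module names, hidden dimensions, codebook sizes, and the choice of feeding the first-layer codes to both decoders are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv


class HQAGAESkeleton(nn.Module):
    """Illustrative layout: GNN encoder -> two-layer VQ -> dual decoders."""

    def __init__(self, in_dim, hid_dim, num_codes=512, num_centers=64):
        super().__init__()
        self.encoder = GCNConv(in_dim, hid_dim)                # any GNN (GCN/GAT/SAGE)
        self.codebook1 = nn.Embedding(num_codes, hid_dim)      # first-layer codes
        self.codebook2 = nn.Embedding(num_centers, hid_dim)    # second-layer centers
        self.feat_decoder = GATConv(hid_dim, in_dim, heads=1)  # shallow GAT feature decoder
        self.edge_decoder = nn.Sequential(                     # MLP link predictor
            nn.Linear(2 * hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1))

    @staticmethod
    def quantize(z, codebook):
        """Nearest-code lookup under squared Euclidean distance."""
        dist = torch.cdist(z, codebook.weight) ** 2
        idx = dist.argmin(dim=-1)
        return codebook(idx), idx

    def forward(self, x, edge_index, edge_pairs):
        z = self.encoder(x, edge_index)                    # continuous node embeddings
        e1, _ = self.quantize(z, self.codebook1)           # first-level assignment
        e2, _ = self.quantize(e1, self.codebook2)          # second-level assignment
        x_hat = self.feat_decoder(e1, edge_index)          # reconstruct node features
        pair = torch.cat([e1[edge_pairs[0]], e1[edge_pairs[1]]], dim=-1)
        edge_logits = self.edge_decoder(pair).squeeze(-1)  # link logits for candidate pairs
        return x_hat, edge_logits, z, e1, e2
```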
2. Vector Quantization Formalism
Letting $z_i$ denote the encoder output for node $i$, the two-level vector quantization is defined over:
- First-layer codebook: $\mathcal{C}^{(1)} = \{e_1, \dots, e_{K_1}\} \subset \mathbb{R}^d$
- Second-layer codebook: $\mathcal{C}^{(2)} = \{c_1, \dots, c_{K_2}\} \subset \mathbb{R}^d$
Quantization uses squared Euclidean distance:
$$k_i = \arg\min_{k} \lVert z_i - e_k \rVert_2^2, \qquad m_i = \arg\min_{m} \lVert e_{k_i} - c_m \rVert_2^2 .$$
The node reconstruction is $\hat{x}_i = \mathrm{Dec}(e_{k_i})$. This quantization enforces discrete partitioning and compression of node embeddings, encouraging structured and interpretable latent clustering.
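As a toy check of this distance-based lookup (the values are arbitrary and purely illustrative):

```python
import torch

z = torch.tensor([[0.9, 0.1], [0.2, 0.8]])         # two node embeddings z_i
codebook = torch.tensor([[1.0, 0.0], [0.0, 1.0]])  # two first-layer codes e_k
dist = torch.cdist(z, codebook) ** 2               # squared Euclidean distances, shape (2, 2)
idx = dist.argmin(dim=-1)                          # nearest-code indices -> tensor([0, 1])
e = codebook[idx]                                  # quantized embeddings fed to the decoders
```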
3. Annealing-Based Code-Selection Mechanism
Standard VQ assignment can incur "winner-take-all" pathologies where only a subset of codes are utilized, impairing codebook diversity. HQA-GAE addresses this with a temperature-controlled softmax selection over the first-layer codebook $\mathcal{C}^{(1)}$, defined by
$$p(e_k \mid z_i) = \frac{\exp\!\left(-\lVert z_i - e_k \rVert_2^2 / \tau_t\right)}{\sum_{j=1}^{K_1} \exp\!\left(-\lVert z_i - e_j \rVert_2^2 / \tau_t\right)},$$
with temperature $\tau_t = \max(\tau_0 \cdot \gamma^{t}, \tau_{\min})$ at epoch $t$, where $\gamma \in (0, 1)$ is a decay factor and the floor $\tau_{\min}$ prevents premature sharpening.
Initially, $\tau_t$ is large and code assignment is nearly uniform, promoting codebook exploration. As $\tau_t$ decays toward $\tau_{\min}$, assignment sharpens to the nearest code (argmin), focusing capacity onto the most salient codes. This adaptive annealing, implemented without extra loss functions, encourages broad codebook utilization in early epochs and specialization later.
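A minimal sketch of this annealed selection, assuming an exponential schedule $\tau_t = \max(\tau_0 \gamma^t, \tau_{\min})$ and a softmax over negative squared distances; the default values below are placeholders, not the paper's hyperparameters.

```python
import torch
import torch.nn.functional as F


def annealed_code_select(z, codebook, epoch, tau0=1.0, gamma=0.95, tau_min=1e-2):
    """Sample first-layer code indices from a temperature-controlled softmax."""
    tau = max(tau0 * gamma ** epoch, tau_min)   # anneal, but never below the floor tau_min
    dist = torch.cdist(z, codebook) ** 2        # squared Euclidean distances to every code
    probs = F.softmax(-dist / tau, dim=-1)      # large tau -> near-uniform; small tau -> near-argmin
    idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return codebook[idx], idx
```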
4. Hierarchical Two-Layer Codebook
To alleviate codebook sparsity and encourage structured latent-space organization, HQA-GAE introduces a two-layer codebook: the first contains a large set of $K_1$ codes, and the second clusters these codes into $K_2$ centers. For each second-layer center $c_m$, a subset of the first-layer codes is assigned:
$$\mathcal{S}_m = \{\, e_k \in \mathcal{C}^{(1)} : m = \arg\min_{m'} \lVert e_k - c_{m'} \rVert_2^2 \,\}.$$
This enforces that similar first-layer codes share a second-layer ancestor, implicitly via the second-level VQ loss. The approach sharpens the clustering of latent embeddings and encourages structural regularities, so that nodes with shared attributes or topology yield proximate discrete representations.
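One way to realize this grouping is sketched below: each first-layer code is attached to its nearest second-layer center, partitioning the first-layer codebook. The paper's exact assignment rule may differ in detail.

```python
import torch


def partition_codes(codebook1, codebook2):
    """Assign every first-layer code to its nearest second-layer center."""
    dist = torch.cdist(codebook1, codebook2) ** 2   # (K1, K2) squared distances
    parent = dist.argmin(dim=-1)                    # second-layer ancestor of each code
    groups = {m: (parent == m).nonzero(as_tuple=True)[0]
              for m in range(codebook2.size(0))}    # first-layer codes grouped per center
    return parent, groups
```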
5. Joint Loss Function and Optimization
The total loss combines reconstruction, edge prediction, and vector-quantization penalties:
$$\mathcal{L} = \mathcal{L}_{\text{feat}} + \mathcal{L}_{\text{edge}} + \mathcal{L}_{\text{vq}}^{(1)} + \mathcal{L}_{\text{vq}}^{(2)},$$
where
- Node-feature loss ($\mathcal{L}_{\text{feat}}$): scaled cosine error, penalizing angular deviations between input and reconstructed features.
- Edge loss ($\mathcal{L}_{\text{edge}}$): link-prediction loss with negative sampling and an MLP link predictor.
- VQ losses ($\mathcal{L}_{\text{vq}}^{(1)}$, $\mathcal{L}_{\text{vq}}^{(2)}$): enforce commitment of the encoder outputs to the selected codes and update the codebooks, with the stop-gradient operator $\mathrm{sg}[\cdot]$ ensuring valid optimization despite the discrete lookup.
All parameters, including encoder, decoders, and codebooks, are learned via gradient descent, employing straight-through gradients for non-differentiable assignments.
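A hedged sketch of how these terms could be assembled, using the standard VQ-VAE codebook/commitment form with stop-gradients and a straight-through pass for the discrete lookup; the weights `beta`, `lam_edge`, `lam_vq` and the exact scaled-cosine formulation are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F


def straight_through(z, e):
    """Decoder sees the quantized code e; gradients flow to z as if quantization were identity."""
    return z + (e - z).detach()


def hqa_gae_loss(x, x_hat, edge_logits, edge_labels, z, e1, e2,
                 beta=0.25, lam_edge=1.0, lam_vq=1.0):
    # Node-feature term: scaled cosine error between inputs and reconstructions.
    feat_loss = (1.0 - F.cosine_similarity(x_hat, x, dim=-1)).pow(2).mean()

    # Edge term: binary cross-entropy over observed (1) and negative-sampled (0) pairs;
    # edge_labels is a float tensor aligned with edge_logits.
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_labels)

    # VQ terms: codebook loss pulls codes toward detached encoder outputs,
    # commitment loss pulls encoder outputs toward detached codes (sg = .detach()).
    vq1 = F.mse_loss(e1, z.detach()) + beta * F.mse_loss(z, e1.detach())
    vq2 = F.mse_loss(e2, e1.detach()) + beta * F.mse_loss(e1, e2.detach())

    return feat_loss + lam_edge * edge_loss + lam_vq * (vq1 + vq2)
```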
6. Experimental Framework
Evaluation considers eight standard undirected, unweighted graphs drawn from citation, co-purchase, and co-author networks, plus the OGB benchmark. Tasks include:
- Link prediction: measured by AUC and AP on held-out edges, with dot-product probes.
- Node classification: using a linear SVM classifier on the learned node embeddings, validated with 5-fold cross-validation.
A total of 16 self-supervised baselines are compared, including contrastive methods (DGI, GIC, GRACE, etc.) and autoencoding/masked models (GAE, VGAE, ARGA, Bandana, etc.). Experiments are conducted using PyTorch Geometric on NVIDIA A800 hardware with CUDA 12.1.
| Dataset Type | Example Datasets | Task(s) |
|---|---|---|
| Citation | Cora, CiteSeer, PubMed | Link prediction, Node classification |
| Co-purchase | Photo, Computers | Link prediction, Node classification |
| Co-author | CS, Physics | Link prediction, Node classification |
| OGB | ogbn-arxiv | Link prediction, Node classification |
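As a concrete illustration of the two evaluation probes above, the following sketch assumes frozen node embeddings `emb` (a NumPy array), labels `y`, and held-out positive/negative edge index arrays; the scikit-learn settings are placeholders rather than the paper's exact protocol.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC


def link_prediction_probe(emb, pos_edges, neg_edges):
    """Dot-product scores on held-out edges, reported as AUC and AP."""
    pos = (emb[pos_edges[0]] * emb[pos_edges[1]]).sum(-1)
    neg = (emb[neg_edges[0]] * emb[neg_edges[1]]).sum(-1)
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return roc_auc_score(labels, scores), average_precision_score(labels, scores)


def node_classification_probe(emb, y):
    """Linear SVM on frozen embeddings, scored with 5-fold cross-validation."""
    accs = cross_val_score(LinearSVC(max_iter=10000), emb, y, cv=5, scoring="accuracy")
    return accs.mean(), accs.std()
```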
7. Performance and Empirical Insights
HQA-GAE demonstrates leading performance across all major datasets and metrics. In link prediction (AUC ± SD), HQA-GAE achieves the best average rank of 1.00 across datasets, surpassing Bandana and MaskGAE. On the Photo and Computers graphs, the model exceeds the next-best AP by approximately 20 percentage points. In node classification, HQA-GAE ranks best on six of eight datasets (average rank 1.25), scoring, for example, 88.78 on Cora and 88.49 on PubMed, compared to Bandana’s 88.59 and 88.16.
The architecture’s empirical strengths can be ascribed to:
- Discrete compression via VQ: Facilitates encoding of salient structural graph signals rather than noise.
- Annealing-based assignment: Mitigates codebook collapse, ensuring broader representation and improved generalization.
- Hierarchical codebook clustering: Reduces code sparsity and creates more coherent, clusterable representations.
- Dual reconstruction targets: By reconstructing both node features and graph links, the model jointly leverages topological and attribute information, in contrast to methods relying solely on perturbation-based contrastive objectives.
These structural elements yield robust, well-clustered embeddings and consistent improvements on self-supervised graph learning tasks (Zeng et al., 17 Apr 2025).