HAT-GAE: Hierarchical Adaptive Masking & Corruption

Updated 21 April 2026
  • The paper introduces a novel self-supervised graph autoencoder that integrates hierarchical adaptive masking and trainable corruption to enhance feature reconstruction.
  • HAT-GAE employs an iterative masking mechanism based on node and feature importance, paired with learnable noise injection to create robust representations.
  • Extensive evaluations on transductive and inductive benchmarks demonstrate HAT-GAE’s superior performance over existing graph representation learning methods.

Hierarchical Adaptive Masking and Corruption (HAT-GAE) is a self-supervised generative graph auto-encoder architecture designed to enhance representation learning for graph-structured data. The model advances over prior self-supervised graph auto-encoders by incorporating a hierarchical adaptive masking mechanism, which incrementally increases the training difficulty, and a trainable corruption scheme, which enables the model to learn robust representations by undoing adaptively learned noise. HAT-GAE achieves leading performance across multiple transductive and inductive node classification benchmarks, demonstrating the effectiveness of its component innovations (Sun, 2023).

1. Architectural Overview

HAT-GAE consists of five principal modules: Adaptive and Hierarchical Masking, Trainable Corruption, a Graph Neural Network (GNN) Encoder, Masked Hidden Representation, and a Decoder with Feature Reconstruction. The model operates over the original graph $\mathcal{G} = (A, X^1)$, where $A$ is the adjacency matrix and $X^1$ is the node feature matrix. The training pipeline for each epoch performs the following steps:

  1. Hierarchical adaptive masking is applied to $X^{n-1}$ to produce $X^n$, iteratively increasing masking at scheduled intervals.
  2. Trainable corruption introduces learnable noise $W_n$ to a subset of node features, selected by a Bernoulli mask $M$, yielding corrupted features $\tilde{X}^n$.
  3. The encoder $\mathcal{E}_\theta$ (a multi-head GAT) computes hidden states $H^n$ from $(A, \tilde{X}^n)$.
  4. Hidden states of “noisy” nodes in $H^n$ are zeroed out, forming the masked hidden representation $\tilde{H}^n$.
  5. The decoder $\mathcal{D}_\phi$ (GAT-based) reconstructs node features $\hat{X}^n$ from $(A, \tilde{H}^n)$.
  6. Only the features corresponding to corrupted nodes are reconstructed via a cosine-similarity-based loss.

This configuration allows the model to focus learning capacity on features and nodes most relevant for robust recovery of meaningful representations.
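
The following PyTorch sketch condenses this per-epoch pipeline into a single training step. It is a minimal illustration, not the authors' released code: `encoder` and `decoder` stand in for the GAT modules described in Section 5, the additive form of the noise injection follows the reading of Section 3 below, and the function name is hypothetical.

```python
import torch

def hatgae_step(A_hat, X_n, W_n, encoder, decoder, noise_rate=0.1):
    # Step 2: sample a Bernoulli mask M and inject learnable noise W_n
    # into the selected entries, yielding corrupted features X~^n.
    M = torch.bernoulli(torch.full_like(X_n, noise_rate))
    X_tilde = X_n + M * W_n

    # Step 3: encode the corrupted graph into hidden states H^n.
    H = encoder(A_hat, X_tilde)

    # Step 4: zero out hidden states of "noisy" nodes, forming H~^n.
    noisy = (M > 0).any(dim=1)
    H_tilde = H * (~noisy).unsqueeze(1).float()

    # Step 5: decode to reconstructed features.
    X_hat = decoder(A_hat, H_tilde)

    # Step 6: cosine-similarity loss over corrupted nodes only,
    # targeting the pre-corruption features X^n.
    cos = torch.nn.functional.cosine_similarity(
        X_n[noisy], X_hat[noisy], dim=1
    )
    return (1.0 - cos).mean()
```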

2. Hierarchical Adaptive Masking Mechanism

The hierarchical adaptive masking strategy aims to simulate progressive curriculum learning by dynamically increasing feature masking difficulty during training. Masking is performed along feature dimensions, guided by quantifiable importance scores.

2.1 Node and Dimension Importance

Node importance $s_i$ is computed, by default, as the in-degree:

$$s_i = \sum_{j} A_{ji}$$

Alternative importance metrics, such as eigenvector centrality or PageRank, are supported but not the default.

Feature dimension importance $w_d$ is then aggregated across nodes, weighting feature magnitudes by node importance:

$$w_d = \sum_{i} s_i \, \lvert x_{i,d} \rvert$$

These scores are sorted in descending order, with less informative dimensions masked earlier in training.
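
For dense features, these scores can be computed in a few lines. The sketch below assumes a dense adjacency matrix; the exact weighting of feature magnitudes by node importance inside the aggregation is an assumption of this sketch.

```python
import torch

def importance_scores(A, X):
    # Default node importance: in-degree, s_i = sum_j A_ji (column sums).
    s = A.sum(dim=0)

    # Dimension importance: aggregate feature magnitudes across nodes,
    # weighted by node importance (this weighting is an assumption).
    w = (s.unsqueeze(1) * X.abs()).sum(dim=0)

    # Descending order: lowest-scored dimensions are masked earliest.
    order = torch.argsort(w, descending=True)
    return s, w, order
```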

2.2 Adaptive and Hierarchical Scheduling

In each adaptive masking step, a fraction of the lowest-scored dimensions for every node is zeroed, with the masking rate $r_m$ controlling the masked proportion:

$$X^{n} = X^{n-1} \odot \left(\mathbf{1} - M^{n}\right)$$

where $M^{n}$ masks the chosen dimensions (Eq. 4).

The masking schedule is governed by the number of rounds $h$ and the total number of epochs $T$, with re-masking occurring every $T/h$ epochs. The number of dimensions masked at each round $n$ is recursively decreased to avoid masking all features at once:

$$d^{\,n} = \left\lceil r_m \cdot d^{\,n-1} \right\rceil, \qquad d^{\,0} = d,$$

where $d$ is the feature dimensionality.

Through this procedure, the model incrementally increases task difficulty, first challenging the network with less critical features and progressing towards increasingly difficult signal recovery.
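
One way to realize this schedule is sketched below. Interpreting $d^{\,n-1}$ as the number of still-unmasked dimensions is a design choice of this sketch, and the generator interface is purely illustrative.

```python
import torch

def hierarchical_masking(X, order, mask_rate, rounds):
    """Yield X^1, X^2, ..., each masking more low-importance dimensions."""
    X_n = X.clone()
    # Dimensions sorted most -> least important; mask from the tail.
    remaining = order.tolist()
    for _ in range(rounds):
        # Mask a geometrically shrinking share of the remaining dimensions,
        # so the schedule never zeroes out the entire feature matrix.
        k = max(1, int(mask_rate * len(remaining)))
        to_mask, remaining = remaining[-k:], remaining[:-k]
        X_n = X_n.clone()
        X_n[:, to_mask] = 0.0
        yield X_n
```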

3. Trainable Corruption Scheme

Unlike models relying on fixed or random feature corruption, HAT-GAE introduces a corruption process with a learnable noise component.

3.1 Bernoulli Mask Sampling

For each node-feature pair $(i, d)$, a binary mask entry $M_{i,d}$ is sampled as:

$$M_{i,d} \sim \mathrm{Bernoulli}(r_n)$$

where $r_n$ is the noisy node rate specifying the expected fraction of entries corrupted per epoch.

3.2 Learnable Noise Injection

A trainable noise parameter $W_n$ is combined with the mask to produce corrupted features:

$$\tilde{X}^{n} = X^{n} + M \odot W_n$$

$W_n$ is optimized jointly with the encoder and decoder through the self-supervised reconstruction loss, with no additional regularization. This challenges the auto-encoder to become robust to adversarially learned, rather than random, perturbations.
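
As a module, the corruption scheme might look as follows; the class name, the per-entry Bernoulli sampling, and the additive combination of $W_n$ with the mask are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

class TrainableCorruption(nn.Module):  # hypothetical name
    def __init__(self, num_nodes, feat_dim, noise_rate=0.1):
        super().__init__()
        # W_n: learnable noise, optimized jointly with encoder and decoder
        # through the reconstruction loss (no extra regularization).
        self.W = nn.Parameter(torch.empty(num_nodes, feat_dim))
        nn.init.xavier_uniform_(self.W)
        self.noise_rate = noise_rate

    def forward(self, X):
        # M_{i,d} ~ Bernoulli(r_n): select the entries to corrupt.
        M = torch.bernoulli(torch.full_like(X, self.noise_rate))
        # Additive injection of the learned noise on the selected entries.
        return X + M * self.W, M
```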

4. Self-Supervised Optimization Objective

After reconstruction, the model’s objective is to accurately recover only the corrupted node features. For nodes $v_i \in \mathcal{V}_c$, the set of corrupted nodes, the cosine similarity between true and reconstructed features is:

$$\cos\left(x_i, \hat{x}_i\right) = \frac{x_i^{\top} \hat{x}_i}{\lVert x_i \rVert \, \lVert \hat{x}_i \rVert}$$

The loss is:

$$\mathcal{L} = \frac{1}{\lvert \mathcal{V}_c \rvert} \sum_{v_i \in \mathcal{V}_c} \left(1 - \cos\left(x_i, \hat{x}_i\right)\right)$$

No auxiliary contrastive or adversarial losses are introduced. This single objective keeps training computationally efficient while still providing a strong learning signal.
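
The objective reduces to a few lines of PyTorch; the function name and the boolean node mask are illustrative, assuming the corrupted-node set is derived from the Bernoulli mask above.

```python
import torch

def masked_cosine_loss(X_true, X_rec, corrupted):
    # Cosine similarity between true and reconstructed features,
    # restricted to the corrupted node set V_c.
    cos = torch.nn.functional.cosine_similarity(
        X_true[corrupted], X_rec[corrupted], dim=1
    )
    # L = mean over V_c of (1 - cos(x_i, x_hat_i)).
    return (1.0 - cos).mean()
```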

5. Implementation Specifications

HAT-GAE is implemented with a 2-layer Graph Attention Network for both encoder and decoder, using four attention heads per layer and a PReLU activation. The hidden dimension per node is set between 256 and 1024 depending on the dataset. The Adam optimizer is used with an initial learning rate of 0.001, dataset-dependent weight decay, no warm-up, and a learning-rate decay schedule. Hyperparameter choices include the adaptive mask rate $r_m$, the noise rate $r_n$, and the number of hierarchical rounds $h$, all tuned per dataset. Training durations range from 500 to 2000 epochs; the model is implemented in PyTorch 1.9.1 with DGL 0.8.2 and trained on Tesla V100 GPUs.
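
Collected into a single configuration, these settings might look as follows; the starred values are placeholders within the reported ranges, since the exact per-dataset settings are not reproduced here.

```python
# Illustrative configuration assembled from Section 5; starred comments
# mark placeholder values, as the per-dataset settings differ.
config = {
    "encoder": {"arch": "GAT", "layers": 2, "heads": 4, "act": "PReLU"},
    "decoder": {"arch": "GAT", "layers": 2, "heads": 4, "act": "PReLU"},
    "hidden_dim": 512,       # * reported range: 256-1024
    "optimizer": "Adam",
    "lr": 1e-3,              # with decay schedule, no warm-up
    "weight_decay": 5e-4,    # * dataset-dependent
    "epochs": 1000,          # * reported range: 500-2000
    "mask_rate": 0.5,        # * r_m, placeholder
    "noise_rate": 0.1,       # * r_n, placeholder
    "rounds": 3,             # * hierarchical rounds h, placeholder
}
```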

6. Experimental Results and Analysis

HAT-GAE was evaluated using linear probing on ten standard benchmarks: eight transductive datasets (Cora, Citeseer, Pubmed, Amazon-Photo, Amazon-Computer, Coauthor-CS, Coauthor-Physics, and OGBN-arXiv) and two inductive datasets (Reddit and PPI).

6.1 Transductive Node Classification

The model achieved the highest unsupervised accuracy on 7 of 8 datasets, outperforming both contrastive (DGI, GRACE, MVGRL, BGRL, InfoGCL, CCA-SSG) and generative (GAE, GPT-GNN, GATE, GraphMAE) baselines. Representative scores for selected benchmarks:

| Dataset | HAT-GAE (%) | Best baseline (%) | Baseline |
|---|---|---|---|
| Cora | 84.78 | 84.19 | GraphMAE |
| Citeseer | 74.28 | 73.41 | GraphMAE |
| Pubmed | 81.88 | 81.21 | GraphMAE |
| Amazon-Photo | 93.58 | 93.01 | GraphMAE |
| Amazon-Computer | 88.55 | 88.32 | GraphMAE |
| Coauthor-CS | 93.17 | 92.79 | GraphMAE |
| Coauthor-Physics | 95.57 | 95.30 | GraphMAE |
| OGBN-arXiv | 71.99 | 71.59 | GraphMAE |

6.2 Inductive Node Classification

On Reddit, HAT-GAE attained 96.06 micro-F1 (vs. 95.89 for GraphMAE). On PPI, HAT-GAE scored 74.72 (vs. 74.39 for GraphMAE).

6.3 Ablation and Sensitivity Studies

Ablation experiments show that replacing hierarchical adaptive masking with random masking or a single adaptive mask, or omitting trainable corruption, each results in notable declines of 0.7–2.7% absolute accuracy, validating the contributions of hierarchical masking and trainable corruption. Sensitivity to $r_m$ and $r_n$ is moderate at low to moderate rates; higher rates cause sharp accuracy degradation due to over-masking and over-noising. Optimal performance on datasets such as Cora is achieved with a small number of masking rounds relative to the total training epochs.

7. Summary and Context

HAT-GAE introduces a curriculum-inspired masking approach that leverages quantifiable feature and node importance to tailor self-supervised graph representation learning, while trainable corruption provides adversarial challenge adapted to the learned data manifold. The resulting architecture is simple, requiring only a single cosine reconstruction loss, and empirically robust, matching or surpassing contemporary generative and contrastive graph neural network pretraining methods on diverse benchmarks. The method exemplifies the effectiveness of integrating adaptive, hierarchy-aware data corruption and progressive self-supervision for non-Euclidean domains (Sun, 2023).

References

Sun, C. (2023). HAT-GAE: Self-Supervised Graph Auto-encoders with Hierarchical Adaptive Masking and Trainable Corruption.
