KBGAN: Adversarial Sampling in KGE
- KBGAN is an adversarial training framework for knowledge graph embedding that uses a generator and a discriminator to create challenging negative samples.
- The framework employs a softmax-based generator and a margin-based discriminator to progressively refine negative examples and enhance relational learning.
- Empirical evaluations on FB15k-237, WN18, and WN18RR show consistent gains in filtered Mean Reciprocal Rank and Hits@10 over the pre-trained discriminators, with the largest gains on WN18.
KBGAN is an adversarial training framework designed to enhance knowledge graph embedding (KGE) models by addressing the limitations of negative sampling in knowledge graph completion tasks. KBGAN pairs two KGE models—a generator and a discriminator—in an adversarial setup inspired by generative adversarial networks (GANs), where the generator proposes hard negative samples for the discriminator, resulting in improved embedding quality and link prediction performance across standard benchmarks (Cai et al., 2017).
1. Motivation and Limitations of Uniform Negative Sampling
Standard knowledge graphs store only positive triples representing observed facts. KGE methods require negative examples to learn the underlying structure; the conventional approach creates these by corrupting head or tail entities with randomly selected alternatives. These "uniform negatives" often result in trivially false triples, such as LocatedIn(NewOrleans, BarackObama), which predominantly violate basic type constraints. Training with such "type-mismatched" negatives encourages models to focus only on type consistency rather than more nuanced relational errors (e.g., confusing LocatedIn(NewOrleans, Florida) with a correct triple). As a result, margin-based embedding methods, including TransE and TransD, are limited in their capacity to capture rich relational patterns beyond entity type distinctions (Cai et al., 2017).
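The conventional uniform corruption described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. Triples are `(head, relation, tail)` tuples, so LocatedIn(NewOrleans, BarackObama) becomes `("NewOrleans", "LocatedIn", "BarackObama")`. The head or the tail is corrupted with equal probability. The helper name `corrupt_uniform` is hypothetical. Like plain uniform sampling, the helper does not filter out corruptions that happen to be true facts.

```python
import random

def corrupt_uniform(triple, entities, rng):
    """Return a uniformly corrupted copy of (head, relation, tail): the head or
    the tail (with equal probability) is replaced by a uniformly drawn entity.
    Redraws only if the result is identical to the input triple."""
    if len(set(entities)) < 2:
        raise ValueError("need at least two distinct entities")
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate != triple:
            return candidate

entities = ["NewOrleans", "Louisiana", "Florida", "BarackObama"]
pos = ("NewOrleans", "LocatedIn", "Louisiana")
rng = random.Random(0)
negs = [corrupt_uniform(pos, entities, rng) for _ in range(5)]
# Under uniform sampling, a type-violating corruption such as
# ("NewOrleans", "LocatedIn", "BarackObama") is exactly as likely as an
# informative one such as ("NewOrleans", "LocatedIn", "Florida").
```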
2. Adversarial Sampling Philosophy and Core Components
KBGAN adapts the adversarial philosophy of GANs to discrete structured data in KGE. The framework introduces a generator KGE model, selected for its probabilistic, softmax-based scoring, and a discriminator KGE model, chosen for its margin-based distance scoring.
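- Generator ($G$): Defines a softmax distribution over a candidate set $\mathrm{Neg}(h,r,t) \subset \{(h',r,t) \mid h' \in \mathcal{E}\} \cup \{(h,r,t') \mid t' \in \mathcal{E}\}$ of corrupted triples for a given positive triple $(h,r,t)$, where $\mathcal{E}$ is the entity set:
  $$p_G(h',r,t' \mid h,r,t) = \frac{\exp f_G(h',r,t')}{\sum_{(h^*,r,t^*)\in \mathrm{Neg}(h,r,t)} \exp f_G(h^*,r,t^*)}.$$
- Discriminator ($D$): Margin-based model whose score $f_D(h,r,t)$ is interpreted as a distance. It seeks to minimize the distance for positives and maximize it for adversarially generated negatives.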
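The generator's sampling step can be sketched as follows. The sketch assumes that the candidate set has already been drawn and that `generator_score` is any callable returning $f_G$ for a triple. The function names are hypothetical. The toy scores are invented purely for illustration and do not come from a trained model.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_negative(candidates, generator_score, rng):
    """Draw one negative with probability p_G proportional to exp(f_G(candidate)).
    Returns the sampled index and the full probability vector."""
    probs = softmax([generator_score(c) for c in candidates])
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return idx, probs

# Hypothetical generator scores: higher means a more plausible-looking negative.
toy_scores = {
    ("NewOrleans", "LocatedIn", "Florida"): 2.0,
    ("NewOrleans", "LocatedIn", "Texas"): 1.0,
    ("NewOrleans", "LocatedIn", "BarackObama"): -3.0,
}
cands = list(toy_scores)
idx, probs = sample_negative(cands, toy_scores.get, random.Random(0))
```

Subtracting the maximum score leaves $p_G$ unchanged but avoids overflow in `math.exp`. The type-consistent candidates receive most of the probability mass, which is the behaviour KBGAN's generator is trained toward.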
The generator iteratively learns to propose negatives that the discriminator currently fails to distinguish from true triples, leading to a curriculum of increasingly hard negatives. This process sharpens the discriminator's embedding function, yielding improved performance and finer decision boundaries in the learned embedding space.
3. Objective Functions and Optimization
The learning process comprises two principal objectives:
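- Discriminator margin loss:
  $$L_D = \sum_{(h,r,t)\in\mathcal{T}} \left[f_D(h,r,t) - f_D(h',r,t') + \gamma\right]_+,$$
  where $\mathcal{T}$ is the set of training triples, $(h',r,t') \sim p_G(h',r,t' \mid h,r,t)$, $[x]_+ = \max(0, x)$, and $\gamma$ is the margin.
- Generator reward objective:
  $$R_G = \sum_{(h,r,t)\in\mathcal{T}} \mathbb{E}_{(h',r,t')\sim p_G(h',r,t' \mid h,r,t)}\left[R\right], \qquad R = -f_D(h',r,t'),$$
  optimized by REINFORCE policy gradients with respect to the generator parameters $\theta_G$:
  $$\nabla_{\theta_G} R_G = \sum_{(h,r,t)\in\mathcal{T}} \mathbb{E}_{(h',r,t')\sim p_G(h',r,t' \mid h,r,t)}\left[(R - b)\,\nabla_{\theta_G}\log p_G(h',r,t' \mid h,r,t)\right].$$
  A running-average baseline $b$ is subtracted from the reward to reduce variance.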
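For a softmax generator, the gradient of $\log p_G$ for the sampled candidate $s$ with respect to the candidate scores is $\mathbb{1}[j = s] - p_j$. One REINFORCE sample therefore reduces to a simple vector. The sketch below computes that vector together with the discriminator's hinge term for a single positive/negative pair. The helper names are hypothetical. The exponential decay used for the running-average baseline is an assumption, since the averaging scheme is not specified here. Back-propagating these score gradients into embedding parameters depends on the chosen $f_G$.

```python
def margin_loss(f_pos, f_neg, gamma):
    """Hinge term [f_D(h,r,t) - f_D(h',r,t') + gamma]_+ for one positive/negative pair."""
    return max(0.0, f_pos - f_neg + gamma)

def reinforce_score_grad(probs, s, reward, baseline):
    """One-sample REINFORCE estimate of dR_G / d(candidate scores):
    (R - b) * (1[j == s] - p_j) for each candidate j."""
    advantage = reward - baseline
    return [advantage * ((1.0 if j == s else 0.0) - p) for j, p in enumerate(probs)]

def update_baseline(b, reward, decay=0.9):
    """Exponential running average of rewards (assumed scheme and decay value)."""
    return decay * b + (1.0 - decay) * reward

# The sampled negative s = 0 lies at distance f_D = 1.0, so its reward is R = -1.0.
grad = reinforce_score_grad([0.5, 0.25, 0.25], s=0, reward=-1.0, baseline=-2.0)
# Here R > b, so the sampled candidate's score is pushed up and the others down.
```

A small distance under the discriminator (a hard negative) yields a reward above the baseline. Gradient ascent then raises that candidate's generator score, which produces the curriculum of harder negatives.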
4. Training Procedure and Hyperparameterization
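KBGAN training consists of (a) pre-training both generator and discriminator with standard uniform negative sampling, and (b) joint adversarial updates using alternating gradient steps. Within each mini-batch:
- For every positive triple $(h,r,t)$, uniformly sample a candidate set $\mathrm{Neg}(h,r,t)$ of $N_s$ corrupted triples.
- Compute the generator probabilities $p_G$ over the candidates and sample one negative $(h'_s, r, t'_s)$.
- Accumulate the discriminator gradient of $\left[f_D(h,r,t) - f_D(h'_s,r,t'_s) + \gamma\right]_+$.
- Compute the reward $R = -f_D(h'_s,r,t'_s)$ and accumulate the generator gradient $(R - b)\,\nabla_{\theta_G}\log p_G(h'_s,r,t'_s \mid h,r,t)$.
- After the batch, take a gradient-descent step on the discriminator and a gradient-ascent step on the generator, then update the running-average baseline $b$ with the batch's rewards.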
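A compact, dependency-free sketch of one mini-batch update follows. It is a toy illustration rather than the reference implementation, and it rests on several assumptions. The discriminator is TransE with an $L_2$ distance and the generator is DistMult. Plain SGD replaces Adam. No unit-norm projection or L2-regularization is applied. The batch-mean reward serves as the next baseline. The names `kbgan_step` and `init` are hypothetical. Candidates that coincide with the positive triple are not filtered out.

```python
import math
import random
from collections import defaultdict

def transe_dist(D, h, r, t):
    # Discriminator score f_D = ||h + r - t||_2; also returns the residual vector.
    d = [a + b - c for a, b, c in zip(D["E"][h], D["R"][r], D["E"][t])]
    return math.sqrt(sum(x * x for x in d)), d

def add_transe_grad(grads, D, triple, sign):
    # Accumulate sign * d f_D / d(embeddings) for one triple.
    h, r, t = triple
    f, d = transe_dist(D, h, r, t)
    scale = sign / max(f, 1e-12)
    for i, x in enumerate(d):
        grads[("E", h)][i] += scale * x
        grads[("R", r)][i] += scale * x
        grads[("E", t)][i] -= scale * x

def distmult(G, h, r, t):
    # Generator score f_G = sum_i h_i r_i t_i.
    return sum(a * b * c for a, b, c in zip(G["E"][h], G["R"][r], G["E"][t]))

def add_distmult_grad(grads, G, triple, coef):
    # Accumulate coef * d f_G / d(embeddings) for one triple.
    h, r, t = triple
    eh, er, et = G["E"][h], G["R"][r], G["E"][t]
    for i in range(len(eh)):
        grads[("E", h)][i] += coef * er[i] * et[i]
        grads[("R", r)][i] += coef * eh[i] * et[i]
        grads[("E", t)][i] += coef * eh[i] * er[i]

def softmax(xs):
    m = max(xs)
    ex = [math.exp(x - m) for x in xs]
    z = sum(ex)
    return [e / z for e in ex]

def kbgan_step(batch, entities, D, G, baseline, rng,
               gamma=1.0, n_cand=5, lr_d=0.01, lr_g=0.01):
    """One hypothetical KBGAN mini-batch update; returns the new baseline."""
    dim_d = len(next(iter(D["E"].values())))
    dim_g = len(next(iter(G["E"].values())))
    grad_d = defaultdict(lambda: [0.0] * dim_d)
    grad_g = defaultdict(lambda: [0.0] * dim_g)
    rewards = []
    for h, r, t in batch:
        # Uniform candidate set: corrupt the head or the tail at random.
        cands = []
        for _ in range(n_cand):
            e = rng.choice(entities)
            cands.append((e, r, t) if rng.random() < 0.5 else (h, r, e))
        # Generator: softmax over candidate scores, then sample one negative.
        probs = softmax([distmult(G, *c) for c in cands])
        s = rng.choices(range(n_cand), weights=probs, k=1)[0]
        neg = cands[s]
        f_pos, _ = transe_dist(D, h, r, t)
        f_neg, _ = transe_dist(D, *neg)
        # Discriminator: hinge-loss gradient only when the margin is violated.
        if f_pos - f_neg + gamma > 0:
            add_transe_grad(grad_d, D, (h, r, t), +1.0)
            add_transe_grad(grad_d, D, neg, -1.0)
        # Generator: reward -f_D(neg); d log p_s / d f_G(c_j) = 1[j == s] - p_j.
        reward = -f_neg
        rewards.append(reward)
        for j, c in enumerate(cands):
            indicator = 1.0 if j == s else 0.0
            add_distmult_grad(grad_g, G, c, (reward - baseline) * (indicator - probs[j]))
    # Apply accumulated updates: descent on L_D, ascent on R_G.
    for (kind, key), g in grad_d.items():
        D[kind][key] = [w - lr_d * gi for w, gi in zip(D[kind][key], g)]
    for (kind, key), g in grad_g.items():
        G[kind][key] = [w + lr_g * gi for w, gi in zip(G[kind][key], g)]
    return sum(rewards) / len(rewards)  # batch-mean reward as the next baseline

rng = random.Random(0)
ents, rels = ["a", "b", "c", "d"], ["r"]

def init(dim):
    # Hypothetical random initialisation; KBGAN itself starts from pre-trained models.
    return {"E": {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in ents},
            "R": {x: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for x in rels}}

D, G = init(4), init(4)
train = [("a", "r", "b"), ("c", "r", "d")]
b = 0.0
for _ in range(3):
    b = kbgan_step(train, ents, D, G, b, rng)
```

Gradients are accumulated over the whole batch before any parameter changes. Every gradient in a batch is therefore computed from the same snapshot of both models.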
Key hyperparameters are the batch size (dataset-dependent), the number of negative candidates $N_s$ per positive, the Adam optimizer settings, the margin $\gamma$, the embedding dimension, the distance norm ($L_1$ or $L_2$) for TransE/TransD, and L2-regularization for DistMult/ComplEx (Cai et al., 2017).
5. Model Variants and Instantiations
KBGAN is model-agnostic, compatible with mature KGE architectures that support the respective softmax and margin scoring functions. The original experiments instantiate the following:
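- Discriminator (margin-based):
  - TransE: $f_D(h,r,t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$ ($L_1$ or $L_2$ norm), with unit-norm constraints on embeddings.
  - TransD: $f_D(h,r,t) = \|\mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp\|$, where $\mathbf{h}_\perp = (\mathbf{r}_p\mathbf{h}_p^\top + \mathbf{I})\,\mathbf{h}$ and $\mathbf{t}_\perp = (\mathbf{r}_p\mathbf{t}_p^\top + \mathbf{I})\,\mathbf{t}$, with unit-norm constraints.
- Generator (softmax-based):
  - DistMult: $f_G(h,r,t) = \langle \mathbf{h}, \mathbf{r}, \mathbf{t}\rangle = \sum_i h_i r_i t_i$, with L2-regularization.
  - ComplEx: $f_G(h,r,t) = \mathrm{Re}\left(\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}}\rangle\right)$ with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^k$.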
There is no modification to the core adversarial or sampling logic; only the functional form and constraints of $f_G$ and $f_D$ change across variants.
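The four score functions can be sketched in plain Python. The sketch assumes list-valued real embeddings, with Python complex numbers for ComplEx. It also assumes that entity and relation spaces share one dimension for TransD, so that $(\mathbf{r}_p\mathbf{h}_p^\top + \mathbf{I})\,\mathbf{h}$ reduces to $\mathbf{h} + (\mathbf{h}_p \cdot \mathbf{h})\,\mathbf{r}_p$. Norm constraints and regularization are omitted, and all function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lp_norm(v, p=2):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def transe(h, r, t, p=2):
    """Distance f_D = ||h + r - t||_p; lower means more plausible."""
    return lp_norm([a + b - c for a, b, c in zip(h, r, t)], p)

def transd(h, h_p, r, r_p, t, t_p, p=2):
    """Distance f_D = ||h_perp + r - t_perp||_p, assuming entity and relation
    spaces share one dimension so (r_p h_p^T + I) h = h + (h_p . h) r_p."""
    h_perp = [hi + dot(h_p, h) * ri for hi, ri in zip(h, r_p)]
    t_perp = [ti + dot(t_p, t) * ri for ti, ri in zip(t, r_p)]
    return transe(h_perp, r, t_perp, p)

def distmult(h, r, t):
    """Score f_G = sum_i h_i r_i t_i; higher means more plausible."""
    return sum(a * b * c for a, b, c in zip(h, r, t))

def complex_score(h, r, t):
    """Score f_G = Re(sum_i h_i r_i conj(t_i)) for complex-valued embeddings."""
    return sum(a * b * c.conjugate() for a, b, c in zip(h, r, t)).real

# ComplEx is asymmetric: swapping head and tail flips the sign here.
print(complex_score([1j], [1j], [1]), complex_score([1], [1j], [1j]))  # -1.0 1.0
```

DistMult is symmetric in $h$ and $t$, whereas the complex conjugate lets ComplEx distinguish a relation from its inverse. Either one can serve as the softmax generator, because KBGAN only needs a scalar $f_G$ per candidate.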
6. Empirical Evaluation and Findings
Evaluation is conducted on standard knowledge base completion benchmarks:
- FB15k-237: 237 relations, 14,541 entities, 272,115 train triples.
- WN18: 18 relations, 40,943 entities, 141,442 train triples.
- WN18RR: 11 relations, 40,943 entities, 86,835 train triples (removes inverse shortcuts).
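Filtered MRR and Hits@10 can be sketched as follows. The sketch makes three assumptions. Scores are higher-is-better, so a distance such as $f_D$ should be negated first. Only strictly better-scored candidates count against the target, which is an optimistic treatment of ties. The function names are hypothetical. The "filtered" setting removes other known true answers from the ranking before the target's rank is computed.

```python
def filtered_rank(scores, target, known_true):
    """Rank (1 = best) of `target` among candidate entities, ignoring other
    entities that also complete a known true triple (filtered setting)."""
    s_target = scores[target]
    better = sum(1 for e, s in scores.items()
                 if e != target and e not in known_true and s > s_target)
    return 1 + better

def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = sum(1 for r in ranks if r <= k) / n
    return mrr, hits

# Toy query (h, r, ?) whose gold tail is "a"; "d" is another known true tail.
scores = {"a": 0.9, "b": 0.95, "c": 0.5, "d": 0.99}
rank_a = filtered_rank(scores, "a", known_true={"d"})  # only "b" outranks "a"
mrr, hits10 = mrr_and_hits([rank_a, 1, 20])
```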
Performance is measured by filtered Mean Reciprocal Rank (MRR) and Hits@10, both scaled by 100. The following table compares each pre-trained discriminator with the same discriminator after KBGAN training (filtered setting):
| Discriminator + Generator | Dataset | Pre-trained discriminator MRR / H@10 | KBGAN MRR / H@10 | ΔMRR |
|---|---|---|---|---|
| TransE + DistMult | FB15k-237 | 24.2 / 42.2 | 27.4 / 45.0 | +3.2 |
| TransE + ComplEx | FB15k-237 | 24.2 / 42.2 | 27.8 / 45.3 | +3.6 |
| TransD + DistMult | FB15k-237 | 24.5 / 42.7 | 27.8 / 45.8 | +3.3 |
| TransD + ComplEx | FB15k-237 | 24.5 / 42.7 | 27.7 / 45.8 | +3.2 |
| TransE + DistMult | WN18 | 43.3 / 91.5 | 71.0 / 94.9 | +27.7 |
| TransD + DistMult | WN18 | 49.4 / 92.8 | 77.2 / 94.8 | +27.8 |
| TransE + DistMult | WN18RR | 18.6 / 45.9 | 21.3 / 48.1 | +2.7 |
| TransD + ComplEx | WN18RR | 19.2 / 46.5 | 21.5 / 46.9 | +2.3 |
Across the reported (discriminator, generator) combinations and datasets, KBGAN improves MRR by 2.3 to 27.8 points and Hits@10 by 0.4 to 3.4 points. The most pronounced benefit occurs on WN18, where uniform negatives are particularly uninformative due to strong inverse relations; adversarial sampling compels finer discrimination (Cai et al., 2017).
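The improvement ranges can be recomputed from the table rows with a few lines of Python. The values are copied from the table, and the dictionary layout is only illustrative.

```python
# (pre-trained MRR, pre-trained H@10, KBGAN MRR, KBGAN H@10), copied from the table.
rows = {
    ("TransE + DistMult", "FB15k-237"): (24.2, 42.2, 27.4, 45.0),
    ("TransE + ComplEx", "FB15k-237"): (24.2, 42.2, 27.8, 45.3),
    ("TransD + DistMult", "FB15k-237"): (24.5, 42.7, 27.8, 45.8),
    ("TransD + ComplEx", "FB15k-237"): (24.5, 42.7, 27.7, 45.8),
    ("TransE + DistMult", "WN18"): (43.3, 91.5, 71.0, 94.9),
    ("TransD + DistMult", "WN18"): (49.4, 92.8, 77.2, 94.8),
    ("TransE + DistMult", "WN18RR"): (18.6, 45.9, 21.3, 48.1),
    ("TransD + ComplEx", "WN18RR"): (19.2, 46.5, 21.5, 46.9),
}
# Round to one decimal to remove floating-point noise from the subtraction.
d_mrr = {k: round(v[2] - v[0], 1) for k, v in rows.items()}
d_h10 = {k: round(v[3] - v[1], 1) for k, v in rows.items()}
print(min(d_mrr.values()), max(d_mrr.values()))  # 2.3 27.8
print(min(d_h10.values()), max(d_h10.values()))  # 0.4 3.4
```

The smallest Hits@10 gain (0.4 points, TransD + ComplEx on WN18RR) shows that the Hits@10 improvements are less uniform than the MRR improvements.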
Qualitative analysis indicates that after adversarial training the generator produces "hard negatives": semantically related yet incorrect triples (e.g., selecting “bond_NN_6” as a negative for the positive triple (meeting, hypernym, social_gathering)), unlike the random, type-irrelevant negatives of conventional sampling.
7. Conclusion and Implications
KBGAN demonstrates that adversarial sampling can be effectively transposed to KGE, yielding a flexible and architecture-agnostic method for constructing high-quality negatives through a policy-gradient-trained generator. This results in consistent improvement over baseline KGE models for knowledge base completion. The ability to mix and match mature embedding models as generator and discriminator components facilitates broad applicability. Empirical gains, particularly on benchmarks where traditional negatives are uninformative, highlight the critical role of hard negative sampling in KGE algorithm design (Cai et al., 2017).