TaxoBell: Gaussian Box Embedding
- TaxoBell is a self-supervised framework that embeds taxonomies using Gaussian boxes, combining geometric box properties with probabilistic semantics to capture asymmetric 'is-a' relations.
- It employs energy-based losses built on the Bhattacharyya coefficient and the KL divergence to ensure smooth gradients and to model uncertainty and polysemy within structured knowledge systems.
- Empirical evaluations reveal substantial gains in metrics like MRR, Recall@k, and Mean Rank across single- and multi-parent taxonomy benchmarks compared to state-of-the-art methods.
TaxoBell is a Gaussian box embedding framework developed for self-supervised taxonomy expansion in structured knowledge systems. Its core innovation lies in fusing the geometric inductive bias of axis-aligned boxes with the probabilistic semantics of multivariate Gaussian distributions. By translating between box geometries and Gaussian densities, TaxoBell enables stable, interpretable modeling of asymmetric "is-a" relations, uncertainty, and polysemy in taxonomies. Empirical evaluations demonstrate substantial improvements over state-of-the-art baselines in taxonomy expansion tasks across several benchmarks (Mishra et al., 14 Jan 2026).
1. Motivation and Conceptual Foundations
Taxonomies require modeling of asymmetric relations (“is-a”): for example, “Dog is-a Mammal” holds, but “Mammal is-a Dog” does not. Conventional point-based embeddings, common in prior automated taxonomy expansion approaches, encode only symmetric similarity and thus cannot capture such directionality. Axis-aligned boxes (hyperrectangles in $\mathbb{R}^d$) address this by enabling geometric containment ($B_{\text{child}} \subseteq B_{\text{parent}}$) and disjointness ($B_a \cap B_b = \emptyset$ implies no relation). However, box losses based on intersection volume are piecewise-defined and suffer from vanishing or unstable gradients at box boundaries. Furthermore, these “hard” boxes cannot express uncertainty: all points inside a box are equally plausible.
TaxoBell resolves these limitations by defining a “Gaussian box”—an axis-aligned box associated with a multivariate diagonal Gaussian. The box’s center and half-widths become the Gaussian’s mean and standard deviations, respectively. This allows encoding of probabilistic containment (graded “is-a” via density mass), smooth overlap and containment energies with stable gradients (Bhattacharyya coefficient, Kullback–Leibler divergence), and explicit quantification of semantic uncertainty and polysemy (via covariance structure).
2. Mathematical Model
2.1 Box and Gaussian Parameterization
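An axis-aligned box in $\mathbb{R}^d$ is parameterized by a center $\mathbf{c} \in \mathbb{R}^d$ and offsets $\mathbf{o} \in \mathbb{R}^d_{>0}$, which specify lower and upper bounds $l_i = c_i - o_i$ and $u_i = c_i + o_i$ for each dimension $i = 1, \dots, d$. The box is defined as $B = \prod_{i=1}^{d} [l_i, u_i]$, with volume $\operatorname{Vol}(B) = \prod_{i=1}^{d} (u_i - l_i) = \prod_{i=1}^{d} 2 o_i$.
TaxoBell projects each box $B = (\mathbf{c}, \mathbf{o})$ to a diagonal Gaussian $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ with $\boldsymbol{\mu} = \mathbf{c}$ and $\Sigma = \operatorname{diag}(o_1^2, \dots, o_d^2)$, i.e. $\sigma_i = o_i$. This links box half-widths with Gaussian standard deviations, so equal-density contours of the Gaussian are axis-aligned ellipsoids whose semi-axes are proportional to the box half-widths.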
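A minimal Python sketch of this box-to-Gaussian mapping follows. It assumes the offsets are already strictly positive (how an encoder head enforces positivity is not specified here), and the class name `GaussianBox` is hypothetical rather than taken from the TaxoBell code.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianBox:
    center: list[float]  # box center c, reused as the Gaussian mean mu
    offset: list[float]  # half-widths o > 0, reused as the Gaussian std sigma

    def bounds(self):
        """Per-dimension (lower, upper) = (c_i - o_i, c_i + o_i)."""
        return [(c - o, c + o) for c, o in zip(self.center, self.offset)]

    def volume(self):
        """Hard-box volume: product over dimensions of the side length 2 * o_i."""
        return math.prod(2 * o for o in self.offset)

    def to_gaussian(self):
        """Diagonal Gaussian: mean = center, per-dimension variance = offset ** 2."""
        return list(self.center), [o * o for o in self.offset]


box = GaussianBox(center=[0.0, 1.0], offset=[0.5, 2.0])
print(box.bounds(), box.volume(), box.to_gaussian())
```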
2.2 Energy-based Loss Construction
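For each (child, parent) positive pair $(c, p)$ and $K$ sampled hard negatives $n_1, \dots, n_K$, TaxoBell constructs energy-based losses over the corresponding Gaussians $\mathcal{N}_c$, $\mathcal{N}_p$, and $\mathcal{N}_{n_k}$: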
- Symmetric Overlap (Bhattacharyya coefficient):
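$$\mathrm{BC}(\mathcal{N}_1, \mathcal{N}_2) = \int_{\mathbb{R}^d} \sqrt{f_1(\mathbf{x})\, f_2(\mathbf{x})}\, d\mathbf{x} = \exp\bigl(-D_B(\mathcal{N}_1, \mathcal{N}_2)\bigr),$$
with
$$D_B(\mathcal{N}_1, \mathcal{N}_2) = \frac{1}{8} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^\top \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \det \Sigma_2}}$$
for $\Sigma = \tfrac{1}{2}(\Sigma_1 + \Sigma_2)$, where $f_1, f_2$ are the densities of $\mathcal{N}_1, \mathcal{N}_2$. The per-triple loss $\mathcal{L}_{\text{sym}}$ rewards high overlap $\mathrm{BC}(\mathcal{N}_c, \mathcal{N}_p)$ between the child and parent Gaussians and low overlap $\mathrm{BC}(\mathcal{N}_c, \mathcal{N}_n)$ between the child and each sampled negative.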
- Asymmetric Containment (KL divergence):
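For a child $\mathcal{N}_c = \mathcal{N}(\boldsymbol{\mu}_c, \Sigma_c)$ and a parent $\mathcal{N}_p = \mathcal{N}(\boldsymbol{\mu}_p, \Sigma_p)$ in $\mathbb{R}^d$,
$$D_{\mathrm{KL}}(\mathcal{N}_c \,\|\, \mathcal{N}_p) = \frac{1}{2}\Bigl[\operatorname{tr}(\Sigma_p^{-1} \Sigma_c) + (\boldsymbol{\mu}_p - \boldsymbol{\mu}_c)^\top \Sigma_p^{-1} (\boldsymbol{\mu}_p - \boldsymbol{\mu}_c) - d + \ln \frac{\det \Sigma_p}{\det \Sigma_c}\Bigr].$$
The alignment loss enforces a lower KL for the positive parent than for a negative $\mathcal{N}_n$ by a margin $\gamma$, i.e. a margin-ranking objective such as
$$\mathcal{L}_{\text{align}} = \max\bigl(0,\ \gamma + D_{\mathrm{KL}}(\mathcal{N}_c \,\|\, \mathcal{N}_p) - D_{\mathrm{KL}}(\mathcal{N}_c \,\|\, \mathcal{N}_n)\bigr).$$
To avoid vanishing variances, a reverse-KL (coverage) constraint $\mathcal{L}_{\text{cov}}$, built on $D_{\mathrm{KL}}(\mathcal{N}_p \,\|\, \mathcal{N}_c)$, is also imposed.
The asymmetric loss $\mathcal{L}_{\text{asym}}$ combines $\mathcal{L}_{\text{align}}$ and $\mathcal{L}_{\text{cov}}$.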
- Regularization:
A regularizer $\mathcal{L}_{\text{reg}}$ penalizes and clips the variance terms so that the offsets $\mathbf{o}$ neither collapse toward zero nor explode.
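- Total Loss: The overall objective adds the symmetric overlap, asymmetric containment, and regularization losses:
$$\mathcal{L} = \mathcal{L}_{\text{sym}} + \mathcal{L}_{\text{asym}} + \mathcal{L}_{\text{reg}}.$$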
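For diagonal covariances, both energies reduce to sums over dimensions. The sketch below assumes diagonal Gaussians passed as (mean, variance) lists; the function names and the hinge form of the alignment term are illustrative assumptions, not the authors' code.

```python
import math


def bhattacharyya_coeff(mu1, var1, mu2, var2):
    """Closed-form BC between diagonal Gaussians (var = per-dimension variance)."""
    d_b = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = 0.5 * (v1 + v2)                            # diagonal of Sigma = (Sigma1 + Sigma2) / 2
        d_b += (m1 - m2) ** 2 / (8.0 * s)              # Mahalanobis term of D_B
        d_b += 0.5 * math.log(s / math.sqrt(v1 * v2))  # covariance-mismatch term of D_B
    return math.exp(-d_b)                              # BC = exp(-D_B), in (0, 1]


def kl_diag(mu_c, var_c, mu_p, var_p):
    """KL(N_c || N_p) for diagonal Gaussians; small when N_c lies inside N_p."""
    kl = 0.0
    for mc, vc, mp, vp in zip(mu_c, var_c, mu_p, var_p):
        kl += vc / vp + (mp - mc) ** 2 / vp - 1.0 + math.log(vp / vc)
    return 0.5 * kl


def align_hinge(kl_pos, kl_neg, margin=1.0):
    """Assumed hinge form: zero once KL(child||parent) + margin <= KL(child||negative)."""
    return max(0.0, margin + kl_pos - kl_neg)


# Hypothetical 1-D (mean, variance) pairs: a narrow child inside a broad parent, and a far negative.
child, parent, negative = ([0.0], [0.25]), ([0.0], [4.0]), ([5.0], [0.25])
kl_pos = kl_diag(*child, *parent)
kl_neg = kl_diag(*child, *negative)
print(bhattacharyya_coeff(*child, *parent), kl_pos, kl_neg, align_hinge(kl_pos, kl_neg))
```

In this example the negative is already far more than the margin away in KL terms, so the hinge contributes no loss.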
Because BC and KL are closed-form and smooth in both mean and covariance, gradients remain stable and non-vanishing even for disjoint boxes, unlike intersection-volume losses, whose gradient is exactly zero once two boxes stop overlapping.
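The gradient contrast can be checked numerically in one dimension. This sketch, with hypothetical helper names, compares finite-difference gradients of a hard-box overlap length and of the Bhattacharyya coefficient as a child box moves relative to a fixed parent box $[-1, 1]$:

```python
import math


def overlap_1d(c1, o1, c2, o2):
    """Hard-box intersection length in 1-D: max(0, min(upper) - max(lower))."""
    return max(0.0, min(c1 + o1, c2 + o2) - max(c1 - o1, c2 - o2))


def bc_1d(c1, o1, c2, o2):
    """1-D Bhattacharyya coefficient of the Gaussians with mu = c and sigma = o."""
    v1, v2 = o1 * o1, o2 * o2
    s = 0.5 * (v1 + v2)
    return math.exp(-(c1 - c2) ** 2 / (8.0 * s) - 0.5 * math.log(s / math.sqrt(v1 * v2)))


def num_grad(f, x, h=1e-5):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Child box: center x, half-width 0.5. Parent box: center 0, half-width 1, i.e. [-1, 1].
for x in (1.2, 3.0):  # 1.2: partial overlap; 3.0: disjoint
    g_overlap = num_grad(lambda c: overlap_1d(c, 0.5, 0.0, 1.0), x)
    g_bc = num_grad(lambda c: bc_1d(c, 0.5, 0.0, 1.0), x)
    print(f"x={x}: d(overlap)/dx={g_overlap:+.4f}  d(BC)/dx={g_bc:+.6f}")
```

At x = 3.0 the boxes are disjoint, so the overlap gradient is exactly zero, while the BC gradient is small but nonzero and still points the child back toward the parent. The overlap gradient is also zero when one box lies strictly inside the other, another flat region of volume-based losses.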
3. Modeling of Uncertainty and Polysemy
The Gaussian covariance structure enables explicit modeling of semantic uncertainty: larger offsets $o_i$ correspond to greater spread along dimension $i$, so general concepts (e.g., "Entity") have large variances while specific concepts have small variances. Polysemy and ambiguity are also represented naturally: overlapping Gaussian ellipsoids along distinct axes can encode multiple senses of a term, and the symmetric overlap loss (Bhattacharyya) encourages shared density only for true hypernym pairs. In multi-parent taxonomies, partial overlap with several parents reflects real-world polysemy and is handled gracefully. Probabilistic containment is graded by KL divergence: a low $D_{\mathrm{KL}}(\mathcal{N}_c \,\|\, \mathcal{N}_p)$ between child and parent signals that most of the child's probability mass lies within its parent's ellipsoid, while disjointness increases the KL.
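The asymmetry and graded containment described above can be illustrated with hand-picked two-dimensional Gaussians. The embeddings below are hypothetical values chosen for illustration, not learned ones:

```python
import math


def kl_diag(mu_c, var_c, mu_p, var_p):
    """KL(N_c || N_p) for diagonal Gaussians given as mean and variance lists."""
    return 0.5 * sum(vc / vp + (mp - mc) ** 2 / vp - 1.0 + math.log(vp / vc)
                     for mc, vc, mp, vp in zip(mu_c, var_c, mu_p, var_p))


# Hypothetical (mean, variance) embeddings: one broad general concept, two narrow specific ones.
mammal = ([0.0, 0.0], [4.0, 4.0])     # large variances: general concept
dog = ([0.5, -0.5], [0.25, 0.25])     # small variances, mean inside mammal's spread
engine = ([6.0, 6.0], [0.25, 0.25])   # small variances, far from mammal

print(kl_diag(*dog, *mammal))     # dog || mammal: low, dog's mass lies inside mammal
print(kl_diag(*mammal, *dog))     # mammal || dog: high, containment is asymmetric
print(kl_diag(*engine, *mammal))  # engine || mammal: high, little shared mass
```

The first value is the smallest: the narrow "dog" Gaussian sits inside the broad "mammal" one, whereas reversing the arguments, or using a distant concept, yields a much larger divergence.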
4. Training Algorithm
TaxoBell employs a self-supervised paradigm constructed from a seed taxonomy $\mathcal{T}_0 = (V_0, E_0)$: every taxonomy edge $(c, p) \in E_0$ forms a positive training pair. For each child, $K$ hard negatives are sampled from local graph neighborhoods (siblings, cousins, grandparents). Input concepts are encoded by a BERT encoder, projected to box parameters $(\mathbf{c}, \mathbf{o})$ via MLP heads, and then mapped to Gaussian parameters $(\boldsymbol{\mu}, \Sigma)$. The per-triple losses (symmetric overlap, asymmetric containment, and regularization) are computed and summed, and the parameters are optimized with AdamW.
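The negative-sampling step can be sketched as follows. The exact neighborhood definitions are assumptions (siblings share a true parent, grandparents are parents of the true parents, cousins are children of the true parents' siblings), and all helper names are hypothetical; encoding, projection, and the AdamW update are omitted.

```python
import random
from collections import defaultdict


def build_index(edges):
    """Map each node to its parents and children from (child, parent) edges."""
    parents, children = defaultdict(set), defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def hard_negatives(node, parents, children, k, rng):
    """Sample up to k wrong parents from the node's local neighborhood."""
    gold = parents[node]
    siblings = {s for p in gold for s in children[p]}
    grandparents = {g for p in gold for g in parents[p]}
    # Siblings of the true parents (other children of the grandparents).
    aunts_uncles = {u for g in grandparents for u in children[g]} - gold
    cousins = {c for u in aunts_uncles for c in children[u]}
    pool = sorted((siblings | grandparents | cousins) - gold - {node})
    return rng.sample(pool, min(k, len(pool)))


def training_triples(edges, k, seed=0):
    """Yield (child, parent, negative) triples, one per sampled negative."""
    rng = random.Random(seed)
    parents, children = build_index(edges)
    for child, parent in edges:
        for neg in hard_negatives(child, parents, children, k, rng):
            yield child, parent, neg


# Toy seed taxonomy (hypothetical).
edges = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal"),
         ("robin", "bird"), ("sparrow", "bird"), ("bird", "animal")]
for triple in training_triples(edges, k=2):
    print(triple)
```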
5. Hierarchical Inference and Reasoning
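At inference, a new query $q$ is encoded and projected to a Gaussian $\mathcal{N}_q$. For each candidate anchor $a$ in the seed taxonomy, TaxoBell computes a score:
- TaxoBell_BC: $s_{\mathrm{BC}}(q, a) = \mathrm{BC}(\mathcal{N}_q, \mathcal{N}_a)$
- TaxoBell_KL: $s_{\mathrm{KL}}(q, a) = -D_{\mathrm{KL}}(\mathcal{N}_q \,\|\, \mathcal{N}_a)$
Selecting the highest-scoring anchor(s), $\arg\max_a s(q, a)$, identifies the parent(s) of $q$. The Gaussian–box connection guarantees transitivity of containment: if $\mathcal{N}_a$ contains $\mathcal{N}_b$ (in the KL sense) and $\mathcal{N}_b$ contains $\mathcal{N}_q$, then $\mathcal{N}_a$ contains $\mathcal{N}_q$. Once $q$ is attached to a parent $p$, every ancestor of $p$ therefore also contains $q$.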
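A sketch of the ranking step for both variants follows. It assumes diagonal Gaussians and assumes that TaxoBell_KL ranks anchors by the negated divergence so that higher scores are better; the function names and the anchor embeddings in the usage example are hypothetical.

```python
import math


def bc_diag(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient between diagonal Gaussians."""
    d_b = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = 0.5 * (v1 + v2)
        d_b += (m1 - m2) ** 2 / (8.0 * s) + 0.5 * math.log(s / math.sqrt(v1 * v2))
    return math.exp(-d_b)


def kl_diag(mu_q, var_q, mu_a, var_a):
    """KL(N_q || N_a) between diagonal Gaussians."""
    return 0.5 * sum(vq / va + (ma - mq) ** 2 / va - 1.0 + math.log(va / vq)
                     for mq, vq, ma, va in zip(mu_q, var_q, mu_a, var_a))


def rank_anchors(query, anchors, mode="kl", top_k=1):
    """Return the top_k anchor names, best first; higher score = better parent."""
    mu_q, var_q = query

    def score(name):
        mu_a, var_a = anchors[name]
        if mode == "bc":
            return bc_diag(mu_q, var_q, mu_a, var_a)
        return -kl_diag(mu_q, var_q, mu_a, var_a)  # assumed: negate KL so larger is better

    return sorted(anchors, key=score, reverse=True)[:top_k]


# Hypothetical anchor Gaussians (mean, variance) and a new query concept.
anchors = {
    "mammal": ([0.0, 0.0], [4.0, 4.0]),
    "bird": ([6.0, 6.0], [4.0, 4.0]),
    "dog": ([0.5, -0.5], [0.25, 0.25]),
}
poodle = ([0.6, -0.4], [0.05, 0.05])
print(rank_anchors(poodle, anchors, mode="kl", top_k=2))
print(rank_anchors(poodle, anchors, mode="bc", top_k=2))
```

For this query both variants place "dog" first and "mammal" second; returning the top $k$ rather than a single anchor is one simple way to propose several parents in a multi-parent taxonomy.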
6. Empirical Evaluation and Results
TaxoBell is evaluated on five benchmark datasets, spanning both single- and multi-parent taxonomies:
| Dataset | # Nodes | Max Depth | Parentality |
|---|---|---|---|
| SemEval-Environment (ENV) | 261 | 6 | Single |
| SemEval-Science (SCI) | 429 | 8 | Single |
| WordNet sub-taxonomies | ≈21 | 3 | Single |
| SemEval-Food | 1,486 | 8 | Multi |
| MeSH | 9,710 | 12 | Multi |
Metrics include Hit@k, Recall@k, Mean Rank (MR), Mean Reciprocal Rank (MRR), and Wu–Palmer similarity. TaxoBell is benchmarked against eight strong baselines, including BERT+MLP, TaxoExpan, Arborist, TMN, STEAM, BoxTaxo, and TaxoEnrich.
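The ranking metrics can be computed from each query's ranked anchor list and its set of gold parents. Definitions vary across papers (for example, whether MR averages over all gold parents or only the best-ranked one), so this sketch adopts one common convention as an assumption; the function names are hypothetical and Wu–Palmer similarity is omitted.

```python
def first_gold_rank(ranking, gold):
    """1-based rank of the highest-ranked gold parent in the ranking."""
    for i, anchor in enumerate(ranking, start=1):
        if anchor in gold:
            return i
    raise ValueError("no gold parent in ranking")


def evaluate(results, k=5):
    """results: list of (ranking, gold_set) pairs. Returns Hit@k, Recall@k, MR, MRR."""
    n = len(results)
    ranks = [first_gold_rank(ranking, gold) for ranking, gold in results]
    hit = sum(rank <= k for rank in ranks) / n
    recall = sum(len(set(ranking[:k]) & gold) / len(gold) for ranking, gold in results) / n
    mr = sum(ranks) / n
    mrr = sum(1.0 / rank for rank in ranks) / n
    return {"hit@k": hit, "recall@k": recall, "MR": mr, "MRR": mrr}


# Two hypothetical queries: one single-parent, one multi-parent.
results = [(["a", "b", "c"], {"a"}), (["x", "y", "z"], {"z", "y"})]
print(evaluate(results, k=1))
```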
On all datasets, both TaxoBell_BC and TaxoBell_KL surpass the best baseline, by approximately 19% in MRR and 25% in Recall@k (absolute), with MR reductions of up to 43%. The largest gains occur on the multi-parent datasets (Food, MeSH). Between the two variants, TaxoBell_KL achieves higher Recall@1, while TaxoBell_BC has slightly better overall MRR. All improvements are statistically significant under Fisher's method for combining $p$-values.
Ablation studies validate the necessity of each model component: removing the symmetric Bhattacharyya loss destroys semantic cohesion (MR on the order of 200, with a sharply lower MRR); skipping the asymmetric KL loss impairs directionality (R@1 drops by 120%); omitting the regularizers or the reverse-KL constraint causes degenerate solutions and notable performance degradation; and projecting the Gaussian directly (mean and variance from the encoder) underperforms the box-to-Gaussian mapping by 100–200% relative.
Case studies reveal effective handling of overlapping and ambiguous relations (e.g., "Copper" in MeSH attaches to both “Transition elements” and “Heavy metals”; “Frozen orange juice” in Food is ranked under both “Fruit juice” and “Concentrate”).
7. Significance and Implications
TaxoBell demonstrates that integrating axis-aligned geometric structure with probabilistic uncertainty, via a box-to-Gaussian mapping and smooth energy-based optimization, yields a robust framework for scalable, self-supervised taxonomy expansion. The approach supports modeling of asymmetric relations, inherent uncertainty, and multi-parent, ambiguous concepts, with empirically validated advantages over existing alternatives. Empirical and ablation studies highlight the synergy between geometric and probabilistic modeling, and suggest a principled path for future research in logic-informed and uncertainty-aware structured representation learning (Mishra et al., 14 Jan 2026).