
TaxoBell: Gaussian Box Embedding

Updated 21 January 2026
  • TaxoBell is a self-supervised framework that embeds taxonomies using Gaussian boxes, combining geometric box properties with probabilistic semantics to capture asymmetric 'is-a' relations.
  • It employs energy-based losses using Bhattacharyya and KL divergences to ensure smooth gradients and to model uncertainty and polysemy within structured knowledge systems.
  • Empirical evaluations reveal substantial gains in metrics like MRR, Recall@k, and Mean Rank across single- and multi-parent taxonomy benchmarks compared to state-of-the-art methods.

TaxoBell is a Gaussian box embedding framework developed for self-supervised taxonomy expansion in structured knowledge systems. Its core innovation lies in fusing the geometric inductive bias of axis-aligned boxes with the probabilistic semantics of multivariate Gaussian distributions. By translating between box geometries and Gaussian densities, TaxoBell enables stable, interpretable modeling of asymmetric "is-a" relations, uncertainty, and polysemy in taxonomies. Empirical evaluations demonstrate substantial improvements over state-of-the-art baselines in taxonomy expansion tasks across several benchmarks (Mishra et al., 14 Jan 2026).

1. Motivation and Conceptual Foundations

Taxonomies require modeling of asymmetric “is-a” relations: for example, “Dog is-a Mammal” holds, but “Mammal is-a Dog” does not. Conventional point-based embeddings, common in prior automated taxonomy expansion approaches, encode only symmetric similarity and thus cannot capture such directionality. Axis-aligned boxes (hyperrectangles in $\mathbb{R}^d$) address this by enabling geometric containment ($A \subseteq B \implies A~\text{is-a}~B$) and disjointness ($A \cap B = \emptyset$ implies no relation). However, box losses based on intersection volume are piecewise-defined and suffer from vanishing or unstable gradients at box boundaries. Furthermore, these “hard” boxes lack a mechanism to express uncertainty: all points inside a box are equally plausible.

TaxoBell resolves these limitations by defining a “Gaussian box”—an axis-aligned box associated with a multivariate diagonal Gaussian. The box’s center and half-widths become the Gaussian’s mean and standard deviations, respectively. This allows encoding of probabilistic containment (graded “is-a” via density mass), smooth overlap and containment energies with stable gradients (Bhattacharyya coefficient, Kullback–Leibler divergence), and explicit quantification of semantic uncertainty and polysemy (via covariance structure).

2. Mathematical Model

2.1 Box and Gaussian Parameterization

An axis-aligned box in $\mathbb{R}^d$ is parameterized by a center $c \in \mathbb{R}^d$ and offsets $o \in \mathbb{R}^d_{>0}$, specifying lower and upper bounds $l_i = c_i - o_i$, $r_i = c_i + o_i$ for each dimension $i$. The box is defined as $\operatorname{Box}(b) = \prod_{i=1}^d [c_i - o_i, c_i + o_i]$ with volume $\operatorname{Vol}(b) = \prod_{i=1}^d (r_i - l_i) = \prod_{i=1}^d 2 o_i$.

TaxoBell projects each box $b = (c, o)$ to a diagonal Gaussian $\mathcal{N}(\mu, \Sigma)$ with $\mu = c$ and $\Sigma = \operatorname{diag}(o_1^2, \ldots, o_d^2)$. This links box half-widths with Gaussian standard deviations, so equal-density contours of the Gaussian are axis-aligned ellipsoids corresponding to box geometry.
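The parameterization above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Box` and `DiagGaussian` classes and all function names are hypothetical, and the mapping assumes each standard deviation equals the corresponding half-width exactly, as the text describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    center: List[float]  # c in R^d
    offset: List[float]  # o in R^d, strictly positive half-widths


@dataclass
class DiagGaussian:
    mean: List[float]  # mu
    var: List[float]   # diagonal entries of Sigma


def box_bounds(box: Box) -> Tuple[List[float], List[float]]:
    """Lower and upper corners: l_i = c_i - o_i and r_i = c_i + o_i."""
    lower = [c - o for c, o in zip(box.center, box.offset)]
    upper = [c + o for c, o in zip(box.center, box.offset)]
    return lower, upper


def box_volume(box: Box) -> float:
    """Volume of the hyperrectangle: product of the side lengths 2 * o_i."""
    volume = 1.0
    for o in box.offset:
        volume *= 2.0 * o
    return volume


def box_to_gaussian(box: Box) -> DiagGaussian:
    """Assumed mapping: mean = center, standard deviation = half-width."""
    if any(o <= 0 for o in box.offset):
        raise ValueError("offsets must be strictly positive")
    return DiagGaussian(mean=list(box.center), var=[o * o for o in box.offset])


box = Box(center=[0.0, 1.0], offset=[0.5, 2.0])
gaussian = box_to_gaussian(box)
```

A practical implementation would likely add a small floor to the variances for numerical stability; that detail is omitted here because the source does not specify it.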

2.2 Energy-based Loss Construction

For each (child, parent) positive pair and its sampled hard negatives, TaxoBell constructs energy-based losses:

  • Symmetric Overlap (Bhattacharyya coefficient):

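$$\mathrm{BC}(\mathcal{N}_1, \mathcal{N}_2) = \exp\bigl(-D_B(\mathcal{N}_1, \mathcal{N}_2)\bigr)$$

with

$$D_B(\mathcal{N}_1, \mathcal{N}_2) = \frac{1}{8} (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}}$$

for $\Sigma = \tfrac{1}{2}(\Sigma_1 + \Sigma_2)$. The loss for each triple rewards a high coefficient for the (child, parent) pair and a low coefficient for each (child, negative) pair.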

  • Asymmetric Containment (KL divergence):

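$$D_{\mathrm{KL}}(\mathcal{N}_c \,\|\, \mathcal{N}_p) = \frac{1}{2}\left[\operatorname{tr}\left(\Sigma_p^{-1}\Sigma_c\right) + (\mu_p - \mu_c)^\top \Sigma_p^{-1} (\mu_p - \mu_c) - d + \ln\frac{\det\Sigma_p}{\det\Sigma_c}\right]$$

where $\mathcal{N}_c$ and $\mathcal{N}_p$ are the child and parent Gaussians. The alignment loss enforces a lower KL for the positive pair than for each negative, by a fixed margin.

To avoid vanishing variances, a reverse-KL (coverage) constraint is also imposed.

The asymmetric loss combines the alignment and coverage terms.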

  • Regularization:

Minimizing and clipping the variance terms ensures that the offsets $o$ neither collapse nor explode.

  • Total Loss: The symmetric, asymmetric, and regularization terms are combined into a single training objective.

Because BC and KL are closed-form and smooth in both mean and covariance, gradients remain stable and non-vanishing, even for disjoint boxes, unlike volume-based losses.
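The closed forms named above are short enough to write out for diagonal Gaussians. The sketch below uses the standard Bhattacharyya and KL expressions; the hinge in `margin_alignment_loss` is only an assumed form of the margin-based alignment loss (the source states that a margin is used, not its exact formula), and all function names and the default margin are hypothetical.

```python
import math
from typing import Sequence


def bhattacharyya_distance(mu1: Sequence[float], var1: Sequence[float],
                           mu2: Sequence[float], var2: Sequence[float]) -> float:
    """D_B between two diagonal Gaussians; per-dimension terms add up."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)                              # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v               # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))   # variance-mismatch term
    return dist


def bhattacharyya_coefficient(mu1, var1, mu2, var2) -> float:
    """Symmetric overlap in (0, 1]; equals 1 only for identical Gaussians."""
    return math.exp(-bhattacharyya_distance(mu1, var1, mu2, var2))


def kl_divergence(mu_c, var_c, mu_p, var_p) -> float:
    """KL(child || parent) for diagonal Gaussians; asymmetric by construction."""
    total = 0.0
    for mc, vc, mp, vp in zip(mu_c, var_c, mu_p, var_p):
        total += vc / vp + (mp - mc) ** 2 / vp - 1.0 + math.log(vp / vc)
    return 0.5 * total


def margin_alignment_loss(kl_pos: float, kl_neg: float, margin: float = 1.0) -> float:
    """Assumed hinge form: zero once kl_pos + margin <= kl_neg."""
    return max(0.0, kl_pos - kl_neg + margin)


# Two narrow, far-apart Gaussians (the analogue of disjoint boxes) still give a
# finite, informative KL value, whereas a hard intersection volume would be zero.
far_apart_kl = kl_divergence([0.0], [0.25], [10.0], [0.25])
```

Because both quantities depend smoothly on the means and variances, a gradient-based optimizer receives a usable signal even in the far-apart case.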

3. Modeling of Uncertainty and Polysemy

The Gaussian covariance structure enables explicit modeling of semantic uncertainty: larger offsets $o_i$ correspond to greater spread along dimension $i$, so more general concepts (e.g., "Entity") have large variances while specific concepts have small variances. Polysemy and ambiguity are represented naturally: overlapping Gaussian ellipsoids along distinct axes can encode multiple senses of a term, and the symmetric overlap loss (Bhattacharyya) encourages shared density only for true hypernym pairs. Partial overlap with multiple parents in “multi-parent” taxonomies reflects real-world polysemy and is handled gracefully. Probabilistic containment is graded by KL divergence: a low $D_{\mathrm{KL}}(\text{child} \,\|\, \text{parent})$ signals that most of the child’s mass lies within its parent’s ellipsoid, while disjointness increases KL.
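A one-dimensional worked example makes the graded, asymmetric containment concrete. The values are illustrative and not taken from the paper: a specific child concept $\mathcal{N}(0, 1)$ and a general parent concept $\mathcal{N}(0, 4)$, where the second argument is the variance. Substituting into the closed-form KL divergence for univariate Gaussians gives:

```latex
\begin{align*}
D_{\mathrm{KL}}\bigl(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,4)\bigr)
  &= \tfrac{1}{2}\Bigl[\tfrac{1}{4} + 0 - 1 + \ln 4\Bigr] \approx 0.318, \\
D_{\mathrm{KL}}\bigl(\mathcal{N}(0,4)\,\|\,\mathcal{N}(0,1)\bigr)
  &= \tfrac{1}{2}\Bigl[4 + 0 - 1 - \ln 4\Bigr] \approx 0.807.
\end{align*}
```

The narrow child sits cheaply inside the broad parent, while the reverse direction costs more than twice as much. That gap is what lets a KL-based score distinguish "child is-a parent" from "parent is-a child".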

4. Training Algorithm

TaxoBell employs a self-supervised paradigm constructed from a seed taxonomy: every taxonomy edge (parent, child) forms a positive training pair. For each child, hard negatives are sampled from local graph neighborhoods (siblings, cousins, grandparents). Input concepts are encoded by a BERT encoder, then projected to box parameters $(c, o)$ via MLP heads, and further mapped to Gaussian parameters $(\mu, \Sigma)$. Per-triple losses are computed and summed, and the parameters are optimized with AdamW.

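The training procedure, in outline:

  • Encode each concept with the BERT encoder.
  • Project the encoding to box parameters $(c, o)$ with the MLP heads.
  • Map each box to Gaussian parameters $(\mu, \Sigma)$.
  • Compute the per-triple losses for each (child, parent, negative) triple and sum them.
  • Update the parameters with AdamW.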
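The self-supervised triple construction can be sketched without any neural components. The snippet below is an assumption-laden illustration: the taxonomy is a plain child-to-parents dictionary, "cousins" are taken to mean children of a parent's siblings, negatives are drawn uniformly from the local pool, and the function names and the parameter `k` are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def build_children(parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert a child -> parents map into a parent -> children map."""
    children: Dict[str, Set[str]] = {}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, set()).add(child)
    return children


def local_negatives(node: str, parents: Dict[str, List[str]],
                    children: Dict[str, Set[str]]) -> Set[str]:
    """Siblings, grandparents and cousins of `node`, minus itself and its true parents."""
    true_parents = set(parents.get(node, []))
    pool: Set[str] = set()
    for parent in true_parents:
        pool |= children.get(parent, set())            # siblings
        for grandparent in parents.get(parent, []):
            pool.add(grandparent)                      # grandparents
            for uncle in children.get(grandparent, set()):
                pool |= children.get(uncle, set())     # cousins (and siblings again)
    pool -= true_parents
    pool.discard(node)
    return pool


def sample_triples(parents: Dict[str, List[str]], k: int,
                   seed: int = 0) -> List[Tuple[str, str, str]]:
    """Emit (child, parent, negative) triples with up to k negatives per edge."""
    rng = random.Random(seed)
    children = build_children(parents)
    triples = []
    for child, node_parents in parents.items():
        pool = sorted(local_negatives(child, parents, children))
        for parent in node_parents:
            for negative in rng.sample(pool, min(k, len(pool))):
                triples.append((child, parent, negative))
    return triples


seed_taxonomy = {
    "mammal": ["animal"], "bird": ["animal"],
    "dog": ["mammal"], "cat": ["mammal"], "sparrow": ["bird"],
}
triples = sample_triples(seed_taxonomy, k=2)
```

Each triple would then be passed through the encoder and the loss terms; sorting the pool before sampling keeps the output reproducible for a fixed seed.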

5. Hierarchical Inference and Reasoning

At inference, a new query concept $q$ is encoded and projected to a Gaussian $\mathcal{N}_q$. For each candidate anchor $a$ with Gaussian $\mathcal{N}_a$, TaxoBell computes a score under one of two variants:

  • TaxoBell_BC: the Bhattacharyya coefficient between $\mathcal{N}_q$ and $\mathcal{N}_a$ (higher is better).
  • TaxoBell_KL: the KL divergence $D_{\mathrm{KL}}(\mathcal{N}_q \,\|\, \mathcal{N}_a)$ (lower is better).

Selecting the best-scoring anchor(s) identifies the parent(s) for $q$. The Gaussian–box connection guarantees transitivity of containment: if $A$ contains $B$ (in the KL sense) and $B$ contains $C$, then $A$ contains $C$. Once $q$ is attached to parent $p$, all ancestors of $p$ inherit containment relationships.
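A minimal sketch of the ranking step, assuming every concept is already represented as a diagonal Gaussian. The `Gaussian` alias, the function names, the `mode` and `top_k` parameters, and the toy anchors are all hypothetical; the KL direction (query as child, anchor as parent) is an assumption consistent with the containment reading used in training.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, diagonal variances)


def kl_diag(child: Gaussian, parent: Gaussian) -> float:
    """KL(child || parent); lower suggests the child sits inside the parent."""
    (mu_c, var_c), (mu_p, var_p) = child, parent
    return 0.5 * sum(
        vc / vp + (mp - mc) ** 2 / vp - 1.0 + math.log(vp / vc)
        for mc, vc, mp, vp in zip(mu_c, var_c, mu_p, var_p)
    )


def bc_diag(a: Gaussian, b: Gaussian) -> float:
    """Bhattacharyya coefficient; higher means more shared density."""
    (mu_a, var_a), (mu_b, var_b) = a, b
    dist = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        v = 0.5 * (va + vb)
        dist += 0.125 * (ma - mb) ** 2 / v + 0.5 * math.log(v / math.sqrt(va * vb))
    return math.exp(-dist)


def rank_parents(query: Gaussian, anchors: Dict[str, Gaussian],
                 mode: str = "kl", top_k: int = 1) -> List[str]:
    """Return the top_k anchor names: ascending KL or descending BC."""
    if mode == "kl":
        ordered = sorted(anchors, key=lambda name: kl_diag(query, anchors[name]))
    elif mode == "bc":
        ordered = sorted(anchors, key=lambda name: bc_diag(query, anchors[name]),
                         reverse=True)
    else:
        raise ValueError("mode must be 'kl' or 'bc'")
    return ordered[:top_k]


anchors = {
    "mammal": ([0.0, 0.0], [4.0, 4.0]),
    "vehicle": ([10.0, 10.0], [4.0, 4.0]),
}
dog = ([0.5, 0.0], [1.0, 1.0])
best_by_kl = rank_parents(dog, anchors, mode="kl")
best_by_bc = rank_parents(dog, anchors, mode="bc")
```

Raising `top_k` above 1 is one simple way to surface several candidate parents for multi-parent taxonomies.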

6. Empirical Evaluation and Results

TaxoBell is evaluated on five benchmark datasets, spanning both single- and multi-parent taxonomies:

| Dataset | Nodes | Max Depth | Parentality |
| --- | --- | --- | --- |
| SemEval-Environment (ENV) | 261 | 6 | Single |
| SemEval-Science (SCI) | 429 | 8 | Single |
| WordNet sub-taxonomies | ≈21 | 3 | Single |
| SemEval-Food | 1,486 | 8 | Multi |
| MeSH | 9,710 | 12 | Multi |

Metrics include Hit@k, Recall@k, Mean Rank (MR), Mean Reciprocal Rank (MRR), and Wu–Palmer similarity. TaxoBell is benchmarked against eight strong baselines: BERT+MLP, TaxoExpan, Arborist, TMN, STEAM, BoxTaxo, TaxoEnrich, and others.
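For reference, the rank-based metrics can be computed from the 1-based rank that each gold parent receives in the model's candidate list. This is a generic sketch: exact definitions (for instance, how Hit@k and Recall@k treat multiple gold parents, or whether MRR is rescaled) vary between papers, and the function names here are hypothetical.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average 1-based rank of the gold parents; lower is better."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank; higher is better, 1.0 is a perfect ranking."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of gold parents ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# One entry per gold (query, parent) pair.
gold_ranks = [1, 2, 4]
mr = mean_rank(gold_ranks)
mrr = mean_reciprocal_rank(gold_ranks)
r_at_1 = recall_at_k(gold_ranks, 1)
```

With one entry per gold pair, multi-parent queries simply contribute several ranks to the list.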

On all datasets, both TaxoBell_BC and TaxoBell_KL surpass the best baseline by approximately 19% in MRR and 25% in Recall@k (absolute), with MR reductions of up to 43%. The largest gains are noted for multi-parent datasets (Food, MeSH). Among variants, TaxoBell_KL produces higher Recall@1, while TaxoBell_BC shows slightly superior overall MRR. All improvements are reported as statistically significant under Fisher’s combined probability test.

Ablation studies validate the necessity of individual model components: removing the symmetric overlap loss destroys semantic cohesion (MR rises to the order of 200 and MRR falls), skipping the asymmetric containment loss impairs directionality (R@1 drops by about 20%), omitting regularizers or reverse-KL constraints causes degenerate solutions and notable performance degradation, and direct Gaussian projection (mean+variance from encoder) underperforms box-to-Gaussian mapping by 100–200% relative.

Case studies reveal effective handling of overlapping and ambiguous relations (e.g., "Copper" in MeSH attaches to both “Transition elements” and “Heavy metals”; “Frozen orange juice” in Food is ranked under both “Fruit juice” and “Concentrate”).

7. Significance and Implications

TaxoBell demonstrates that integrating axis-aligned geometric structure with probabilistic uncertainty, via a box-to-Gaussian mapping and smooth energy-based optimization, yields a robust framework for scalable, self-supervised taxonomy expansion. The approach supports modeling of asymmetric relations, inherent uncertainty, and multi-parent, ambiguous concepts, with empirically validated advantages over existing alternatives. Empirical and ablation studies highlight the synergy between geometric and probabilistic modeling, and suggest a principled path for future research in logic-informed and uncertainty-aware structured representation learning (Mishra et al., 14 Jan 2026).
