TaxoBell: Gaussian Box Embedding

Updated 21 January 2026

TaxoBell is a self-supervised framework that embeds taxonomies using Gaussian boxes, combining geometric box properties with probabilistic semantics to capture asymmetric 'is-a' relations.
It employs energy-based losses using Bhattacharyya and KL divergences to ensure smooth gradients and to model uncertainty and polysemy within structured knowledge systems.
Empirical evaluations reveal substantial gains in metrics like MRR, Recall@k, and Mean Rank across single- and multi-parent taxonomy benchmarks compared to state-of-the-art methods.

TaxoBell is a Gaussian box embedding framework developed for self-supervised taxonomy expansion in structured knowledge systems. Its core innovation lies in fusing the geometric inductive bias of axis-aligned boxes with the probabilistic semantics of multivariate Gaussian distributions. By translating between box geometries and Gaussian densities, TaxoBell enables stable, interpretable modeling of asymmetric "is-a" relations, uncertainty, and polysemy in taxonomies. Empirical evaluations demonstrate substantial improvements over state-of-the-art baselines in taxonomy expansion tasks across several benchmarks (Mishra et al., 14 Jan 2026).

1. Motivation and Conceptual Foundations

Taxonomies require modeling of asymmetric relations (“is-a”): for example, “Dog is-a Mammal” holds, but “Mammal is-a Dog” does not. Conventional point-based embeddings, common in prior automated taxonomy expansion approaches, encode only symmetric similarity and thus cannot capture such directionality. Axis-aligned boxes (hyperrectangles in $\mathbb{R}^d$) address this by enabling geometric containment ($A \subseteq B \implies A~\text{is-a}~B$) and disjointness ($A \cap B = \emptyset$ implies no relation). However, box losses based on intersection volume are piecewise-defined and suffer from vanishing or unstable gradients at box boundaries. Furthermore, these “hard” boxes lack a mechanism to express uncertainty: all points inside a box are equally plausible.
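The containment and disjointness tests described above can be sketched in a few lines. This is a minimal illustration, assuming a box is stored as a (lower corner, upper corner) pair; the concept names and coordinates are made up for the example and do not come from the paper.

```python
from typing import Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float]]  # (lower corner, upper corner)


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely inside `outer` (inner is-a outer)."""
    (o_lo, o_hi), (i_lo, i_hi) = outer, inner
    return all(ol <= il for ol, il in zip(o_lo, i_lo)) and all(
        ih <= oh for ih, oh in zip(i_hi, o_hi)
    )


def disjoint(a: Box, b: Box) -> bool:
    """Return True if the two closed boxes share no point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return any(
        ah < bl or bh < al for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    )


# Hypothetical 2-D boxes, chosen only to illustrate the asymmetry.
mammal = ([0.0, 0.0], [10.0, 10.0])
dog = ([2.0, 3.0], [4.0, 5.0])
fish = ([12.0, 0.0], [15.0, 4.0])

print(contains(mammal, dog))  # Dog is-a Mammal
print(contains(dog, mammal))  # Mammal is-a Dog does not hold
print(disjoint(dog, fish))    # no relation
```

Containment is directional by construction, which is exactly what a symmetric point-similarity score cannot express.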

TaxoBell resolves these limitations by defining a “Gaussian box”—an axis-aligned box associated with a multivariate diagonal Gaussian. The box’s center and half-widths become the Gaussian’s mean and standard deviations, respectively. This allows encoding of probabilistic containment (graded “is-a” via density mass), smooth overlap and containment energies with stable gradients (Bhattacharyya coefficient, Kullback–Leibler divergence), and explicit quantification of semantic uncertainty and polysemy (via covariance structure).

2. Mathematical Model

2.1 Box and Gaussian Parameterization

An axis-aligned box $b = (c, o)$ in $\mathbb{R}^d$ is parameterized by center $c \in \mathbb{R}^d$ and offsets $o \in \mathbb{R}^d_{>0}$, specifying lower and upper bounds $l_i = c_i - o_i$, $r_i = c_i + o_i$ for each dimension $i$. The box is defined as $\operatorname{Box}(b) = \prod_{i=1}^d [c_i - o_i, c_i + o_i]$ with volume $\operatorname{Vol}(b) = \prod_{i=1}^d (r_i - l_i) = \prod_{i=1}^d 2 o_i$.

TaxoBell projects each box $b = (c, o)$ to a diagonal Gaussian $\mathcal{N}(\mu, \Sigma)$ with $\mu = c$ and $\Sigma = \operatorname{diag}(o_1^2, \dots, o_d^2)$. This links box half-widths with Gaussian standard deviations, so equal-density contours of the Gaussian are axis-aligned ellipsoids corresponding to box geometry.
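As a rough sketch of this projection, the snippet below maps a box given by center and offsets to a diagonal Gaussian. It assumes the half-width is used directly as the standard deviation, following the description above; an actual implementation may apply a scaling factor. The names `DiagGaussian`, `box_volume`, and `box_to_gaussian` are illustrative, not from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    mean: List[float]
    var: List[float]  # diagonal of the covariance matrix


def box_volume(offsets: Sequence[float]) -> float:
    """Volume of the box: product of side lengths 2 * o_i."""
    volume = 1.0
    for o in offsets:
        volume *= 2.0 * o
    return volume


def box_to_gaussian(center: Sequence[float], offsets: Sequence[float]) -> DiagGaussian:
    """Assumed mapping: mean = center, standard deviation = half-width."""
    if any(o <= 0 for o in offsets):
        raise ValueError("offsets must be strictly positive")
    return DiagGaussian(mean=list(center), var=[o * o for o in offsets])


g = box_to_gaussian([1.0, -2.0], [0.5, 2.0])
print(g)
print(box_volume([0.5, 2.0]))
```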

2.2 Energy-based Loss Construction

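For each (child, parent) positive pair and its sampled hard negatives, TaxoBell constructs energy-based losses:

Symmetric Overlap (Bhattacharyya coefficient):

For two Gaussians $\mathcal{N}_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}_2 = \mathcal{N}(\mu_2, \Sigma_2)$,

$\operatorname{BC}(\mathcal{N}_1, \mathcal{N}_2) = \exp\left(-D_B(\mathcal{N}_1, \mathcal{N}_2)\right)$

with

$D_B(\mathcal{N}_1, \mathcal{N}_2) = \frac{1}{8} (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \det \Sigma_2}}$

for $\Sigma = \frac{1}{2}(\Sigma_1 + \Sigma_2)$. The loss for each triple rewards high overlap for the (child, parent) pair and low overlap for the (child, negative) pairs.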
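For diagonal covariances the Bhattacharyya coefficient reduces to a per-dimension sum, which the sketch below computes. It uses the standard closed form $D_B = \frac{1}{8}\sum_i (\mu_{1,i}-\mu_{2,i})^2 / s_i + \frac{1}{2}\sum_i \ln\big(s_i / \sqrt{v_{1,i} v_{2,i}}\big)$ with $s_i = (v_{1,i}+v_{2,i})/2$; the function names and the sample values are illustrative.

```python
import math
from typing import Sequence


def bhattacharyya_distance(
    mu1: Sequence[float], var1: Sequence[float],
    mu2: Sequence[float], var2: Sequence[float],
) -> float:
    """Bhattacharyya distance between two diagonal Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = 0.5 * (v1 + v2)  # averaged variance in this dimension
        dist += 0.125 * (m1 - m2) ** 2 / s
        dist += 0.5 * math.log(s / math.sqrt(v1 * v2))
    return dist


def bhattacharyya_coefficient(mu1, var1, mu2, var2) -> float:
    """Overlap in (0, 1]; equals 1 only for identical Gaussians."""
    return math.exp(-bhattacharyya_distance(mu1, var1, mu2, var2))


print(bhattacharyya_coefficient([0.0], [1.0], [0.0], [1.0]))
print(bhattacharyya_coefficient([0.0], [1.0], [2.0], [1.0]))
```

The coefficient is symmetric in its arguments, so on its own it measures relatedness but not direction.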

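Asymmetric Containment (KL divergence):

For two Gaussians $\mathcal{N}_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}_2 = \mathcal{N}(\mu_2, \Sigma_2)$ in $\mathbb{R}^d$,

$D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2) = \frac{1}{2}\left[\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - d + \ln\frac{\det \Sigma_2}{\det \Sigma_1}\right]$

The alignment loss is margin-based: it requires the child-to-parent divergence to be lower than the child-to-negative divergence by a fixed margin.

To avoid vanishing variances, a reverse-KL (coverage) constraint is also imposed.

The asymmetric loss combines both terms.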
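The following sketch shows the closed-form KL divergence for diagonal Gaussians and one plausible margin-based alignment term. The hinge form `max(0, margin + KL_pos - KL_neg)` and the margin value are assumptions made for illustration; the source states only that positives should have lower KL than negatives by a margin. All numeric values are hypothetical.

```python
import math
from typing import Sequence


def kl_diag(
    mu1: Sequence[float], var1: Sequence[float],
    mu2: Sequence[float], var2: Sequence[float],
) -> float:
    """KL(N1 || N2) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        total += v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + math.log(v2 / v1)
    return 0.5 * total


def alignment_hinge(kl_pos: float, kl_neg: float, margin: float) -> float:
    """Assumed hinge: zero once the positive KL beats the negative by `margin`."""
    return max(0.0, margin + kl_pos - kl_neg)


# Hypothetical 1-D concepts: a narrow child, a broad parent, a distant negative.
child = ([0.0], [0.25])
parent = ([0.0], [4.0])
negative = ([5.0], [4.0])

kl_pos = kl_diag(*child, *parent)
kl_neg = kl_diag(*child, *negative)
print(kl_pos, kl_diag(*parent, *child))  # the two directions differ
print(alignment_hinge(kl_pos, kl_neg, margin=1.0))
```

Swapping the arguments changes the value, which is what lets the KL term encode the direction of the "is-a" relation.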

Regularization:

Variance terms are penalized and clipped so that the offsets $o$ neither collapse nor explode.

Total Loss: The symmetric overlap loss, the asymmetric containment loss, and the regularizer are combined into a single training objective.

Because BC and KL are closed-form and smooth in both mean and covariance, gradients remain stable and non-vanishing, even for disjoint boxes, unlike volume-based losses.
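A small one-dimensional comparison illustrates the gradient argument. The sketch moves a box toward a fixed box while the two remain disjoint: the intersection length stays at zero (no training signal), whereas the Bhattacharyya coefficient of the associated Gaussians keeps increasing. The half-width-as-standard-deviation mapping and the chosen centers are assumptions for the example.

```python
import math


def intersection_length(c1: float, o1: float, c2: float, o2: float) -> float:
    """Overlap length of 1-D boxes [c - o, c + o]; zero when disjoint."""
    return max(0.0, min(c1 + o1, c2 + o2) - max(c1 - o1, c2 - o2))


def bc_1d(c1: float, o1: float, c2: float, o2: float) -> float:
    """Bhattacharyya coefficient of N(c1, o1^2) and N(c2, o2^2)."""
    v1, v2 = o1 * o1, o2 * o2
    s = 0.5 * (v1 + v2)
    d_b = 0.125 * (c1 - c2) ** 2 / s + 0.5 * math.log(s / math.sqrt(v1 * v2))
    return math.exp(-d_b)


# Fixed box at center 0 with offset 1; the other box approaches from the right.
for center in (5.0, 4.0, 3.0):
    volume = intersection_length(0.0, 1.0, center, 1.0)
    overlap = bc_1d(0.0, 1.0, center, 1.0)
    print(center, volume, overlap)
```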

3. Modeling of Uncertainty and Polysemy

The Gaussian covariance structure enables explicit modeling of semantic uncertainty: larger offsets $o_i$ correspond to greater spread along dimension $i$, so more general concepts (e.g., "Entity") have large variances while specific concepts yield small variances. Polysemy and ambiguity are represented naturally. Overlapping Gaussian ellipsoids along distinct axes can encode multiple senses of a term, and the symmetric overlap loss (Bhattacharyya) encourages shared density only for true hypernym pairs. Partial overlap with multiple parents in “multi-parent” taxonomies reflects real-world polysemy and is handled gracefully. Probabilistic containment is graded by KL divergence: low $D_{\mathrm{KL}}(\mathcal{N}_{\text{child}} \,\|\, \mathcal{N}_{\text{parent}})$ signals that most of the child’s mass lies within its parent’s ellipsoid; disjointness increases KL.
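One way to make "larger variance means more general" concrete is the differential entropy of a diagonal Gaussian, $h = \frac{1}{2}\sum_i \ln(2\pi e\, v_i)$. Using entropy as a generality score is an illustrative choice here, not something the source defines, and the variances below are invented.

```python
import math
from typing import Sequence


def diag_gaussian_entropy(variances: Sequence[float]) -> float:
    """Differential entropy (in nats) of a diagonal Gaussian."""
    return 0.5 * sum(math.log(2.0 * math.pi * math.e * v) for v in variances)


# Hypothetical variances: a broad root concept and a narrow leaf concept.
entity_var = [9.0, 9.0]
dog_var = [0.25, 0.25]

print(diag_gaussian_entropy(entity_var))
print(diag_gaussian_entropy(dog_var))
```

The broader concept has the higher entropy, and the gap depends only on the ratio of variances.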

4. Training Algorithm

TaxoBell employs a self-supervised paradigm constructed from a seed taxonomy: every taxonomy edge (parent, child) forms a positive training pair. For each child, hard negatives are sampled from local graph neighborhoods (siblings, cousins, grandparents). Input concepts are encoded by a BERT encoder, then projected to box parameters $(c, o)$ via MLP heads, and further mapped to Gaussian parameters $(\mu, \Sigma)$. Per-triple losses (symmetric overlap, asymmetric containment, and regularization) are computed and summed, and the parameters are optimized with AdamW.

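In outline, one training pass proceeds as follows:

(1) For each taxonomy edge, take the (child, parent) pair as a positive.
(2) Sample hard negatives for the child from its siblings, cousins, and grandparents.
(3) Encode every concept with BERT and project it to box parameters $(c, o)$ via the MLP heads.
(4) Map each box to Gaussian parameters $(\mu, \Sigma)$.
(5) Compute the per-triple losses, sum them, and update the parameters with AdamW.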
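The triple-construction part of training can be sketched without any model code. The snippet below builds (child, parent, negative) triples from a toy taxonomy, drawing negatives from siblings, cousins, and grandparents as described above. The taxonomy, the number of negatives `k`, and all function names are hypothetical; the source does not specify the sampling details.

```python
import random
from typing import Dict, List, Set, Tuple


def build_children(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a child -> parents map into a parent -> children map."""
    children: Dict[str, Set[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    return children


def hard_negative_pool(node, parents, children) -> Set[str]:
    """Siblings, cousins, and grandparents of `node`, excluding true parents."""
    direct = parents.get(node, set())
    grandparents = {g for p in direct for g in parents.get(p, set())}
    siblings = {s for p in direct for s in children.get(p, set())}
    uncles = {u for g in grandparents for u in children.get(g, set())}
    cousins = {c for u in uncles for c in children.get(u, set())}
    return (siblings | cousins | grandparents) - direct - {node}


def sample_triples(parents, k: int, seed: int = 0) -> List[Tuple[str, str, str]]:
    """Return (child, parent, negative) triples with up to k negatives per edge."""
    rng = random.Random(seed)
    children = build_children(parents)
    triples = []
    for child, ps in parents.items():
        pool = sorted(hard_negative_pool(child, parents, children))
        for parent in sorted(ps):
            for neg in rng.sample(pool, min(k, len(pool))):
                triples.append((child, parent, neg))
    return triples


toy = {
    "mammal": {"animal"}, "bird": {"animal"},
    "dog": {"mammal"}, "cat": {"mammal"}, "eagle": {"bird"},
}
print(sorted(hard_negative_pool("dog", toy, build_children(toy))))
print(sample_triples(toy, k=2))
```

Because the map stores a set of parents per child, the same sketch covers multi-parent taxonomies.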

5. Hierarchical Inference and Reasoning

At inference, a new query $q$ is encoded and projected to a Gaussian $\mathcal{N}_q$. For each candidate anchor $a$ with Gaussian $\mathcal{N}_a$, TaxoBell computes:

TaxoBell_BC: the Bhattacharyya coefficient $\operatorname{BC}(\mathcal{N}_q, \mathcal{N}_a)$
TaxoBell_KL: the KL divergence $D_{\mathrm{KL}}(\mathcal{N}_q \,\|\, \mathcal{N}_a)$

Selecting the best-scoring anchor(s) identifies the parent(s) for $q$. The Gaussian–box connection guarantees transitivity of containment: if $A$ contains $B$ (in the KL sense) and $B$ contains $C$, then $A$ contains $C$. Once $q$ is attached to parent $p$, all ancestors of $p$ inherit containment relationships.
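The ranking step can be sketched as follows, assuming diagonal Gaussians stored as (mean, variance) pairs, with anchors ranked by highest Bhattacharyya coefficient or lowest KL divergence from the query. The ranking direction, the `top_k` parameter, and the example concepts are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, diagonal variances)


def bc(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya coefficient between two diagonal Gaussians."""
    d_b = 0.0
    for m1, v1, m2, v2 in zip(p[0], p[1], q[0], q[1]):
        s = 0.5 * (v1 + v2)
        d_b += 0.125 * (m1 - m2) ** 2 / s + 0.5 * math.log(s / math.sqrt(v1 * v2))
    return math.exp(-d_b)


def kl(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal Gaussians."""
    return 0.5 * sum(
        v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + math.log(v2 / v1)
        for m1, v1, m2, v2 in zip(p[0], p[1], q[0], q[1])
    )


def rank_parents(query: Gaussian, anchors: Dict[str, Gaussian],
                 mode: str = "kl", top_k: int = 1) -> List[str]:
    """Return the top_k anchors: highest BC or lowest KL(query || anchor)."""
    if mode not in ("bc", "kl"):
        raise ValueError("mode must be 'bc' or 'kl'")

    def cost(name: str) -> float:
        anchor = anchors[name]
        return -bc(query, anchor) if mode == "bc" else kl(query, anchor)

    return sorted(anchors, key=cost)[:top_k]


# Hypothetical query and anchors.
poodle = ([1.0, 1.0], [0.1, 0.1])
anchors = {
    "dog": ([1.0, 1.2], [0.5, 0.5]),
    "mammal": ([1.0, 2.0], [4.0, 4.0]),
    "fish": ([8.0, 8.0], [0.5, 0.5]),
}
print(rank_parents(poodle, anchors, mode="kl", top_k=2))
print(rank_parents(poodle, anchors, mode="bc", top_k=1))
```

Returning more than one anchor is what allows a query to attach to several parents in a multi-parent taxonomy.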

6. Empirical Evaluation and Results

TaxoBell is evaluated on five benchmark datasets, spanning both single- and multi-parent taxonomies:

Dataset	Nodes	Max Depth	Parentality
SemEval-Environment (ENV)	261	6	Single
SemEval-Science (SCI)	429	8	Single
WordNet sub-taxonomies	≈21	3	Single
SemEval-Food	1,486	8	Multi
MeSH	9,710	12	Multi

Metrics include Hit@k, Recall@k, Mean Rank (MR), Mean Reciprocal Rank (MRR), and Wu–Palmer similarity. TaxoBell is benchmarked against eight strong baselines, including BERT+MLP, TaxoExpan, Arborist, TMN, STEAM, BoxTaxo, and TaxoEnrich.
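For reference, the rank-based metrics can be computed from the positions of the gold parents in each ranked candidate list. The definitions below are common conventions (averaging over all gold parents for MR, MRR, and Recall@k, and over queries for Hit@k); the paper's exact variants may differ, and the ranks are invented.

```python
from typing import List


def mean_rank(ranks: List[List[int]]) -> float:
    """Average 1-based rank over all gold parents of all queries."""
    flat = [r for query in ranks for r in query]
    return sum(flat) / len(flat)


def mean_reciprocal_rank(ranks: List[List[int]]) -> float:
    """Average of 1 / rank over all gold parents of all queries."""
    flat = [r for query in ranks for r in query]
    return sum(1.0 / r for r in flat) / len(flat)


def recall_at_k(ranks: List[List[int]], k: int) -> float:
    """Fraction of gold parents ranked within the top k."""
    flat = [r for query in ranks for r in query]
    return sum(r <= k for r in flat) / len(flat)


def hit_at_k(ranks: List[List[int]], k: int) -> float:
    """Fraction of queries with at least one gold parent in the top k."""
    return sum(any(r <= k for r in query) for query in ranks) / len(ranks)


# Hypothetical gold-parent ranks: two single-parent queries, one with two parents.
ranks = [[1], [3], [2, 10]]
print(mean_rank(ranks), mean_reciprocal_rank(ranks))
print(recall_at_k(ranks, 3), hit_at_k(ranks, 1))
```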

On all datasets, both TaxoBell_BC and TaxoBell_KL surpass the best baseline by approximately 19% in MRR and 25% in Recall@k (absolute), with MR reductions of up to 43%. The largest gains are noted for multi-parent datasets (Food, MeSH). Among variants, TaxoBell_KL produces higher Recall@1, while TaxoBell_BC shows slightly superior overall MRR. All improvements are statistically significant (Fisher’s combined probability test).
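Fisher's method pools independent p-values through the statistic $X = -2\sum_{i=1}^{k} \ln p_i$, which follows a chi-square distribution with $2k$ degrees of freedom under the null. For even degrees of freedom the tail probability has the closed form $e^{-X/2}\sum_{i=0}^{k-1} (X/2)^i / i!$, used in the sketch below. The input p-values are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combined p-value via Fisher's method (chi-square with 2k d.o.f.)."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-dataset p-values.
print(fisher_combined_pvalue([0.05, 0.05]))
print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

With a single p-value the formula returns that p-value unchanged, which is a quick sanity check on the implementation.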

Ablation studies validate the necessity of individual model components: removing the symmetric overlap loss destroys semantic cohesion (MR worsens to around 200 and MRR falls), skipping the asymmetric containment loss impairs directionality (R@1 drops by roughly 20%), omitting regularizers or reverse-KL constraints causes degenerate solutions and notable performance degradation, and direct Gaussian projection (mean and variance taken straight from the encoder) underperforms the box-to-Gaussian mapping by 100–200% relative.

Case studies reveal effective handling of overlapping and ambiguous relations (e.g., "Copper" in MeSH attaches to both “Transition elements” and “Heavy metals”; “Frozen orange juice” in Food is ranked under both “Fruit juice” and “Concentrate”).

7. Significance and Implications

TaxoBell demonstrates that integrating axis-aligned geometric structure with probabilistic uncertainty, via a box-to-Gaussian mapping and smooth energy-based optimization, yields a robust framework for scalable, self-supervised taxonomy expansion. The approach supports modeling of asymmetric relations, inherent uncertainty, and multi-parent, ambiguous concepts, with empirically validated advantages over existing alternatives. Empirical and ablation studies highlight the synergy between geometric and probabilistic modeling, and suggest a principled path for future research in logic-informed and uncertainty-aware structured representation learning (Mishra et al., 14 Jan 2026).


References (1)

TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion (2026)
