SEAL Framework for Hierarchical Classification
- SEAL is a framework for data-driven hierarchical classification that jointly discovers latent structures and adapts to data distributions.
- It employs a metric-based approach using a 1-Wasserstein objective over a tree metric space to align observed labels with latent hierarchical groupings.
- The framework enhances classification performance and interpretability through latent label augmentation and principled semi-supervised learning.
Simultaneous Label Hierarchy Exploration and Learning (SEAL) is a framework for data-driven hierarchical classification that addresses the mismatch between predefined label taxonomies and real data distributions. Instead of relying solely on externally specified hierarchies, SEAL jointly discovers latent structure among class labels while optimizing supervised and semi-supervised objectives. The framework uses a metric-based approach to embed observed and latent labels in a tree metric space, and leverages a 1-Wasserstein objective over the implied hierarchical structure to guide both hierarchy exploration and learning. SEAL shows improved classification performance and extracts interpretable hierarchical relationships from annotated and partially annotated data (Tan et al., 2023).
1. Motivation: Hierarchy Learning vs. Fixed Taxonomies
Most hierarchical classification systems assume the existence of a well-defined label hierarchy. However, such taxonomies are often incomplete, inconsistent with the data, or unavailable for novel domains. Conventional hierarchical classifiers propagate predictions along a fixed tree, which can result in suboptimal performance because:
- The hierarchy may not reflect the semantic relationships actually present in the data.
- Supervised and semi-supervised learning can be constrained by arbitrary taxonomic splits, missing latent structure.

SEAL aims to jointly discover and exploit the optimal label hierarchy for a given dataset by augmenting observed labels with latent nodes that reflect a data-driven, hierarchical prior.
2. Framework Overview and Latent Label Augmentation
SEAL models label learning over a hierarchical tree, where each leaf corresponds to an observed class and internal nodes represent latent groupings. Given a set of input examples, the framework constructs:
- An initial tree structure encoding prior knowledge or simple assumptions about label relations.
- Latent labels sampled or inferred to augment and fill out the hierarchical structure.

These latent labels are treated as variables to be optimized, subject to constraints that encourage them to align with data clusters and support semi-supervised propagation.
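As a concrete illustration (not the authors' code), a minimal two-level hierarchy with latent internal nodes can be represented with parent pointers; the group names here are purely hypothetical:

```python
def build_label_tree(groups):
    """Return parent pointers for a two-level tree: a root, one latent
    node per group, and the observed leaf classes underneath.

    `groups` maps a (hypothetical) latent-node name to its leaf labels.
    """
    parent = {"root": None}
    for latent, members in groups.items():
        parent[latent] = "root"      # latent groupings hang off the root
        for leaf in members:
            parent[leaf] = latent    # each observed class under one latent node
    return parent

# Toy example: two latent superclasses over four observed classes.
tree = build_label_tree({"animal": ["cat", "dog"], "vehicle": ["car", "truck"]})
```

In SEAL these latent assignments are not fixed up front as in this toy; they are optimized jointly with the classifier.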
3. Hierarchical Metric and 1-Wasserstein Objective
The core of SEAL is the use of the 1-Wasserstein metric over a tree metric space to quantify the distance between the distributions of observed and latent labels. Let $T$ be the tree defining label relationships, and $d_T(u, v)$ the tree-induced metric between nodes $u$ and $v$. For each sample, SEAL defines a distribution $\mu$ over observed labels and a distribution $\nu$ over latent labels, and computes the metric

$$
W_1(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \sum_{u, v \in T} \pi(u, v)\, d_T(u, v),
$$

where $\Pi(\mu, \nu)$ denotes the set of couplings between $\mu$ and $\nu$ over the tree. This term in the loss encourages the discovery of tree structures (including both observed and latent nodes) that minimize class confusion and respect the underlying data topology.
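On a tree metric, the coupling in the 1-Wasserstein problem need not be solved explicitly: $W_1$ decomposes edge by edge into weighted subtree mass differences. The following is a minimal sketch of that closed form (illustrative data structures, not the authors' implementation):

```python
def tree_wasserstein(parent, weight, mu, nu):
    """1-Wasserstein distance between node distributions mu and nu on a tree,
    via the closed form: sum over edges e of w_e * |mu(subtree_e) - nu(subtree_e)|.

    `parent[v]` is v's parent (the root maps to None); `weight[v]` is the
    length of the edge from v to parent[v].
    """
    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    # Signed mass difference at each node, pushed upward so that diff[v]
    # eventually holds mu(subtree(v)) - nu(subtree(v)).
    diff = {v: mu.get(v, 0.0) - nu.get(v, 0.0) for v in parent}
    total = 0.0
    for v in sorted(parent, key=depth, reverse=True):  # deepest nodes first
        if parent[v] is not None:
            total += weight[v] * abs(diff[v])
            diff[parent[v]] += diff[v]
    return total

# Moving all mass from leaf "b" up to the root costs both unit edge lengths.
chain = {"root": None, "a": "root", "b": "a"}
w = {"a": 1.0, "b": 1.0}
print(tree_wasserstein(chain, w, {"b": 1.0}, {"root": 1.0}))  # → 2.0
```

This closed form is what makes tree-Wasserstein terms cheap enough to use inside a training loop, compared with general optimal transport.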
4. Joint Hierarchy Exploration and Classification Learning
SEAL trains with a compound loss integrating supervised classification and the hierarchical Wasserstein term. The learning objective includes:
- Standard cross-entropy over labeled data.
- Hierarchical regularization that penalizes Wasserstein divergence between empirical label distributions and those implied by the evolving tree structure.
- Semi-supervised loss components leveraging unlabeled data: pseudo-labels can be assigned to latent hierarchical nodes, propagating information throughout the hierarchy.

The optimization alternates between updating the hierarchy (possibly under constraints on tree topology or depth), reassigning latent labels, and refining the classifier parameters.
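A hedged sketch of how such a compound objective might be assembled, assuming softmax probabilities and a precomputed Wasserstein term (the function name and the trade-off weight `lam` are illustrative, not taken from the paper):

```python
import numpy as np

def seal_loss(probs, labels, w1_term, lam=0.1):
    """Compound objective: cross-entropy on labeled data plus a weighted
    tree-Wasserstein regularizer.

    `probs` holds softmax outputs (N x C), `labels` the integer class
    indices, `w1_term` a precomputed W1 value for the current tree, and
    `lam` a hypothetical trade-off weight.
    """
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return ce + lam * w1_term

# With the regularizer set to zero, the loss reduces to plain cross-entropy.
loss = seal_loss(np.array([[0.9, 0.1]]), np.array([0]), w1_term=0.0)
```

In the alternating scheme described above, `w1_term` would be recomputed whenever the tree or the latent label assignments change.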
5. Semi-Supervised Capability and Distributional Alignment
SEAL demonstrates particular strength in semi-supervised regimes, where only part of the data is labeled. By assigning unlabeled examples to latent nodes in the tree, the model can distribute pseudo-supervision in a principled manner, inferring hierarchical structure even from sparse annotations. This enables better utilization of all available data, improved distributional alignment between input clusters and hierarchy leaves, and more reliable pseudo-labeling for rare classes.
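One simple, hypothetical propagation rule consistent with this idea assigns an unlabeled example to the latent node whose leaf descendants collect the most predicted probability mass (names and rule are illustrative, not the paper's exact procedure):

```python
def assign_latent_node(leaf_probs, leaf_to_latent):
    """Assign an unlabeled example to the latent node whose leaf
    descendants receive the most predicted probability mass.

    `leaf_probs` maps leaf class -> predicted probability;
    `leaf_to_latent` maps each leaf to its latent parent node.
    """
    mass = {}
    for leaf, p in leaf_probs.items():
        latent = leaf_to_latent[leaf]
        mass[latent] = mass.get(latent, 0.0) + p
    return max(mass, key=mass.get)

# A prediction spread over animal leaves lands on the "animal" latent node,
# even though no single leaf is confident.
mapping = {"cat": "animal", "dog": "animal", "car": "vehicle", "truck": "vehicle"}
node = assign_latent_node({"cat": 0.4, "dog": 0.3, "car": 0.2, "truck": 0.1}, mapping)
print(node)  # → animal
```

This illustrates why coarse latent supervision can be reliable even when fine-grained pseudo-labels are not, which is especially useful for rare classes.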
6. Empirical Results and Hierarchy Extraction
The method is evaluated on diverse datasets, covering both structured and unstructured meta-label settings. SEAL consistently achieves superior performance in both fully supervised and semi-supervised classification compared to baselines using fixed hierarchies or flat label spaces. It reveals nontrivial hierarchical structure among classes; for example, it recovers known relations (such as superclasses and clusters) and exposes new ones in domains lacking curated taxonomies. The extracted hierarchies are interpretable and can inform downstream tasks (e.g., structured output prediction or explainable AI).
7. Practical Implementation and Future Directions
The implementation is public and includes routines for initializing hierarchical priors, latent label inference, Wasserstein computation over trees, and integration with standard deep learning frameworks. SEAL’s approach is extensible to domains with rich but incomplete taxonomic information, and can be combined with neural architectures for end-to-end learning. Future work includes refining tree exploration algorithms, integrating user-provided weak supervision, scaling to large numbers of classes and unlabeled points, and adapting SEAL to multi-label and multi-hierarchy contexts.
SEAL offers a general, metric-based approach to hierarchical learning that automatically adapts to data distributions, leverages latent structure in semi-supervised settings, and directly optimizes for data-aligned class hierarchies. Its use of the 1-Wasserstein metric over tree spaces sets it apart from standard approaches and provides a principled foundation for structure discovery and robust classification (Tan et al., 2023).