
FT+KB Classifier Overview

Updated 5 January 2026
  • FT+KB classifier is a hybrid technique combining fine-tuned LLM embeddings with knowledge base entity representations to boost classification accuracy.
  • It uses truncated SVD for dimensionality reduction and an AutoML pipeline to optimize feature extraction and model selection.
  • Applied to both document and functional data, it demonstrates robust performance improvements even in few-shot and noisy settings.

The FT+KB classifier refers to a class of machine learning techniques that integrate fine-tuned representations from large neural models (FT: Fine-Tuned, typically LLMs) with explicit knowledge bases (KB: Knowledge Base), or, alternatively, functional depth-based approaches that leverage both functional and kernel-based information. These classifiers combine heterogeneous feature sources to improve classification across diverse data regimes, including document, tabular, and functional modalities.

1. Architectures and Formalization

Two distinct but rigorously grounded FT+KB classifier families have been advanced.

Document Classification (LLM+KB Fusion):

This architecture leverages both fine-tuned LLM embeddings and knowledge-base-derived entity embeddings for document representation (Koloski et al., 2024). Each input document $d$ is simultaneously processed:

  • LLM Encoder: the document is embedded as $\mathbf{h}_\mathrm{LLM}(d) \in \mathbb{R}^{D_\mathrm{LLM}}$; e.g., $D_\mathrm{LLM}=1024$ (Angle), $4096$ (LLM2Vec-LLaMa3), $1536/3072$ (OpenAI).
  • Entity Linking: An external entity linker (Babelfy) extracts entities $\{e_i\}$, each mapped to a KB embedding $\mathbf{e}_i \in \mathbb{R}^{D_\mathrm{KB}}$ (often $D_\mathrm{KB}=512$ using RotatE/Wikidata).
  • Entity Representation: The document-level entity embedding is $\mathbf{h}_\mathrm{KB}(d)=\frac{1}{n}\sum_{i=1}^{n} \mathbf{e}_i$.

The two representations are concatenated into $\mathbf{z}(d)=[\mathbf{h}_\mathrm{LLM}(d);\,\mathbf{h}_\mathrm{KB}(d)] \in \mathbb{R}^{D_\mathrm{LLM}+D_\mathrm{KB}}$.

  • Dimensionality Reduction: Truncated SVD is applied for low-dimensional projection: $\mathbf{z}' = V_k^T \mathbf{z}$, where $V_k$ contains the top $k$ singular vectors.
  • Classification: $\mathbf{z}'$ is input to an AutoML pipeline (TPOT), which automatically searches over and tunes a broad class of estimators (logistic regression, random forests, SVMs, etc.), feature selectors, and preprocessors.
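The fusion pipeline above can be sketched end to end. In this sketch the embeddings are random stand-ins for real LLM and KB vectors, and a plain logistic regression stands in for the TPOT AutoML search; only the concatenate-then-SVD structure follows the described method.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for real embeddings: h_LLM(d) from a fine-tuned LLM encoder,
# e_i from a KB embedding table (e.g. RotatE over Wikidata).
n_docs, d_llm, d_kb = 200, 1024, 512
h_llm = rng.normal(size=(n_docs, d_llm))  # one LLM embedding per document
entity_embs = [rng.normal(size=(rng.integers(1, 6), d_kb)) for _ in range(n_docs)]

# Document-level KB representation: mean of the linked-entity embeddings.
h_kb = np.stack([e.mean(axis=0) for e in entity_embs])

# Fusion by concatenation: z(d) = [h_LLM(d); h_KB(d)].
z = np.concatenate([h_llm, h_kb], axis=1)  # shape (n_docs, d_llm + d_kb)

# Truncated SVD projects the fused representation to k dimensions.
k = 64
z_red = TruncatedSVD(n_components=k, random_state=0).fit_transform(z)

# The paper runs a TPOT AutoML search here; logistic regression is a stand-in.
y = rng.integers(0, 2, size=n_docs)
clf = LogisticRegression(max_iter=1000).fit(z_red, y)
print(z_red.shape)  # (200, 64)
```

With random labels the fitted classifier is meaningless; the point is the shape of the data flow, which matches the concatenation and projection steps above.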

Functional Data (Spatial Depth – WMD+KFSD):

For the classification of functional data (curves/functions), the FT+KB label designates the WMD+KFSD procedure, which employs kernelized functional spatial depth (Sguera et al., 2013):

  • Functional Spatial Depth (FSD):

$\mathrm{FSD}(x;P) = 1 - \|\mathbb{E}[FS(x-Y)]\|$

for a random function $Y$ in a Hilbert space $\mathbb{H}$, with $FS(x) = x/\|x\|$ for $x \neq 0$.
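A minimal sample version of this depth replaces the expectation with an average over observed curves, $1 - \|\tfrac{1}{n}\sum_i FS(x - y_i)\|$. The sketch below assumes curves discretized on a common grid and approximates the Hilbert-space norm by the Euclidean norm on that grid.

```python
import numpy as np

def fsd(x, Y, eps=1e-12):
    """Sample functional spatial depth of curve x w.r.t. sample Y.

    x: array (p,)   -- a curve evaluated on a grid of p points
    Y: array (n, p) -- sample curves on the same grid
    The Hilbert-space norm is approximated by the Euclidean grid norm.
    """
    diffs = x - Y                                 # (n, p)
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > eps                            # FS is undefined at 0; skip ties
    directions = diffs[keep] / norms[keep, None]  # unit vectors FS(x - y_i)
    return 1.0 - np.linalg.norm(directions.mean(axis=0))

rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 50)
Y = np.sin(2 * np.pi * grid) + 0.1 * rng.normal(size=(30, 50))
center = np.sin(2 * np.pi * grid)   # near the center of the sample
outlier = center + 3.0              # shifted far outside the sample
print(fsd(center, Y) > fsd(outlier, Y))  # → True
```

A central curve sees the unit direction vectors cancel (depth near 1), while for a far-out curve they all point the same way (depth near 0).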

  • Kernelized Extension (KFSD):

Functions are embedded into a feature space via a kernel $\kappa(x,y)$ (usually Gaussian), and depth is computed as

$\mathrm{KFSD}(x; P) = 1 - \|\mathbb{E}[FS(\phi(x)-\phi(Y))]\|$

with $\phi$ the implicit feature mapping.

  • Classification Rule: For labeled groups $g \in \{0,1\}$, assign $x$ to the group with the larger within-group depth $D_g(x) = \mathrm{KFSD}_{n_g}(x;\{y_i : g_i = g\})$.
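The sample depth and the max-depth rule can be computed from kernel evaluations alone, since $\|\phi(x)-\phi(y)\|^2 = \kappa(x,x) - 2\kappa(x,y) + \kappa(y,y)$. The sketch below assumes a Gaussian kernel of the form $\exp(-\|x-y\|^2/\sigma^2)$ on discretized curves; it is an illustrative implementation, not the authors' reference code.

```python
import numpy as np

def gauss_kernel(a, b, sigma):
    """Gaussian kernel on discretized curves (Euclidean grid approximation)."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / sigma**2)

def kfsd(x, Y, sigma, eps=1e-12):
    """Sample kernelized functional spatial depth, via kernel values only."""
    kxx = 1.0                                 # kappa(x, x) for a Gaussian kernel
    kxy = gauss_kernel(x[None, :], Y, sigma)  # (n,)
    kyy = gauss_kernel(Y[:, None, :], Y[None, :, :], sigma)  # (n, n) Gram matrix
    # Feature-space distances ||phi(x) - phi(y_i)||.
    d = np.sqrt(np.maximum(kxx - 2 * kxy + np.diag(kyy), eps))
    # Inner products <phi(x)-phi(y_i), phi(x)-phi(y_j)> from kernel values.
    inner = kxx - kxy[:, None] - kxy[None, :] + kyy
    s = (inner / np.outer(d, d)).sum() / len(Y) ** 2
    return 1.0 - np.sqrt(max(s, 0.0))

def classify(x, groups, sigma):
    """Max-depth rule: assign x to the group with the larger within-group KFSD."""
    return int(np.argmax([kfsd(x, Y_g, sigma) for Y_g in groups]))
```

Only Gram-matrix entries are needed, which is why the matrix-based implementation mentioned below scales with the number of curves rather than the grid resolution.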

2. Methodology and Implementation

LLM+KB Fusion Implementation:

  • Embeddings: HuggingFace transformers, OpenAI API, GraphVite (KB lookup).
  • Entity Linking: Babelfy API.
  • Dimensionality Reduction: Scikit-learn SVD.
  • Classification: TPOT, with hyperparameters (population=100, generations=100, 5-fold CV).
  • Resource Footprint: Up to 1 hour runtime, 16 cores, 256 GB RAM per run.

Functional KFSD:

  • Kernel Choice: Gaussian kernel with bandwidth $\sigma$ selected as a percentile of pairwise curve distances; 7 candidate percentiles (15–85%).
  • KFSD Computation: Matrix-based approach on Gram matrices for efficiency; see pseudocode for details (Sguera et al., 2013).
  • Classifier Variants: Alternatives include Distance to Trimmed Mean (DTM) and Weighted Average Distance (WAD), but WMD+KFSD consistently yields top performance.
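The bandwidth-candidate construction can be sketched as follows. The seven percentile levels used here are assumed for illustration; the source only specifies seven candidates spanning the 15–85% range.

```python
import numpy as np

def bandwidth_candidates(Y, percentiles=(15, 25, 35, 50, 65, 75, 85)):
    """Candidate Gaussian-kernel bandwidths from pairwise curve distances.

    Y: array (n, p) of curves on a common grid. The percentile levels are
    illustrative; the paper uses 7 candidates in the 15-85% range.
    """
    diffs = Y[:, None, :] - Y[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n, n) pairwise grid distances
    iu = np.triu_indices(len(Y), k=1)           # upper triangle, no diagonal
    return np.percentile(dists[iu], percentiles)
```

Tying candidates to the empirical distance distribution keeps the kernel neither so narrow that every curve is isolated nor so wide that all depths collapse.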

3. Empirical Performance and Benchmarks

Document FT+KB:

  • Datasets: Books, DVD, Music (binary sentiment), Hate speech, MLDoc (4-way), XGENRE (9-way).
  • Baselines: Ridge-penalized classifiers on pure LLM embeddings.
  • Results: Average accuracy gain +0.52% (Wilcoxon $p=0.01$); largest gains with Angle (+2.25 pp), mxbai (+1.50 pp), and LLaMa3 (+0.63 pp); OpenAI decreased marginally on hate speech.
  • Compression: Low-dimensional SVD projections ($k \leq 512$) generally suffice; on some datasets $k=2$ matches high-dimensional accuracy.
  • Few-Shot: Maintains parity or superiority to text-only baselines with 1–50% training data.
| Dataset | Baseline (%) | FT+KB (%) |
|---------|--------------|-----------|
| Books   | 93.85        | 95.40     |
| DVD     | 94.15        | 94.95     |
| Music   | 91.65        | 94.25     |
| Hate    | 79.06        | 81.62     |
| MLDoc   | 95.42        | 95.90     |
| XGENRE  | 53.67        | 59.19     |

Functional FT+KB (WMD+KFSD):

  • Simulations: Across multiple curve-generating processes (with/without outliers, linear/sinusoidal), WMD+KFSD matches or exceeds global-depth and $k$-NN approaches, especially when group differences are subtle or the data are contaminated.
  • Real Data:
    • Growth: WMD+KFSD error 3.45% (T1), 2.16% (T2, leave-one-out) vs. $k$-NN 3.86%, 3.23%.
    • Phoneme: WMD+KFSD error 19.3% (T1), 18.5% (T2) vs. $k$-NN 22.1%, 22.5%.
  • Robustness: Cross-validated bandwidth selection for the kernel ensures adaptability to multimodal and noisy settings.

4. Theoretical and Algorithmic Properties

Statistical Properties:

  • LLM+KB Fusion: Concatenation is a non-parametric, information-augmenting operation; there is no end-to-end gradient flow between embedding and classifier. All classifier optimization is downstream of feature generation and SVD.
  • WMD+KFSD: KFSD provides a local, kernel-sensitive depth that robustly distinguishes functional modalities, outperforming global depths especially under contamination or overlapping group structure.

Dimensionality Reduction:

  • SVD Choice: SVD projects the LLM and KB representations jointly, exposing latent axes of maximal variance for the downstream AutoML search.
  • Bandwidth Selection in KFSD: Bandwidth $\sigma$ is chosen from a discrete candidate set via cross-validation, with computational overhead $O(n^2)$, manageable given the small sample sizes typical in FDA.
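A leave-one-out selection loop of the kind described might look like this; `classify_fn` is a placeholder for any depth-based rule (e.g. the WMD+KFSD assignment), so the sketch illustrates the $O(n^2)$ structure of the search, not the authors' exact procedure.

```python
import numpy as np

def loo_select_bandwidth(X, y, candidates, classify_fn):
    """Pick the bandwidth with the fewest leave-one-out misclassifications.

    classify_fn(x, groups, sigma) -> predicted group index; any depth-based
    rule can be plugged in. Per candidate, n held-out predictions each scan
    the remaining n-1 curves, hence the O(n^2) overhead.
    """
    labels = np.unique(y)
    best_sigma, best_err = candidates[0], np.inf
    for sigma in candidates:
        errors = 0
        for i in range(len(X)):
            mask = np.arange(len(X)) != i  # hold out curve i
            groups = [X[mask & (y == g)] for g in labels]
            errors += int(classify_fn(X[i], groups, sigma) != y[i])
        if errors < best_err:
            best_sigma, best_err = sigma, errors
    return best_sigma
```

Because the candidate set is small and discrete (seven percentiles), the full search stays cheap at FDA-typical sample sizes.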

5. Practical Implementation and Tooling

Codebases and Libraries:

  • LLM+KB: Source code (bablfusion) is provided; relies on HuggingFace, OpenAI API, Babelfy API, GraphVite, scikit-learn, and TPOT (Koloski et al., 2024).
  • KFSD: Functional depth and classifier construction as described in (Sguera et al., 2013); efficient algorithms rely on Gram matrix precomputation and vectorized operations.

Resource Requirements:

  • LLM+KB: Multi-core CPUs, large RAM for embedding and SVD. Execution time under 1 h per experiment.
  • WMD+KFSD: Modest CPU loads; a full simulation for $n=50$ curves with bandwidth search runs in under 2 s per sample.

6. Significance, Generalizations, and Outlook

The FT+KB paradigm demonstrates that explicit fusion of unstructured (LLM) and structured (KB or kernel-induced) information materially enhances classification outcomes across both textual and functional domains. Empirically, these approaches match or exceed strong baselines, often at lower feature dimensionality and robustly so in few-shot regimes.

A plausible implication is that explicit insertion of external knowledge or local kernel structure, followed by informed dimensionality reduction, yields generalizable gains for heterogeneous data. For document classification, this takes the form of LLM grounding via KB entities; in functional data, local kernel depth metrics offer discriminative power when group differences are marginal or noise is present.

The methodology is extensible, with potential for adaptation to multimodal or hierarchical KBs, as well as for integrating end-to-end differentiability as LLMs and graph-based neural KBs become more tightly coupled. The FT+KB class thus represents a principled and empirically validated approach for robust, high-performance classification leveraging both dense and structured knowledge representations (Koloski et al., 2024; Sguera et al., 2013).
