FT+KB Classifier Overview
- FT+KB classifier is a hybrid technique combining fine-tuned LLM embeddings with knowledge base entity representations to boost classification accuracy.
- It uses truncated SVD for dimensionality reduction and an AutoML pipeline to optimize feature extraction and model selection.
- Applied to both document and functional data, it demonstrates robust performance improvements even in few-shot and noisy settings.
The FT+KB classifier refers to a class of machine learning techniques integrating fine-tuned representations from large neural models (FT: Fine-Tuned, typically LLMs) with explicit knowledge bases (KB: Knowledge Base), or, alternatively, functional depth-based approaches leveraging both functional and kernel-based information. These classifiers combine heterogeneous feature sources to improve classification across data regimes, including document, tabular, and functional modalities.
1. Architectures and Formalization
Two distinct but rigorously grounded FT+KB classifier families have been advanced.
Document Classification (LLM+KB Fusion):
This architecture leverages both fine-tuned LLM embeddings and knowledge-base-derived entity embeddings for document representation (Koloski et al., 2024). Each input document is simultaneously processed:
- LLM Encoder: Each document $x$ is mapped to a dense embedding $\mathbf{e}_{\mathrm{LLM}}(x) \in \mathbb{R}^{d_{\mathrm{LLM}}}$; e.g., Angle, $4096$ (LLM2Vec-LLaMa3), $1536/3072$ (OpenAI).
- Entity Linking: An external entity linker (Babelfy) extracts entities $t_1, \dots, t_k$ from the document, each mapped to a KB embedding $\mathbf{e}_{\mathrm{KB}}(t_i)$ (often using RotatE embeddings of Wikidata).
- Entity Representation: The document-level entity embedding is the mean of the linked entity embeddings, $\mathbf{e}_{\mathrm{KB}}(x) = \frac{1}{k} \sum_{i=1}^{k} \mathbf{e}_{\mathrm{KB}}(t_i)$.
- Fusion: The two representations are concatenated into $\mathbf{e}(x) = [\mathbf{e}_{\mathrm{LLM}}(x); \mathbf{e}_{\mathrm{KB}}(x)]$.
- Dimensionality Reduction: Truncated SVD is applied for low-dimensional projection, $\mathbf{z}(x) = V_k^{\top} \mathbf{e}(x)$, where $V_k$ contains the top $k$ principal components.
- Classification: $\mathbf{z}(x)$ is input to an AutoML pipeline (TPOT), automatically searching and tuning among a broad class of estimators (logistic regression, random forests, SVMs, etc.), feature selectors, and preprocessors.
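A minimal sketch of this fusion pipeline in scikit-learn, assuming precomputed LLM and KB embedding matrices; a plain logistic regression stands in for the TPOT AutoML search:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

def fit_ftkb(e_llm, e_kb, y, k=64):
    """Concatenate LLM and KB document embeddings, project via truncated
    SVD, and fit a downstream classifier (stand-in for the TPOT search)."""
    X = np.hstack([e_llm, e_kb])              # [n_docs, d_llm + d_kb]
    svd = TruncatedSVD(n_components=k, random_state=0).fit(X)
    clf = LogisticRegression(max_iter=1000).fit(svd.transform(X), y)
    return svd, clf

def predict_ftkb(svd, clf, e_llm, e_kb):
    return clf.predict(svd.transform(np.hstack([e_llm, e_kb])))
```

In the actual pipeline the final estimator would be replaced by TPOT's `TPOTClassifier` (population 100, generations 100, 5-fold CV), which searches over estimators, feature selectors, and preprocessors instead of fixing a single model.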
Functional Data (Spatial Depth – WMD+KFSD):
For the classification of functional data (curves/functions), the FT+KB label designates the WMD+KFSD procedure, which employs kernelized functional spatial depth (Sguera et al., 2013):
- Functional Spatial Depth (FSD): $\mathrm{FSD}(x, P) = 1 - \left\| \mathbb{E}\!\left[ \frac{x - X}{\|x - X\|} \right] \right\|$ for a random function $X \sim P$ taking values in a Hilbert space $\mathbb{H}$, with the convention $\frac{x - X}{\|x - X\|} = 0$ for $x = X$.
- Kernelized Extension (KFSD): Functions are embedded into a feature space via a kernel $\kappa$ (usually Gaussian), and depth is computed as $\mathrm{KFSD}(x, P) = 1 - \left\| \mathbb{E}\!\left[ \frac{\phi(x) - \phi(X)}{\|\phi(x) - \phi(X)\|} \right] \right\|$, with $\phi$ the implicit feature mapping induced by $\kappa$; all norms and inner products reduce to kernel evaluations.
- Classification Rule: For labeled groups with distributions $P_g$, assign $x$ to the group with the largest within-group depth $\mathrm{KFSD}(x, P_g)$.
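Because feature-space norms and inner products reduce to kernel evaluations, the sample KFSD and the maximum-depth rule can be sketched in numpy on discretized curves (the Gaussian kernel and the bandwidth handling here are illustrative, not the authors' exact code):

```python
import numpy as np

def gauss(a, b, h):
    # κ(a, b) = exp(-||a - b||^2 / h^2) on discretized curves
    return np.exp(-np.sum((a - b) ** 2) / h ** 2)

def kfsd(x, sample, h):
    """Sample KFSD of curve x with respect to a group of curves."""
    n = len(sample)
    kxx = gauss(x, x, h)
    kxi = np.array([gauss(x, yi, h) for yi in sample])
    K = np.array([[gauss(yi, yj, h) for yj in sample] for yi in sample])
    d = np.sqrt(np.maximum(kxx + np.diag(K) - 2 * kxi, 0))  # ||φ(x)-φ(x_i)||
    ok = d > 1e-12                 # convention: terms with x = x_i drop out
    num = kxx + K - kxi[None, :] - kxi[:, None]  # <φ(x)-φ(x_i), φ(x)-φ(x_j)>
    ratio = np.zeros_like(num)
    valid = np.outer(ok, ok)
    ratio[valid] = num[valid] / np.outer(d, d)[valid]
    return 1.0 - np.sqrt(max(ratio.sum(), 0.0)) / n

def classify_by_depth(x, groups, h):
    """Assign x to the group (dict label -> curves) of largest depth."""
    return max(groups, key=lambda g: kfsd(x, groups[g], h))
```

A curve near the center of its group leaves the normalized feature-space differences pointing in many directions, so they cancel and the depth is large; an outlying curve has them all pointing the same way, giving low depth.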
2. Methodology and Implementation
LLM+KB Fusion Implementation:
- Embeddings: HuggingFace transformers, OpenAI API, GraphVite (KB lookup).
- Entity Linking: Babelfy API.
- Dimensionality Reduction: Scikit-learn SVD.
- Classification: TPOT, with hyperparameters (population=100, generations=100, 5-fold CV).
- Resource Footprint: Up to 1 hour runtime, 16 cores, 256 GB RAM per run.
Functional KFSD:
- Kernel Choice: Gaussian kernel with bandwidth selected as a percentile of pairwise curve distances; 7 candidate percentiles (15–85%).
- KFSD Computation: Matrix-based approach on Gram matrices for efficiency; see pseudocode for details (Sguera et al., 2013).
- Classifier Variants: Distance to Trimmed Mean (DTM), Weighted Average Distance (WAD), but WMD+KFSD consistently yields top performance.
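The percentile-based bandwidth grid can be sketched as follows; the seven specific percentiles are an illustrative choice within the stated 15–85% range:

```python
import numpy as np

def candidate_bandwidths(curves, pcts=(15, 25, 40, 50, 60, 75, 85)):
    """Bandwidth candidates as percentiles of pairwise curve distances."""
    n = len(curves)
    dists = [np.linalg.norm(curves[i] - curves[j])
             for i in range(n) for j in range(i + 1, n)]
    return np.percentile(dists, pcts)
```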
3. Empirical Performance and Benchmarks
Document FT+KB:
- Datasets: Books, DVD, Music (binary sentiment), Hate speech, MLDoc (4-way), XGENRE (9-way).
- Baselines: Ridge-penalized classifiers on pure LLM embeddings.
- Results: Average accuracy gain +0.52% (significant under a Wilcoxon signed-rank test); largest gains with Angle (+2.25 pp), mxbai (+1.50 pp), and LLaMa3 (+0.63 pp); OpenAI embeddings decreased marginally on hate speech.
- Compression: Low-dimensional SVD projections generally suffice; on some datasets the compressed representation matches high-dimensional accuracy.
- Few-Shot: Maintains parity or superiority to text-only baselines with 1–50% training data.
| Dataset | Baseline (%) | FT+KB (%) |
|---|---|---|
| Books | 93.85 | 95.40 |
| DVD | 94.15 | 94.95 |
| Music | 91.65 | 94.25 |
| Hate | 79.06 | 81.62 |
| MLDoc | 95.42 | 95.90 |
| XGENRE | 53.67 | 59.19 |
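The few-shot evaluation above (training on 1–50% of the data) can be sketched with a simple protocol; the linear probe and the fraction grid are placeholders for the paper's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def few_shot_curve(X, y, fracs=(0.01, 0.05, 0.1, 0.25, 0.5), seed=0):
    """Accuracy of a linear probe at increasing training-set fractions."""
    scores = {}
    for f in fracs:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=f, stratify=y, random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores[f] = clf.score(X_te, y_te)
    return scores
```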
Functional FT+KB (WMD+KFSD):
- Simulations: Across multiple curve-generating processes (with/without outliers, linear/sinusoidal), WMD+KFSD matches or exceeds the global-depth and $k$-NN approaches, especially when group differences are subtle or the data are contaminated.
- Real Data:
- Growth: WMD+KFSD error 3.45% (T1), 2.16% (T2, leave-one-out) vs. $k$-NN 3.86% and 3.23%.
- Phoneme: WMD+KFSD error 19.3% (T1), 18.5% (T2) vs. $k$-NN 22.1% and 22.5%.
- Robustness: Cross-validated bandwidth selection for the kernel ensures adaptability to multimodal and noisy settings.
4. Theoretical and Algorithmic Properties
Statistical Properties:
- LLM+KB Fusion: Concatenation is a non-parametric, information-augmenting operation; there is no end-to-end gradient flow between the embeddings and the classifier. All classifier optimization is downstream of feature generation and SVD.
- WMD+KFSD: KFSD provides a local, kernel-sensitive depth that robustly distinguishes functional modalities, outperforming global depths especially under contamination or overlapping group structure.
Dimensionality Reduction:
- SVD Choice: SVD projects both LLM and KB representations jointly, exposing latent axes that maximize variance relevant for downstream AutoML.
- Bandwidth Selection in KFSD: The bandwidth is chosen from a discrete candidate set via cross-validation, with computational overhead manageable given the small sample sizes typical in FDA.
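A plausible sketch of the cross-validated bandwidth selection (leave-one-out over the candidate grid); `classify_fn` is passed in so the sketch stays independent of the particular depth-based rule:

```python
def select_bandwidth(curves, labels, candidates, classify_fn):
    """Pick the candidate bandwidth maximizing leave-one-out accuracy.

    classify_fn(x, train_curves, train_labels, h) -> predicted label.
    """
    best_h, best_acc = None, -1.0
    for h in candidates:
        hits = 0
        for i, (x, lab) in enumerate(zip(curves, labels)):
            tr_x = curves[:i] + curves[i + 1:]   # hold out curve i
            tr_y = labels[:i] + labels[i + 1:]
            hits += classify_fn(x, tr_x, tr_y, h) == lab
        acc = hits / len(curves)
        if acc > best_acc:
            best_h, best_acc = h, acc
    return best_h, best_acc
```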
5. Practical Implementation and Tooling
Codebases and Libraries:
- LLM+KB: Source code (bablfusion) is provided; relies on HuggingFace, OpenAI API, Babelfy API, GraphVite, scikit-learn, and TPOT (Koloski et al., 2024).
- KFSD: Functional depth and classifier construction as described in (Sguera et al., 2013); efficient algorithms rely on Gram matrix precomputation and vectorized operations.
Resource Requirements:
- LLM+KB: Multi-core CPUs and large RAM for embedding and SVD; execution time up to 1 h per experiment.
- WMD+KFSD: Modest CPU load; full simulations with bandwidth search run in under 2 s per sample.
6. Significance, Generalizations, and Outlook
The FT+KB paradigm demonstrates that explicit fusion of unstructured (LLM) and structured (KB or kernel-induced) information materially enhances classification outcomes across both textual and functional domains. Empirically, these approaches match or exceed strong baselines, often with lower feature dimensionality and with robust behavior in few-shot regimes.
A plausible implication is that the explicit insertion of external knowledge or local kernel structure, followed by informed dimensionality reduction, yields generalizable gains for heterogeneous data. For document classification, this involves grounding LLM representations via KB entities; in functional data, local kernel depth metrics offer discriminative power when group differences are marginal or noise is present.
The methodology is extensible, with potential for adaptation to multimodal or hierarchical KBs, and for integrating end-to-end differentiability as LLMs and graph-neural KB encoders become more tightly coupled. The FT+KB class thus represents a principled and empirically validated approach to robust, high-performance classification leveraging both dense and structured knowledge representations (Koloski et al., 2024; Sguera et al., 2013).