LLM-TOPSIS Framework
- The paper presents the LLM-TOPSIS framework that integrates fine-tuned transformer models with fuzzy TOPSIS to rank candidate profiles using both structured and unstructured data.
- The methodology converts NLP-derived proficiency labels into numerical scores and triangular fuzzy numbers, forming a decision matrix for comprehensive multi-criteria evaluation.
- Empirical results demonstrate high performance, with 91%+ classification accuracy and near-perfect alignment with expert rankings, underscoring the framework's potential to enhance recruitment processes.
The LLM-TOPSIS framework is an integrated system that combines LLM natural language processing with a fuzzy extension of the Technique for Order Preference by Similarity to Ideal Solution (Fuzzy-TOPSIS), applied to the automated, multi-criteria ranking of personnel profiles in software engineering recruitment. The methodology is designed to operationalize both structured expert knowledge and the nuanced, unstructured data found in LinkedIn profiles by leveraging fine-tuned transformer models as scoring front-ends and a fuzzy multi-criteria decision-making (MCDM) backend operating on triangular fuzzy numbers (TFNs) (Hoque et al., 30 Jan 2026).
1. System Architecture and Workflow
The LLM-TOPSIS system ingests a set of LinkedIn profiles, each with four key textual fields: Experience, Skills, Education, and About (self-introduction). The primary workflow consists of the following steps:
- Fine-tuned DistilRoBERTa Multi-class Classification: For each field, a DistilRoBERTa model is fine-tuned to predict one of three proficiency labels—Poor, Fair, or Excellent.
- Label-to-Score Mapping: Predicted labels are mapped to numerical scores: 1–2 (Poor), 3 (Fair), 4–5 (Excellent).
- Matrix Construction: Scores are organized into a numeric decision matrix of shape $n \times 4$ (candidates $\times$ criteria).
- Fuzzy TOPSIS Application: The decision matrix is transformed using TFNs for both criteria weights and candidate scores. Fuzzy-TOPSIS is then applied to produce an overall candidate ranking.
DistilRoBERTa acts as the quantitative interpreter of unstructured profile data, while the fuzzy-TOPSIS backend aggregates the resulting scores under explicit modeling of linguistic and subjective uncertainty.
2. Mathematical Preliminaries and Notation
2.1 Triangular Fuzzy Numbers (TFNs)
A TFN is specified as $\tilde{a} = (l, m, u)$, where $l \le m \le u$ are the lower, modal, and upper bounds. The membership function increases linearly from $l$ to $m$ and decreases linearly from $m$ to $u$:

$$\mu_{\tilde{a}}(x) = \begin{cases} \dfrac{x - l}{m - l}, & l \le x \le m \\[4pt] \dfrac{u - x}{u - m}, & m \le x \le u \\[4pt] 0, & \text{otherwise.} \end{cases}$$
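The piecewise-linear membership function above can be evaluated directly; the following is a minimal sketch (the function name and the handling of degenerate bounds are our own):

```python
def tfn_membership(x, tfn):
    """Membership degree of a crisp value x in the TFN (l, m, u)."""
    l, m, u = tfn
    if x < l or x > u:
        return 0.0
    if x <= m:                      # rising edge from l to m
        return (x - l) / (m - l) if m > l else 1.0
    return (u - x) / (u - m)        # falling edge from m to u
```

For example, the "Medium" TFN $(0.3, 0.5, 0.7)$ has full membership at its modal value $0.5$ and membership $0$ outside $[0.3, 0.7]$.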
2.2 Linguistic-to-TFN Mapping
Candidate attribute labels and criteria weights are converted to TFNs via predefined mappings. For example, the translation from linguistic term to TFN is as follows:
| Linguistic Term | TFN |
|---|---|
| Very Low | (0.0, 0.1, 0.3) |
| Low | (0.1, 0.3, 0.5) |
| Medium | (0.3, 0.5, 0.7) |
| High | (0.5, 0.7, 0.9) |
| Very High | (0.7, 0.9, 1.0) |
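The table can be encoded directly as a lookup; this is a small sketch (the dictionary name is ours):

```python
# Linguistic-to-TFN mapping, copied from the table above.
LINGUISTIC_TFN = {
    "Very Low":  (0.0, 0.1, 0.3),
    "Low":       (0.1, 0.3, 0.5),
    "Medium":    (0.3, 0.5, 0.7),
    "High":      (0.5, 0.7, 0.9),
    "Very High": (0.7, 0.9, 1.0),
}
```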
Criteria weights are specified as $\tilde{w}_j = (w_j^l, w_j^m, w_j^u)$ for $j \in \{$Experience, Skills, Education, About$\}$, and candidate scores as $\tilde{x}_{ij} = (l_{ij}, m_{ij}, u_{ij})$ via interval or linguistic mappings.
3. Fuzzy TOPSIS Computation
3.1 Fuzzy Decision Matrix and Weights
The fuzzy decision matrix is $\tilde{D} = [\tilde{x}_{ij}]_{n \times m}$, and the fuzzy weight vector is $\tilde{W} = (\tilde{w}_1, \dots, \tilde{w}_m)$, with $m = 4$ criteria.
3.2 Fuzzy Normalization
Each criterion $j$ is normalized (for benefit attributes) as:

$$\tilde{r}_{ij} = \left( \frac{l_{ij}}{u_j^*}, \frac{m_{ij}}{u_j^*}, \frac{u_{ij}}{u_j^*} \right), \qquad u_j^* = \max_i u_{ij}.$$
3.3 Weighted Normalized Decision Matrix
Elementwise fuzzy multiplication yields:

$$\tilde{v}_{ij} = \tilde{r}_{ij} \otimes \tilde{w}_j = \left( \frac{l_{ij}}{u_j^*}\, w_j^l,\ \frac{m_{ij}}{u_j^*}\, w_j^m,\ \frac{u_{ij}}{u_j^*}\, w_j^u \right),$$

where $\tilde{w}_j = (w_j^l, w_j^m, w_j^u)$.
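The normalization and weighting steps can be sketched in a few lines, assuming benefit criteria and component-wise fuzzy multiplication (function names are ours):

```python
def normalize_benefit(column):
    """Normalize one benefit-criterion column of TFNs by its largest upper bound."""
    u_star = max(u for (_, _, u) in column)
    return [(l / u_star, m / u_star, u / u_star) for (l, m, u) in column]

def apply_weight(norm_column, weight):
    """Component-wise fuzzy multiplication by the criterion's weight TFN."""
    wl, wm, wu = weight
    return [(l * wl, m * wm, u * wu) for (l, m, u) in norm_column]
```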
3.4 Ideal Solutions
- Fuzzy positive ideal: $A^* = (\tilde{v}_1^*, \dots, \tilde{v}_m^*)$ with $\tilde{v}_j^* = (1, 1, 1)$
- Fuzzy negative ideal: $A^- = (\tilde{v}_1^-, \dots, \tilde{v}_m^-)$ with $\tilde{v}_j^- = (0, 0, 0)$, where $j = 1, \dots, m$.
3.5 Fuzzy Distance Measures
The vertex method computes the distance between two TFNs $\tilde{a} = (l_a, m_a, u_a)$ and $\tilde{b} = (l_b, m_b, u_b)$:

$$d(\tilde{a}, \tilde{b}) = \sqrt{\frac{1}{3}\left[ (l_a - l_b)^2 + (m_a - m_b)^2 + (u_a - u_b)^2 \right]}.$$

The separation measures for each candidate $i$ are:

$$d_i^* = \sum_{j=1}^{m} d(\tilde{v}_{ij}, \tilde{v}_j^*), \qquad d_i^- = \sum_{j=1}^{m} d(\tilde{v}_{ij}, \tilde{v}_j^-).$$
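The vertex distance and the two separation measures translate directly into code (a minimal sketch; names and default ideal solutions follow the conventions above):

```python
import math

def vertex_distance(a, b):
    """Vertex-method distance between two TFNs a = (l, m, u) and b = (l, m, u)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def separations(weighted_row, fpis=(1.0, 1.0, 1.0), fnis=(0.0, 0.0, 0.0)):
    """Summed distances of one candidate's weighted TFNs to the ideal solutions."""
    d_pos = sum(vertex_distance(v, fpis) for v in weighted_row)
    d_neg = sum(vertex_distance(v, fnis) for v in weighted_row)
    return d_pos, d_neg
```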
3.6 Closeness Coefficient
The closeness coefficient is then

$$CC_i = \frac{d_i^-}{d_i^* + d_i^-}, \qquad 0 \le CC_i \le 1.$$

Defuzzification may be performed with the centroid method, $\mathrm{crisp}(\tilde{a}) = (l + m + u)/3$. Higher $CC_i$ values indicate more preferred candidates.
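Both the closeness coefficient and the centroid defuzzifier are one-liners (function names are ours):

```python
def closeness(d_pos, d_neg):
    """Closeness coefficient CC_i = d_i^- / (d_i^* + d_i^-)."""
    return d_neg / (d_pos + d_neg)

def centroid(tfn):
    """Centroid defuzzification of a TFN (l, m, u)."""
    l, m, u = tfn
    return (l + m + u) / 3.0
```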
4. DistilRoBERTa LLM for Textual Attribute Scoring
The DistilRoBERTa LLM is fine-tuned separately per attribute (Experience, Skills, Education, About) on a dataset of 100 expert-labeled profiles, expanded to 10,000 samples per attribute via data augmentation (paraphrasing, synonym substitution). Key parameters include:
- Model: distilroberta-base (6 layers, 82M parameters)
- Training: 18 epochs, batch size 16, max sequence length 256
- Labels: 3 classes (Poor, Fair, Excellent)
- Loss: cross-entropy with knowledge distillation from a RoBERTa teacher
The model predicts a class $c \in \{\text{Poor}, \text{Fair}, \text{Excellent}\}$ for each profile field, which is mapped to a numeric score $s \in [1, 5]$, then to a TFN either by a small symmetric interval around $s$ or via a linguistic-to-TFN lexicon.
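The label-to-score-to-TFN step can be sketched as follows. The representative scores (1, 3, 5 for the 1–2, 3, and 4–5 bands) and the interval half-width `delta` are assumptions of this sketch, not values from the paper:

```python
# Representative scores per label (Poor covers the 1-2 band, Excellent 4-5;
# picking 1, 3, and 5 as representatives is an assumption of this sketch).
LABEL_SCORE = {"Poor": 1.0, "Fair": 3.0, "Excellent": 5.0}

def score_to_tfn(score, delta=0.5, lo=1.0, hi=5.0):
    """Symmetric TFN around a numeric score; the half-width delta is assumed.
    Bounds are clipped to the score range [lo, hi]."""
    return (max(lo, score - delta), score, min(hi, score + delta))
```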
5. Algorithmic Summary
The LLM-TOPSIS ranking pipeline executes as follows:
- For each candidate $i = 1, \dots, n$:
  - For each criterion $j \in \{\text{skill}, \text{exp}, \text{edu}, \text{about}\}$:
    - Compute the predicted class $c_{ij}$ for the corresponding profile field
    - Map $c_{ij}$ to a numeric score $s_{ij}$, then to a TFN $\tilde{x}_{ij}$
  - Assemble the candidate's row $(\tilde{x}_{i1}, \dots, \tilde{x}_{im})$
- Construct the fuzzy decision matrix $\tilde{D} = [\tilde{x}_{ij}]$
- Normalize each criterion and apply the fuzzy weights to obtain $\tilde{v}_{ij}$
- Compute $d(\tilde{v}_{ij}, \tilde{v}_j^*)$, $d(\tilde{v}_{ij}, \tilde{v}_j^-)$, $d_i^*$, and $d_i^-$ for each candidate
- Compute $CC_i = d_i^- / (d_i^* + d_i^-)$
- Rank candidates by descending $CC_i$
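The pipeline above can be sketched end to end. This is a minimal, self-contained illustration: the DistilRoBERTa classifier is replaced by precomputed labels, and the weight TFNs and score-to-TFN half-width are assumptions, not values from the paper:

```python
import math

LABEL_SCORE = {"Poor": 1.0, "Fair": 3.0, "Excellent": 5.0}
CRITERIA = ["exp", "skill", "edu", "about"]
WEIGHTS = {  # assumed linguistic weight TFNs per criterion (illustrative only)
    "exp":   (0.7, 0.9, 1.0),
    "skill": (0.5, 0.7, 0.9),
    "edu":   (0.3, 0.5, 0.7),
    "about": (0.1, 0.3, 0.5),
}

def to_tfn(score, delta=0.5):
    """Symmetric TFN around a numeric score (half-width is assumed)."""
    return (score - delta, score, score + delta)

def dist(a, b):
    """Vertex-method distance between two TFNs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def rank(label_matrix):
    """label_matrix: {candidate: {criterion: label}} -> [(candidate, CC_i)] sorted."""
    # 1. Labels -> numeric scores -> TFNs.
    X = {c: [to_tfn(LABEL_SCORE[labels[j]]) for j in CRITERIA]
         for c, labels in label_matrix.items()}
    # 2. Per-criterion normalizer: the largest upper bound in the column.
    u_star = [max(X[c][k][2] for c in X) for k in range(len(CRITERIA))]
    # 3. Normalize, weight, and measure distances to FPIS (1,1,1) / FNIS (0,0,0).
    scores = {}
    for c, row in X.items():
        d_pos = d_neg = 0.0
        for k, (l, m, u) in enumerate(row):
            wl, wm, wu = WEIGHTS[CRITERIA[k]]
            v = (l / u_star[k] * wl, m / u_star[k] * wm, u / u_star[k] * wu)
            d_pos += dist(v, (1.0, 1.0, 1.0))
            d_neg += dist(v, (0.0, 0.0, 0.0))
        scores[c] = d_neg / (d_pos + d_neg)   # closeness coefficient CC_i
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy run with one uniformly strong and one uniformly weak profile, the strong profile receives the higher closeness coefficient, matching the intended ranking behavior.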
6. Empirical Evaluation
6.1 DistilRoBERTa Classification Performance
- Experience attribute: 91% accuracy (Precision = 0.95/1.00/0.99, Recall = 1.00/0.36/0.99 for Poor/Fair/Excellent)
- Overall: 91% accuracy (per-class values 1.00/0.87/0.85 for Poor/Fair/Excellent)
6.2 Fuzzy-TOPSIS Ranking Quality
Using DistilRoBERTa-generated scores:
- Mean Average Precision (MAP): 0.99
- Normalized Discounted Cumulative Gain (NDCG): 0.926
- Mean Reciprocal Rank (MRR): 0.999
- Root Mean Square Error (RMSE): 0.043
- Mean Absolute Error (MAE): 0.036
- Cosine similarity: 0.983
Comparative analysis with human expert rankings yields cosine similarity of 0.981 and NDCG of 0.911. In a sample of 10 senior software engineering candidates, the system's rankings exhibited top-spot agreement with the expert panel and achieved cosine similarity 0.98 with human rankings, indicating a high degree of alignment.
7. Significance and Future Prospects
The LLM-TOPSIS approach demonstrates the viability of combining transformer-based profile assessment with a fuzzy logic MCDM framework for personnel selection tasks. Its capacity to encode and reason with subjectivity and imprecision in candidate evaluation is evidenced by empirical results: classification accuracy of ≥91% on key attributes and near-perfect concordance with human expert rankings. The framework enhances recruitment by supporting scalability, consistency, and minimization of bias. Proposed future directions include dataset expansion, improved interpretability, and validation in live recruitment scenarios to assess practical impact and robustness (Hoque et al., 30 Jan 2026).