LLM-TOPSIS Framework
- The paper presents the LLM-TOPSIS framework that integrates fine-tuned transformer models with fuzzy TOPSIS to rank candidate profiles using both structured and unstructured data.
- The methodology converts NLP-derived proficiency labels into numerical scores and triangular fuzzy numbers, forming a decision matrix for comprehensive multi-criteria evaluation.
- Empirical results demonstrate high performance, with 91%+ classification accuracy and near-perfect alignment with expert rankings, underscoring the framework's potential to enhance recruitment processes.
The LLM-TOPSIS framework is an integrated system that combines LLM natural language processing with a fuzzy extension of the Technique for Order Preference by Similarity to Ideal Solution (Fuzzy-TOPSIS), applied to the automated, multi-criteria ranking of personnel profiles in software engineering recruitment. The methodology is designed to operationalize both structured expert knowledge and the nuanced, unstructured data found in LinkedIn profiles by leveraging fine-tuned transformer models as scoring front-ends and a fuzzy multi-criteria decision-making (MCDM) backend operating on triangular fuzzy numbers (TFNs) (Hoque et al., 30 Jan 2026).
1. System Architecture and Workflow
The LLM-TOPSIS system ingests a set of LinkedIn profiles, each with four key textual fields: Experience, Skills, Education, and About (self-introduction). The primary workflow consists of the following steps:
- Fine-tuned DistilRoBERTa Multi-class Classification: For each field, a DistilRoBERTa model is fine-tuned to predict one of three proficiency labels—Poor, Fair, or Excellent.
- Label-to-Score Mapping: Predicted labels are mapped to numerical scores: 1–2 (Poor), 3 (Fair), 4–5 (Excellent).
- Matrix Construction: Scores are organized into a numeric decision matrix of shape $n \times 4$ (candidates $\times$ criteria).
- Fuzzy TOPSIS Application: The decision matrix is transformed using TFNs for both criteria weights and candidate scores. Fuzzy-TOPSIS is then applied to produce an overall candidate ranking.
DistilRoBERTa acts as the quantitative interpreter of unstructured profile data, while the fuzzy-TOPSIS backend aggregates the resulting scores under explicit modeling of linguistic and subjective uncertainty.
2. Mathematical Preliminaries and Notation
2.1 Triangular Fuzzy Numbers (TFNs)
A TFN is specified as $\tilde{a} = (l, m, u)$, where $l \le m \le u$ are the lower, modal, and upper bounds. The membership function increases linearly from $l$ to $m$ and decreases linearly from $m$ to $u$:

$$\mu_{\tilde{a}}(x) = \begin{cases} \dfrac{x - l}{m - l}, & l \le x \le m \\[4pt] \dfrac{u - x}{u - m}, & m \le x \le u \\[4pt] 0, & \text{otherwise.} \end{cases}$$
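The piecewise-linear membership function above can be evaluated directly; the following is a minimal sketch (the function name and the handling of degenerate bounds are our own):

```python
def tfn_membership(x, tfn):
    """Membership degree of a crisp value x in the TFN (l, m, u)."""
    l, m, u = tfn
    if x < l or x > u:
        return 0.0
    if x <= m:                      # rising edge from l to m
        return (x - l) / (m - l) if m > l else 1.0
    return (u - x) / (u - m)        # falling edge from m to u
```

For example, the "Medium" TFN $(0.3, 0.5, 0.7)$ has full membership at its modal value $0.5$ and membership $0$ outside $[0.3, 0.7]$.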
2.2 Linguistic-to-TFN Mapping
Candidate attribute labels and criteria weights are converted to TFNs via predefined mappings. For example, the translation from linguistic term to TFN is as follows:
| Linguistic Term | TFN |
|---|---|
| Very Low | (0.0, 0.1, 0.3) |
| Low | (0.1, 0.3, 0.5) |
| Medium | (0.3, 0.5, 0.7) |
| High | (0.5, 0.7, 0.9) |
| Very High | (0.7, 0.9, 1.0) |
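The table can be encoded directly as a lookup; this is a small sketch (the dictionary name is ours):

```python
# Linguistic-to-TFN mapping, copied from the table above.
LINGUISTIC_TFN = {
    "Very Low":  (0.0, 0.1, 0.3),
    "Low":       (0.1, 0.3, 0.5),
    "Medium":    (0.3, 0.5, 0.7),
    "High":      (0.5, 0.7, 0.9),
    "Very High": (0.7, 0.9, 1.0),
}
```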
Criteria weights are specified as $\tilde{w}_j = (w_j^l, w_j^m, w_j^u)$ for $j \in \{$Experience, Skills, Education, About$\}$, and candidate scores as $\tilde{x}_{ij} = (l_{ij}, m_{ij}, u_{ij})$ via interval or linguistic mappings.
3. Fuzzy TOPSIS Computation
3.1 Fuzzy Decision Matrix and Weights
The fuzzy decision matrix is $\tilde{D} = [\tilde{x}_{ij}]_{n \times m}$, and the fuzzy weight vector is $\tilde{W} = (\tilde{w}_1, \dots, \tilde{w}_m)$, with $m = 4$ criteria.
3.2 Fuzzy Normalization
Each criterion $j$ is normalized (for benefit attributes) as:

$$\tilde{r}_{ij} = \left( \frac{l_{ij}}{u_j^*}, \frac{m_{ij}}{u_j^*}, \frac{u_{ij}}{u_j^*} \right), \qquad u_j^* = \max_i u_{ij}.$$
3.3 Weighted Normalized Decision Matrix
Elementwise fuzzy multiplication yields:

$$\tilde{v}_{ij} = \tilde{r}_{ij} \otimes \tilde{w}_j = \left( \frac{l_{ij}}{u_j^*}\, w_j^l,\ \frac{m_{ij}}{u_j^*}\, w_j^m,\ \frac{u_{ij}}{u_j^*}\, w_j^u \right),$$

where $\tilde{w}_j = (w_j^l, w_j^m, w_j^u)$.
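The normalization and weighting steps can be sketched in a few lines, assuming benefit criteria and component-wise fuzzy multiplication (function names are ours):

```python
def normalize_benefit(column):
    """Normalize one benefit-criterion column of TFNs by its largest upper bound."""
    u_star = max(u for (_, _, u) in column)
    return [(l / u_star, m / u_star, u / u_star) for (l, m, u) in column]

def apply_weight(norm_column, weight):
    """Component-wise fuzzy multiplication by the criterion's weight TFN."""
    wl, wm, wu = weight
    return [(l * wl, m * wm, u * wu) for (l, m, u) in norm_column]
```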
3.4 Ideal Solutions
- Fuzzy positive ideal: $A^* = (\tilde{v}_1^*, \dots, \tilde{v}_m^*)$ with $\tilde{v}_j^* = (1, 1, 1)$
- Fuzzy negative ideal: $A^- = (\tilde{v}_1^-, \dots, \tilde{v}_m^-)$ with $\tilde{v}_j^- = (0, 0, 0)$, where $j = 1, \dots, m$.
3.5 Fuzzy Distance Measures
The vertex method computes the distance between two TFNs $\tilde{a} = (l_a, m_a, u_a)$ and $\tilde{b} = (l_b, m_b, u_b)$:

$$d(\tilde{a}, \tilde{b}) = \sqrt{\frac{1}{3}\left[ (l_a - l_b)^2 + (m_a - m_b)^2 + (u_a - u_b)^2 \right]}.$$

The separation measures for each candidate $i$ are:

$$d_i^* = \sum_{j=1}^{m} d(\tilde{v}_{ij}, \tilde{v}_j^*), \qquad d_i^- = \sum_{j=1}^{m} d(\tilde{v}_{ij}, \tilde{v}_j^-).$$
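The vertex distance and the two separation measures translate directly into code (a minimal sketch; names and default ideal solutions follow the conventions above):

```python
import math

def vertex_distance(a, b):
    """Vertex-method distance between two TFNs a = (l, m, u) and b = (l, m, u)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def separations(weighted_row, fpis=(1.0, 1.0, 1.0), fnis=(0.0, 0.0, 0.0)):
    """Summed distances of one candidate's weighted TFNs to the ideal solutions."""
    d_pos = sum(vertex_distance(v, fpis) for v in weighted_row)
    d_neg = sum(vertex_distance(v, fnis) for v in weighted_row)
    return d_pos, d_neg
```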
3.6 Closeness Coefficient
The closeness coefficient is then

$$CC_i = \frac{d_i^-}{d_i^* + d_i^-}, \qquad 0 \le CC_i \le 1.$$

Defuzzification may be performed with the centroid method, $\mathrm{crisp}(\tilde{a}) = (l + m + u)/3$. Higher $CC_i$ values indicate more preferred candidates.
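Both the closeness coefficient and the centroid defuzzifier are one-liners (function names are ours):

```python
def closeness(d_pos, d_neg):
    """Closeness coefficient CC_i = d_i^- / (d_i^* + d_i^-)."""
    return d_neg / (d_pos + d_neg)

def centroid(tfn):
    """Centroid defuzzification of a TFN (l, m, u)."""
    l, m, u = tfn
    return (l + m + u) / 3.0
```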
4. DistilRoBERTa LLM for Textual Attribute Scoring
The DistilRoBERTa LLM is fine-tuned separately per attribute (Experience, Skills, Education, About) on a dataset of 100 expert-labeled profiles, expanded to 10,000 samples per attribute via data augmentation (paraphrasing, synonym substitution). Key parameters include:
- Model: distilroberta-base (6 layers, 82M parameters)
- Training: 18 epochs, batch size 16, max sequence length 256
- Labels: 3 classes (Poor, Fair, Excellent)
- Loss: cross-entropy with knowledge distillation from a RoBERTa teacher
The model predicts a class $c \in \{\text{Poor}, \text{Fair}, \text{Excellent}\}$ for each profile field, which is mapped to a numeric score $s \in [1, 5]$, then to a TFN either by a small symmetric interval around $s$ or via a linguistic-to-TFN lexicon.
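The label-to-score-to-TFN step can be sketched as follows. The representative scores (1, 3, 5 for the 1–2, 3, and 4–5 bands) and the interval half-width `delta` are assumptions of this sketch, not values from the paper:

```python
# Representative scores per label (Poor covers the 1-2 band, Excellent 4-5;
# picking 1, 3, and 5 as representatives is an assumption of this sketch).
LABEL_SCORE = {"Poor": 1.0, "Fair": 3.0, "Excellent": 5.0}

def score_to_tfn(score, delta=0.5, lo=1.0, hi=5.0):
    """Symmetric TFN around a numeric score; the half-width delta is assumed.
    Bounds are clipped to the score range [lo, hi]."""
    return (max(lo, score - delta), score, min(hi, score + delta))
```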
5. Algorithmic Summary
The LLM-TOPSIS ranking pipeline executes as follows:
- For each candidate $i = 1, \dots, n$:
  - For each criterion $j \in \{\text{skill}, \text{exp}, \text{edu}, \text{about}\}$:
    - Compute the predicted class $c_{ij}$ for the corresponding profile field
    - Map $c_{ij}$ to a numeric score $s_{ij}$, then to a TFN $\tilde{x}_{ij}$
  - Assemble the candidate's row $(\tilde{x}_{i1}, \dots, \tilde{x}_{im})$
- Construct the fuzzy decision matrix $\tilde{D} = [\tilde{x}_{ij}]$
- Normalize each criterion and apply the fuzzy weights to obtain $\tilde{v}_{ij}$
- Compute $d(\tilde{v}_{ij}, \tilde{v}_j^*)$, $d(\tilde{v}_{ij}, \tilde{v}_j^-)$, $d_i^*$, and $d_i^-$ for each candidate
- Compute $CC_i = d_i^- / (d_i^* + d_i^-)$
- Rank candidates by descending $CC_i$
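The pipeline above can be sketched end to end. This is a minimal, self-contained illustration: the DistilRoBERTa classifier is replaced by precomputed labels, and the weight TFNs and score-to-TFN half-width are assumptions, not values from the paper:

```python
import math

LABEL_SCORE = {"Poor": 1.0, "Fair": 3.0, "Excellent": 5.0}
CRITERIA = ["exp", "skill", "edu", "about"]
WEIGHTS = {  # assumed linguistic weight TFNs per criterion (illustrative only)
    "exp":   (0.7, 0.9, 1.0),
    "skill": (0.5, 0.7, 0.9),
    "edu":   (0.3, 0.5, 0.7),
    "about": (0.1, 0.3, 0.5),
}

def to_tfn(score, delta=0.5):
    """Symmetric TFN around a numeric score (half-width is assumed)."""
    return (score - delta, score, score + delta)

def dist(a, b):
    """Vertex-method distance between two TFNs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def rank(label_matrix):
    """label_matrix: {candidate: {criterion: label}} -> [(candidate, CC_i)] sorted."""
    # 1. Labels -> numeric scores -> TFNs.
    X = {c: [to_tfn(LABEL_SCORE[labels[j]]) for j in CRITERIA]
         for c, labels in label_matrix.items()}
    # 2. Per-criterion normalizer: the largest upper bound in the column.
    u_star = [max(X[c][k][2] for c in X) for k in range(len(CRITERIA))]
    # 3. Normalize, weight, and measure distances to FPIS (1,1,1) / FNIS (0,0,0).
    scores = {}
    for c, row in X.items():
        d_pos = d_neg = 0.0
        for k, (l, m, u) in enumerate(row):
            wl, wm, wu = WEIGHTS[CRITERIA[k]]
            v = (l / u_star[k] * wl, m / u_star[k] * wm, u / u_star[k] * wu)
            d_pos += dist(v, (1.0, 1.0, 1.0))
            d_neg += dist(v, (0.0, 0.0, 0.0))
        scores[c] = d_neg / (d_pos + d_neg)   # closeness coefficient CC_i
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy run with one uniformly strong and one uniformly weak profile, the strong profile receives the higher closeness coefficient, matching the intended ranking behavior.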
6. Empirical Evaluation
6.1 DistilRoBERTa Classification Performance
- Experience attribute: 91% accuracy (Precision = 0.95/1.00/0.99, Recall = 1.00/0.36/0.99 for Poor/Fair/Excellent)
- Overall: 91% accuracy (per-class values 1.00/0.87/0.85 for Poor/Fair/Excellent)
6.2 Fuzzy-TOPSIS Ranking Quality
Using DistilRoBERTa-generated scores:
- Mean Average Precision (MAP): 0.99
- Normalized Discounted Cumulative Gain (NDCG): 0.926
- Mean Reciprocal Rank (MRR): 0.999
- Root Mean Square Error (RMSE): 0.043
- Mean Absolute Error (MAE): 0.036
- Cosine similarity: 0.983
Comparative analysis with human expert rankings yields cosine similarity of 0.981 and NDCG of 0.911. In a sample of 10 senior software engineering candidates, the system's rankings exhibited top-spot agreement with the expert panel and achieved cosine similarity 0.98 with human rankings, indicating a high degree of alignment.
7. Significance and Future Prospects
The LLM-TOPSIS approach demonstrates the viability of combining transformer-based profile assessment with a fuzzy logic MCDM framework for personnel selection tasks. Its capacity to encode and reason with subjectivity and imprecision in candidate evaluation is evidenced by empirical results: classification accuracy of ≥91% on key attributes and near-perfect concordance with human expert rankings. The framework enhances recruitment by supporting scalability, consistency, and minimization of bias. Proposed future directions include dataset expansion, improved interpretability, and validation in live recruitment scenarios to assess practical impact and robustness (Hoque et al., 30 Jan 2026).