DermETAS-SNA LLM Dermatology Assistant
- DermETAS-SNA LLM Assistant is a dermatology-focused AI system that integrates an evolutionary Vision Transformer search and class-balanced binary classifiers for robust skin disease recognition.
- It employs a StackNet ensemble that fuses class probabilities, multi-scale features, and statistical summaries to address class imbalances and enhance predictive accuracy.
- A retrieval-augmented generation module leverages a comprehensive dermatology knowledge base and personalized LLM interactions to deliver clear, medically grounded diagnostic explanations.
DermETAS-SNA LLM Assistant is a dermatology-focused artificial intelligence system that integrates evolutionary Vision Transformer (ViT) architecture search, class-balanced binary skin disease classifiers, a multi-level ensemble meta-classifier (StackNet), and retrieval-augmented generation (RAG) delivered through an LLM assistant. It is engineered to provide high-performance skin disease recognition and to deliver medically grounded, explainable diagnostic descriptions suitable for clinician-patient interaction and clinical education (Oruganty et al., 9 Dec 2025, Zhang et al., 2023).
1. Evolutionary Transformer Architecture Search (ETAS)
DermETAS-SNA’s visual inference backbone is discovered via an Evolutionary Transformer Architecture Search (ETAS) executed on the SKINCON dataset. The search space consists of ViT parameterizations with variable transformer depth $L$, per-layer attention head count $h_\ell$, per-layer MLP dimension $d_\ell$, and dropout rate $p_\ell$. Patch size (16×16), image resolution (224×224), and embedding dimension (768) are kept fixed.
Individuals in the population encode the ViT as a sequence of per-layer tuples $(h_\ell, d_\ell, p_\ell)$ for $\ell = 1, \dots, L$. Genetic operators include random initialization (population size 5), tournament parent selection, roulette-wheel selection with elitism, single-point crossover, and mutation (add, remove, or perturb a layer). Fitness is the 5-fold averaged F1 score on SKINCON (3 886 images, 48 fine-grained concept labels):

$$\mathrm{Fitness}(a) = \frac{1}{5} \sum_{k=1}^{5} \mathrm{F1}_k(a),$$

where $\mathrm{F1}_k(a)$ is the F1 score of architecture $a$ on validation fold $k$.
After 20 generations, the optimal backbone was a 12-layer, 16-head ViT with MLP dimensions up to 4096 and dropout of 0.1–0.2, which is frozen for use in downstream classifier modules. The search yields dermatology-specific representations robust to complex lesion morphologies (Oruganty et al., 9 Dec 2025).
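As a concrete illustration, the generational loop can be sketched as follows. The genome encoding, the simplified parent selection, the mutation rates, and the toy fitness stand-in are assumptions for illustration only; in the real system the fitness call runs 5-fold cross-validated F1 on SKINCON.

```python
import random

# Genome: list of per-layer tuples (heads, mlp_dim, dropout); depth = len(genome).
HEADS = [8, 12, 16]
MLP_DIMS = [1024, 2048, 3072, 4096]
DROPOUTS = [0.1, 0.2]

def random_layer():
    return (random.choice(HEADS), random.choice(MLP_DIMS), random.choice(DROPOUTS))

def random_genome():
    return [random_layer() for _ in range(random.randint(6, 14))]

def crossover(a, b):
    # Single-point crossover on the two layer lists.
    cut = random.randint(1, max(1, min(len(a), len(b)) - 1))
    return a[:cut] + b[cut:]

def mutate(g, p_add=0.1, p_del=0.1, p_perturb=0.2):
    g = list(g)
    if random.random() < p_add:
        g.insert(random.randrange(len(g) + 1), random_layer())
    if random.random() < p_del and len(g) > 2:
        g.pop(random.randrange(len(g)))
    if random.random() < p_perturb:
        g[random.randrange(len(g))] = random_layer()
    return g

def fitness(genome):
    # Real system: 5-fold averaged F1 on SKINCON for the ViT built from `genome`.
    # Toy stand-in so the sketch runs: favor 12 layers and 16-head attention.
    return -abs(len(genome) - 12) + sum(h == 16 for h, _, _ in genome) / len(genome)

def etas_search(pop_size=5, generations=20, n_elite=1):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:n_elite]                   # elitism
        while len(nxt) < pop_size:
            a, b = random.sample(ranked[:3], 2)  # simplified parent selection
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)
```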
2. Disease Classification via Fine-Tuned Binary Classifiers
Leveraging the ETAS-ViT, DermETAS-SNA constructs a one-vs-all binary classifier ensemble on the DermNet dataset (19 500 images, 23 categories such as Melanoma, Psoriasis, and Actinic Keratosis). For each disease class $c$, a dataset $D_c$ is balanced to a 1:1 positive-to-negative ratio via under/oversampling, and the ViT backbone's head is replaced with a single-neuron sigmoid output. Two fine-tuning strategies are employed:
- Full Unfreezing (FU): all weights trainable from initiation
- Gradual Unfreezing (GU): only head is trainable at first, unfreezing earlier layers epoch by epoch
Training sweeps grids of learning rates, batch sizes, and momentum values, with up to 50 epochs and early stopping on validation F1. Binary cross-entropy serves as the per-classifier loss; Focal Loss is reserved for the meta-classification stage. Standard dermoscopy augmentations (flips, 15° rotations, color jitter) are applied. Each class classifier is thus specialized for both rare and common skin conditions, mitigating class imbalance (Oruganty et al., 9 Dec 2025).
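A minimal PyTorch sketch of the GU schedule follows, assuming a timm-style ViT layout (attributes `vit.blocks` and `vit.head`, the latter replaced by a single-logit linear layer); the exact unfreezing cadence and optimizer settings in the paper may differ.

```python
import torch
import torch.nn as nn

def setup_gradual_unfreezing(vit: nn.Module):
    # Start with only the binary head trainable.
    for p in vit.parameters():
        p.requires_grad = False
    for p in vit.head.parameters():   # assumed: head is nn.Linear(768, 1)
        p.requires_grad = True

def unfreeze_next_block(vit: nn.Module, epoch: int):
    # Unfreeze transformer blocks from the top down, one per epoch.
    blocks = list(vit.blocks)          # assumed attribute name
    idx = len(blocks) - 1 - epoch
    if idx >= 0:
        for p in blocks[idx].parameters():
            p.requires_grad = True

def train_gu(vit, loader, epochs=50, lr=1e-4):
    # Binary cross-entropy on the single sigmoid logit.
    criterion = nn.BCEWithLogitsLoss()
    setup_gradual_unfreezing(vit)
    for epoch in range(epochs):
        unfreeze_next_block(vit, epoch)
        opt = torch.optim.SGD((p for p in vit.parameters() if p.requires_grad),
                              lr=lr, momentum=0.9)
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(vit(x).squeeze(1), y.float())
            loss.backward()
            opt.step()
```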
3. StackNet Augmented Ensemble and Meta-Classification
At inference, the primary classification ensemble proceeds in two stages ("StackNet", editor's term). In Level 1, the 23 ViT binary classifiers output per-class probabilities $p_1, \dots, p_{23}$. In Level 2, a meta-classifier receives a concatenated feature vector that fuses:
- $\mathbf{p} = (p_1, \dots, p_{23})$: the class probabilities
- $\mathbf{f}$: multi-scale features (2048-dim) obtained by global average pooling of four intermediate layers of an ImageNet-pretrained ResNet-50
- $\mathbf{s}$: summary statistics of $\mathbf{p}$ (mean, standard deviation, max, top-3 mean)
This yields $\mathbf{z} = [\mathbf{p}; \mathbf{f}; \mathbf{s}] \in \mathbb{R}^{2075}$ ($23 + 2048 + 4$ dimensions). The meta-classifier $g$ is a 1D-CNN with three fully-connected layers (1024→512→256) trained with Focal Loss. The predicted class is

$$\hat{y} = \arg\max_{c \in \{1, \dots, 23\}} g(\mathbf{z})_c.$$
This StackNet design aggregates the weak binary classifiers robustly while explicitly integrating multi-scale features and statistical summaries, improving predictive accuracy and handling class imbalance (Oruganty et al., 9 Dec 2025).
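The Level-2 fusion can be sketched compactly in PyTorch. The `fuse_features` helper matches the dimensions above; the paper describes a 1D-CNN head, for which a plain MLP with the same 1024→512→256 widths is substituted here as a simplification.

```python
import torch
import torch.nn as nn

def fuse_features(p: torch.Tensor, f: torch.Tensor) -> torch.Tensor:
    """p: (B, 23) binary-classifier probabilities; f: (B, 2048) ResNet-50 features."""
    top3 = p.topk(3, dim=1).values.mean(dim=1, keepdim=True)
    stats = torch.cat([p.mean(1, keepdim=True), p.std(1, keepdim=True),
                       p.max(1, keepdim=True).values, top3], dim=1)   # (B, 4)
    return torch.cat([p, f, stats], dim=1)                            # (B, 2075)

meta = nn.Sequential(                      # widths from the paper; layout assumed
    nn.Linear(2075, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 23),                    # logits over the 23 DermNet classes
)

def predict(p, f):
    return meta(fuse_features(p, f)).argmax(dim=1)
```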
4. RAG-Based Diagnostic Explanation and LLM Personalization
DermETAS-SNA incorporates a Diagnostic Explanation and Retrieval Model for Dermatology (DERM-RAG) utilizing a comprehensive, LLM-assisted pipeline:
- Knowledge Base: Formed from 5 000 pages of dermatological textbooks, parsed into ~10 000 thematic chunks, embedded via Qwen2-1.5B-Instruct, and stored in Qdrant for vector retrieval.
- Query Formulation: Detected features (e.g., papule, erythema) and predicted labels ground natural-language queries.
- Passage Retrieval and Reranking: Qwen2-1.5B embedding locates top-K similar passages; Cohere rerank-v3.5 orders them by cross-encoder relevance.
- LLM Prompting: Gemini 2.5 Pro receives a template prompt with up to 2 000 context tokens. System instructions establish the expert role; user prompts request plain-language explanations and treatment options. The LLM's response is scored via beam-search likelihood, and low-confidence outputs (below a likelihood threshold) are filtered or re-queried.
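A minimal sketch of the retrieve–rerank–prompt loop, using `qdrant-client` for vector search, Cohere's `rerank-v3.5` model, and the `google-genai` SDK for Gemini 2.5 Pro; the collection name, prompt template, and the stubbed Qwen2 embedding helper are illustrative assumptions, not the authors' exact pipeline.

```python
import cohere
from qdrant_client import QdrantClient
from google import genai

qdrant = QdrantClient("localhost", port=6333)
co = cohere.ClientV2(api_key="...")          # Cohere reranking
llm = genai.Client(api_key="...")            # Gemini via google-genai SDK

def embed(text: str) -> list[float]:
    """Stub for the Qwen2-1.5B-Instruct embedding step."""
    raise NotImplementedError

def explain(features: list[str], label: str, top_k: int = 20, keep: int = 5) -> str:
    query = f"Explain {label} presenting with {', '.join(features)}."
    hits = qdrant.search(collection_name="derm_kb",     # assumed collection name
                         query_vector=embed(query), limit=top_k)
    docs = [h.payload["text"] for h in hits]
    ranked = co.rerank(model="rerank-v3.5", query=query, documents=docs, top_n=keep)
    context = "\n\n".join(docs[r.index] for r in ranked.results)
    prompt = (f"You are a board-certified dermatologist.\n"
              f"Context:\n{context}\n\n"
              f"Give a plain-language explanation and treatment options for: {query}")
    return llm.models.generate_content(model="gemini-2.5-pro", contents=prompt).text
```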
To further enhance personalization, a dual-process bionic memory inspired by neuroscience is implemented (Zhang et al., 2023). Working Memory (WM) stores per-turn dialogue notes, Short-Term Memory (STM) tracks recent patient-specific attributes (key–value pairs with learned embeddings), and Long-Term Memory (LTM) contains domain-wide dermatology knowledge (e.g., guidelines, ontologies):
- STM retrieval uses a distance metric over entry embeddings for similarity; entries are consolidated to LTM after repeated access.
- LTM retrieval uses cosine similarity.
- Only frequently accessed STM entries (at least 3 accesses; cf. the consolidation threshold in Section 7) are promoted to LTM, emulating consolidation in human memory.
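The consolidation mechanics can be sketched as a small store with count-based promotion; the entry layout, the pruning rule, and the Euclidean STM metric are simplified assumptions consistent with the description above.

```python
import numpy as np

class BionicMemory:
    """STM with count-based consolidation into LTM (sketch)."""

    def __init__(self, consolidate_after=3, stm_capacity=50):
        self.stm, self.ltm = {}, {}     # key -> (embedding, value)
        self.access = {}                # key -> STM access count
        self.consolidate_after = consolidate_after
        self.stm_capacity = stm_capacity

    def remember(self, key, emb: np.ndarray, value):
        if len(self.stm) >= self.stm_capacity:
            # Simplified pruning: evict the least-accessed STM entry.
            victim = min(self.stm, key=lambda k: self.access.get(k, 0))
            self.stm.pop(victim)
            self.access.pop(victim, None)
        self.stm[key] = (emb, value)

    def stm_lookup(self, query: np.ndarray):
        if not self.stm:
            return None
        # STM retrieval by embedding distance (Euclidean assumed here).
        key = min(self.stm, key=lambda k: np.linalg.norm(self.stm[k][0] - query))
        self.access[key] = self.access.get(key, 0) + 1
        if self.access[key] >= self.consolidate_after:
            self.ltm[key] = self.stm.pop(key)   # consolidation: promote to LTM
        return key

    def ltm_lookup(self, query: np.ndarray):
        if not self.ltm:
            return None
        # LTM retrieval by cosine similarity.
        def cos(k):
            e = self.ltm[k][0]
            return float(e @ query) / (np.linalg.norm(e) * np.linalg.norm(query) + 1e-9)
        return max(self.ltm, key=cos)
```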
Parameter-efficient fine-tuning (PEFT) is achieved with LoRA adapters and per-layer domain adapters, so that only a small fraction of model parameters is updated. Training optimizes a causal language-modeling loss, an adapter-consistency term to prevent catastrophic forgetting, and a norm penalty on the LoRA updates.
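Such a setup is commonly expressed with the `peft` library; the sketch below attaches LoRA adapters to the attention projections and computes the causal-LM loss. The target module names and rank are illustrative assumptions, and the consistency and norm-penalty terms are noted but omitted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

name = "Qwen/Qwen2-1.5B-Instruct"
base = AutoModelForCausalLM.from_pretrained(name)
tok = AutoTokenizer.from_pretrained(name)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],  # assumed projections
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # LoRA trains a small fraction of weights

batch = tok("Patient presents with erythematous plaques on extensor surfaces.",
            return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal language-modeling loss
loss = out.loss
# Full objective (Zhang et al., 2023) adds an adapter-consistency term and a
# norm penalty on the LoRA updates; both are omitted in this sketch.
loss.backward()
```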
5. Experimental Performance and Clinical Evaluation
On DermNet (23 classes), the system demonstrates significant performance gains versus the SkinGPT-4 baseline:
| Metric | DermETAS-SNA | SkinGPT-4 |
|---|---|---|
| Accuracy | 59.89% | 52.92% |
| Precision | 59.25% | 54.57% |
| Recall | 55.29% | 46.83% |
| Macro F1-Score | 56.30% | 48.51% |
| Matthews CC | 0.57 | 0.50 |
Macro F1 improves by 7.79 percentage points (a 16.06% relative gain), with per-class F1 ranging from 44.52% (Psoriasis) to 74.16% (Melanoma). Experimental evaluation confirms consistent superiority (see Table 2 in (Oruganty et al., 9 Dec 2025)).
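The reported gain follows directly from the table:

```python
derm, skin = 56.30, 48.51                                   # macro F1 (%)
print(f"absolute gain: {derm - skin:.2f} pp")               # 7.79 pp
print(f"relative gain: {100 * (derm - skin) / skin:.2f}%")  # 16.06%
```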
RAG-based LLM outputs were evaluated by eight licensed dermatologists on seven representative conditions, with 92% overall expert agreement on six axes (accuracy, clarity, actionability, etc.), compared to 48.2% agreement for SkinGPT-4.
6. Proof-of-Concept Implementation and System Architecture
A real-world prototype demonstrates end-to-end integration of DermETAS-SNA. The front-end uses Streamlit; the back-end runs on an Apple Mac Studio (M2 Max, 38-core GPU). The user workflow proceeds as:
- Dermoscopy image upload (JPEG/PNG)
- ETAS-ViT feature extraction and disease prediction via 23 binary classifiers + StackNet meta-classifier
- Extracted features and prediction seed a RAG query; Qdrant/Cohere retrieve and reorder knowledge base passages
- Gemini 2.5 Pro produces a concise, medically informed diagnostic description
- Interactive Q&A allows clarification and follow-up
Communication is REST-based, providing modularity, scalability, and updatable components (Oruganty et al., 9 Dec 2025).
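A minimal sketch of the REST coupling between the Streamlit front-end and the model back-end; the endpoint path, payload fields, and the `classify`/`explain` stubs are illustrative assumptions.

```python
# backend.py — FastAPI service wrapping the classifier and DERM-RAG pipeline.
from fastapi import FastAPI, UploadFile

app = FastAPI()

def classify(img_bytes: bytes):
    """Stub: ETAS-ViT features -> 23 binary classifiers -> StackNet meta-classifier."""
    return "Melanoma", ["asymmetric border", "variegated pigmentation"]

def explain(features, label):
    """Stub: DERM-RAG retrieval + Gemini 2.5 Pro explanation (see Section 4)."""
    return f"{label}: plain-language explanation grounded in retrieved passages."

@app.post("/predict")                               # assumed endpoint path
async def predict(image: UploadFile):
    label, features = classify(await image.read())
    return {"label": label, "explanation": explain(features, label)}
```

```python
# app.py — Streamlit front-end calling the REST back-end.
import requests
import streamlit as st

upload = st.file_uploader("Dermoscopy image", type=["jpg", "jpeg", "png"])
if upload is not None:
    resp = requests.post("http://localhost:8000/predict",
                         files={"image": upload.getvalue()})
    st.subheader(resp.json()["label"])
    st.write(resp.json()["explanation"])
```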
7. Practical Guidelines and System Trade-Offs
Memory configuration for STM (≈50 entries, embedding dimension ≈768, consolidation threshold of 3 accesses, pruning every 5 dialogue turns) balances personalization and latency; increasing STM to 100 entries yields a 15% retrieval-latency increase. A lower LoRA rank is preferred for efficiency; doubling the rank offers slightly better fluency at twice the parameter cost. Halving the domain adapter rank (16 vs. 32) halves adapter training time at a minor quality drop (~1 ROUGE point).
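These settings map naturally onto a small configuration object; the field names below are illustrative, not the authors' API.

```python
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    stm_entries: int = 50             # ~15% higher retrieval latency at 100
    embedding_dim: int = 768
    consolidation_threshold: int = 3  # STM accesses before promotion to LTM
    prune_every_turns: int = 5        # dialogue turns between STM pruning passes
```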
Evaluation benchmarks emphasize accuracy (ROUGE-L), preference adaptation, guideline adherence (safety), user win-rate (A/B human judgment), and system throughput (≤1 s response, ≥10 req/s on a 32 GB GPU) (Zhang et al., 2023).
DermETAS-SNA LLM Assistant combines optimized visual representations, imbalance-aware ensemble learning, retrieval-augmented explanation, and personalized LLM interaction in a computationally efficient, modular architecture. It achieves substantial empirical gains in both diagnostic accuracy and clinical trustworthiness, charting a path toward next-generation, interpretable AI in dermatological medicine (Oruganty et al., 9 Dec 2025, Zhang et al., 2023).