DermETAS-SNA LLM Dermatology Assistant
- DermETAS-SNA LLM Assistant is a dermatology-focused AI system that integrates an evolutionary Vision Transformer search and class-balanced binary classifiers for robust skin disease recognition.
- It employs a StackNet ensemble that fuses class probabilities, multi-scale features, and statistical summaries to address class imbalances and enhance predictive accuracy.
- A retrieval-augmented generation module leverages a comprehensive dermatology knowledge base and personalized LLM interactions to deliver clear, medically grounded diagnostic explanations.
DermETAS-SNA LLM Assistant is a dermatology-focused artificial intelligence system that integrates evolutionary Vision Transformer (ViT) architecture search, class-balanced binary skin disease classifiers, a multi-level ensemble meta-classifier (StackNet), and retrieval-augmented generation (RAG) delivered through an LLM assistant. It is engineered to provide high-performance skin disease recognition and to deliver medically grounded, explainable diagnostic descriptions suitable for clinician-patient interaction and clinical education (Oruganty et al., 9 Dec 2025, Zhang et al., 2023).
1. Evolutionary Transformer Architecture Search (ETAS)
DermETAS-SNA’s visual inference backbone is discovered via an Evolutionary Transformer Architecture Search (ETAS) executed on the SKINCON dataset. The search space consists of ViT parameterizations with variable transformer depth $L$, per-layer attention head count $h_\ell$, per-layer MLP dimension $d_\ell$, and dropout rate $p_\ell$. Patch size (16×16), image resolution (224×224), and embedding dimension (768) are kept fixed.
Individuals in the population encode the ViT as a sequence of per-layer tuples $(h_\ell, d_\ell, p_\ell)$ for $\ell = 1, \dots, L$. Genetic operators include random initialization (population size 5), tournament parent selection, roulette-wheel selection with elitism, single-point crossover, and mutation (add, remove, or perturb a layer). Fitness is the 5-fold averaged F1 score on SKINCON (3 886 images, 48 fine-grained concept labels):

$$\mathrm{Fitness}(a) = \frac{1}{5} \sum_{k=1}^{5} \mathrm{F1}_k(a),$$

where $\mathrm{F1}_k(a)$ is the F1 score of architecture $a$ on validation fold $k$.
After 20 generations, the optimal backbone was a 12-layer, 16-head ViT with MLP dimensions up to 4096 and dropout of 0.1–0.2, which is frozen for use in downstream classifier modules. The search yields dermatology-specific representations robust to complex lesion morphologies (Oruganty et al., 9 Dec 2025).
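As a concrete illustration, the generational loop can be sketched as follows. The genome encoding, the simplified parent selection, the mutation rates, and the toy fitness stand-in are assumptions for illustration only; in the real system the fitness call runs 5-fold cross-validated F1 on SKINCON.

```python
import random

# Genome: list of per-layer tuples (heads, mlp_dim, dropout); depth = len(genome).
HEADS = [8, 12, 16]
MLP_DIMS = [1024, 2048, 3072, 4096]
DROPOUTS = [0.1, 0.2]

def random_layer():
    return (random.choice(HEADS), random.choice(MLP_DIMS), random.choice(DROPOUTS))

def random_genome():
    return [random_layer() for _ in range(random.randint(6, 14))]

def crossover(a, b):
    # Single-point crossover on the two layer lists.
    cut = random.randint(1, max(1, min(len(a), len(b)) - 1))
    return a[:cut] + b[cut:]

def mutate(g, p_add=0.1, p_del=0.1, p_perturb=0.2):
    g = list(g)
    if random.random() < p_add:
        g.insert(random.randrange(len(g) + 1), random_layer())
    if random.random() < p_del and len(g) > 2:
        g.pop(random.randrange(len(g)))
    if random.random() < p_perturb:
        g[random.randrange(len(g))] = random_layer()
    return g

def fitness(genome):
    # Real system: 5-fold averaged F1 on SKINCON for the ViT built from `genome`.
    # Toy stand-in so the sketch runs: favor 12 layers and 16-head attention.
    return -abs(len(genome) - 12) + sum(h == 16 for h, _, _ in genome) / len(genome)

def etas_search(pop_size=5, generations=20, n_elite=1):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:n_elite]                   # elitism
        while len(nxt) < pop_size:
            a, b = random.sample(ranked[:3], 2)  # simplified parent selection
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)
```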
2. Disease Classification via Fine-Tuned Binary Classifiers
Leveraging the ETAS-ViT, DermETAS-SNA constructs a one-vs-all binary classifier ensemble on the DermNet dataset (19 500 images, 23 categories such as Melanoma, Psoriasis, and Actinic Keratosis). For each disease class $c$, a dataset $D_c$ is balanced to a 1:1 positive-to-negative ratio via under/oversampling, and the ViT backbone's head is replaced with a single-neuron sigmoid output. Two fine-tuning strategies are employed:
- Full Unfreezing (FU): all weights trainable from initiation
- Gradual Unfreezing (GU): only head is trainable at first, unfreezing earlier layers epoch by epoch
Training sweeps grids of learning rates, batch sizes, and momentum values, with up to 50 epochs and early stopping on validation F1. Binary cross-entropy serves as the per-classifier loss; Focal Loss is reserved for the meta-classification stage. Standard dermoscopy augmentations (flips, 15° rotations, color jitter) are applied. Each class classifier is thus specialized for both rare and common skin conditions, mitigating class imbalance (Oruganty et al., 9 Dec 2025).
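A minimal PyTorch sketch of the GU schedule follows, assuming a timm-style ViT layout (attributes `vit.blocks` and `vit.head`, the latter replaced by a single-logit linear layer); the exact unfreezing cadence and optimizer settings in the paper may differ.

```python
import torch
import torch.nn as nn

def setup_gradual_unfreezing(vit: nn.Module):
    # Start with only the binary head trainable.
    for p in vit.parameters():
        p.requires_grad = False
    for p in vit.head.parameters():   # assumed: head is nn.Linear(768, 1)
        p.requires_grad = True

def unfreeze_next_block(vit: nn.Module, epoch: int):
    # Unfreeze transformer blocks from the top down, one per epoch.
    blocks = list(vit.blocks)          # assumed attribute name
    idx = len(blocks) - 1 - epoch
    if idx >= 0:
        for p in blocks[idx].parameters():
            p.requires_grad = True

def train_gu(vit, loader, epochs=50, lr=1e-4):
    # Binary cross-entropy on the single sigmoid logit.
    criterion = nn.BCEWithLogitsLoss()
    setup_gradual_unfreezing(vit)
    for epoch in range(epochs):
        unfreeze_next_block(vit, epoch)
        opt = torch.optim.SGD((p for p in vit.parameters() if p.requires_grad),
                              lr=lr, momentum=0.9)
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(vit(x).squeeze(1), y.float())
            loss.backward()
            opt.step()
```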
3. StackNet Augmented Ensemble and Meta-Classification
At inference, the primary classification ensemble proceeds in two stages ("StackNet", editor's term). In Level 1, the 23 ViT binary classifiers output per-class probabilities $p_1, \dots, p_{23}$. In Level 2, a meta-classifier receives a concatenated feature vector that fuses:
- $\mathbf{p} = (p_1, \dots, p_{23})$: the class probabilities
- $\mathbf{f}$: multi-scale features (2048-dim) obtained by global average pooling of four intermediate layers of an ImageNet-pretrained ResNet-50
- $\mathbf{s}$: summary statistics of $\mathbf{p}$ (mean, standard deviation, max, top-3 mean)
This yields $\mathbf{z} = [\mathbf{p}; \mathbf{f}; \mathbf{s}] \in \mathbb{R}^{2075}$ ($23 + 2048 + 4$ dimensions). The meta-classifier $g$ is a 1D-CNN with three fully-connected layers (1024→512→256) trained with Focal Loss. The predicted class is

$$\hat{y} = \arg\max_{c \in \{1, \dots, 23\}} g(\mathbf{z})_c.$$
This StackNet design aggregates the weak binary classifiers robustly while explicitly integrating multi-scale features and statistical summaries, improving predictive accuracy and handling class imbalance (Oruganty et al., 9 Dec 2025).
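The Level-2 fusion can be sketched compactly in PyTorch. The `fuse_features` helper matches the dimensions above; the paper describes a 1D-CNN head, for which a plain MLP with the same 1024→512→256 widths is substituted here as a simplification.

```python
import torch
import torch.nn as nn

def fuse_features(p: torch.Tensor, f: torch.Tensor) -> torch.Tensor:
    """p: (B, 23) binary-classifier probabilities; f: (B, 2048) ResNet-50 features."""
    top3 = p.topk(3, dim=1).values.mean(dim=1, keepdim=True)
    stats = torch.cat([p.mean(1, keepdim=True), p.std(1, keepdim=True),
                       p.max(1, keepdim=True).values, top3], dim=1)   # (B, 4)
    return torch.cat([p, f, stats], dim=1)                            # (B, 2075)

meta = nn.Sequential(                      # widths from the paper; layout assumed
    nn.Linear(2075, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 23),                    # logits over the 23 DermNet classes
)

def predict(p, f):
    return meta(fuse_features(p, f)).argmax(dim=1)
```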
4. RAG-Based Diagnostic Explanation and LLM Personalization
DermETAS-SNA incorporates a Diagnostic Explanation and Retrieval Model for Dermatology (DERM-RAG) utilizing a comprehensive, LLM-assisted pipeline:
- Knowledge Base: Formed from 5 000 pages of dermatological textbooks, parsed into ~10 000 thematic chunks, embedded via Qwen2-1.5B-Instruct, and stored in Qdrant for vector retrieval.
- Query Formulation: Detected features (e.g., papule, erythema) and predicted labels ground natural-language queries.
- Passage Retrieval and Reranking: Qwen2-1.5B embedding locates top-K similar passages; Cohere rerank-v3.5 orders them by cross-encoder relevance.
- LLM Prompting: Gemini 2.5 Pro receives a template prompt with up to 2 000 context tokens. System instructions establish the expert role; user prompts request plain-language explanations and treatment options. The LLM's response is scored via beam-search likelihood, and low-confidence outputs (below a likelihood threshold) are filtered or re-queried.
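A minimal sketch of the retrieve–rerank–prompt loop, using `qdrant-client` for vector search, Cohere's `rerank-v3.5` model, and the `google-genai` SDK for Gemini 2.5 Pro; the collection name, prompt template, and the stubbed Qwen2 embedding helper are illustrative assumptions, not the authors' exact pipeline.

```python
import cohere
from qdrant_client import QdrantClient
from google import genai

qdrant = QdrantClient("localhost", port=6333)
co = cohere.ClientV2(api_key="...")          # Cohere reranking
llm = genai.Client(api_key="...")            # Gemini via google-genai SDK

def embed(text: str) -> list[float]:
    """Stub for the Qwen2-1.5B-Instruct embedding step."""
    raise NotImplementedError

def explain(features: list[str], label: str, top_k: int = 20, keep: int = 5) -> str:
    query = f"Explain {label} presenting with {', '.join(features)}."
    hits = qdrant.search(collection_name="derm_kb",     # assumed collection name
                         query_vector=embed(query), limit=top_k)
    docs = [h.payload["text"] for h in hits]
    ranked = co.rerank(model="rerank-v3.5", query=query, documents=docs, top_n=keep)
    context = "\n\n".join(docs[r.index] for r in ranked.results)
    prompt = (f"You are a board-certified dermatologist.\n"
              f"Context:\n{context}\n\n"
              f"Give a plain-language explanation and treatment options for: {query}")
    return llm.models.generate_content(model="gemini-2.5-pro", contents=prompt).text
```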
To further enhance personalization, a dual-process bionic memory inspired by neuroscience is implemented (Zhang et al., 2023). Working Memory (WM) stores per-turn dialogue notes, Short-Term Memory (STM) tracks recent patient-specific attributes (key–value pairs with learned embeddings), and Long-Term Memory (LTM) contains domain-wide dermatology knowledge (e.g., guidelines, ontologies):
- STM retrieval uses a distance metric over entry embeddings for similarity; entries are consolidated to LTM after repeated access.
- LTM retrieval uses cosine similarity.
- Only frequently accessed STM entries (at least 3 accesses; cf. the consolidation threshold in Section 7) are promoted to LTM, emulating consolidation in human memory.
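The consolidation mechanics can be sketched as a small store with count-based promotion; the entry layout, the pruning rule, and the Euclidean STM metric are simplified assumptions consistent with the description above.

```python
import numpy as np

class BionicMemory:
    """STM with count-based consolidation into LTM (sketch)."""

    def __init__(self, consolidate_after=3, stm_capacity=50):
        self.stm, self.ltm = {}, {}     # key -> (embedding, value)
        self.access = {}                # key -> STM access count
        self.consolidate_after = consolidate_after
        self.stm_capacity = stm_capacity

    def remember(self, key, emb: np.ndarray, value):
        if len(self.stm) >= self.stm_capacity:
            # Simplified pruning: evict the least-accessed STM entry.
            victim = min(self.stm, key=lambda k: self.access.get(k, 0))
            self.stm.pop(victim)
            self.access.pop(victim, None)
        self.stm[key] = (emb, value)

    def stm_lookup(self, query: np.ndarray):
        if not self.stm:
            return None
        # STM retrieval by embedding distance (Euclidean assumed here).
        key = min(self.stm, key=lambda k: np.linalg.norm(self.stm[k][0] - query))
        self.access[key] = self.access.get(key, 0) + 1
        if self.access[key] >= self.consolidate_after:
            self.ltm[key] = self.stm.pop(key)   # consolidation: promote to LTM
        return key

    def ltm_lookup(self, query: np.ndarray):
        if not self.ltm:
            return None
        # LTM retrieval by cosine similarity.
        def cos(k):
            e = self.ltm[k][0]
            return float(e @ query) / (np.linalg.norm(e) * np.linalg.norm(query) + 1e-9)
        return max(self.ltm, key=cos)
```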
Parameter-efficient fine-tuning (PEFT) is achieved with LoRA adapters and per-layer domain adapters, so that only a small fraction of model parameters is updated. Training optimizes a causal language-modeling loss, an adapter-consistency term to prevent catastrophic forgetting, and a norm penalty on the LoRA updates.
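Such a setup is commonly expressed with the `peft` library; the sketch below attaches LoRA adapters to the attention projections and computes the causal-LM loss. The target module names and rank are illustrative assumptions, and the consistency and norm-penalty terms are noted but omitted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

name = "Qwen/Qwen2-1.5B-Instruct"
base = AutoModelForCausalLM.from_pretrained(name)
tok = AutoTokenizer.from_pretrained(name)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],  # assumed projections
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # LoRA trains a small fraction of weights

batch = tok("Patient presents with erythematous plaques on extensor surfaces.",
            return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal language-modeling loss
loss = out.loss
# Full objective (Zhang et al., 2023) adds an adapter-consistency term and a
# norm penalty on the LoRA updates; both are omitted in this sketch.
loss.backward()
```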
5. Experimental Performance and Clinical Evaluation
On DermNet (23 classes), the system demonstrates significant performance gains versus the SkinGPT-4 baseline:
| Metric | DermETAS-SNA | SkinGPT-4 |
|---|---|---|
| Accuracy | 59.89% | 52.92% |
| Precision | 59.25% | 54.57% |
| Recall | 55.29% | 46.83% |
| Macro F1-Score | 56.30% | 48.51% |
| Matthews CC | 0.57 | 0.50 |
Macro F1 improves by 7.79 percentage points (a 16.06% relative gain), with per-class F1 ranging from 44.52% (Psoriasis) to 74.16% (Melanoma). Experimental evaluation confirms consistent superiority (see Table 2 in (Oruganty et al., 9 Dec 2025)).
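The reported gain follows directly from the table:

```python
derm, skin = 56.30, 48.51                                   # macro F1 (%)
print(f"absolute gain: {derm - skin:.2f} pp")               # 7.79 pp
print(f"relative gain: {100 * (derm - skin) / skin:.2f}%")  # 16.06%
```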
RAG-based LLM outputs were evaluated by eight licensed dermatologists on seven representative conditions, with 92% overall expert agreement on six axes (accuracy, clarity, actionability, etc.), compared to 48.2% agreement for SkinGPT-4.
6. Proof-of-Concept Implementation and System Architecture
A real-world prototype demonstrates end-to-end integration of DermETAS-SNA. The front-end uses Streamlit; the back-end runs on an Apple Mac Studio (M2 Max, 38-core GPU). The user workflow proceeds as:
- Dermoscopy image upload (JPEG/PNG)
- ETAS-ViT feature extraction and disease prediction via 23 binary classifiers + StackNet meta-classifier
- Extracted features and prediction seed a RAG query; Qdrant/Cohere retrieve and reorder knowledge base passages
- Gemini 2.5 Pro produces a concise, medically informed diagnostic description
- Interactive Q&A allows clarification and follow-up
Communication is REST-based, providing modularity, scalability, and updatable components (Oruganty et al., 9 Dec 2025).
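A minimal sketch of the REST coupling between the Streamlit front-end and the model back-end; the endpoint path, payload fields, and the `classify`/`explain` stubs are illustrative assumptions.

```python
# backend.py — FastAPI service wrapping the classifier and DERM-RAG pipeline.
from fastapi import FastAPI, UploadFile

app = FastAPI()

def classify(img_bytes: bytes):
    """Stub: ETAS-ViT features -> 23 binary classifiers -> StackNet meta-classifier."""
    return "Melanoma", ["asymmetric border", "variegated pigmentation"]

def explain(features, label):
    """Stub: DERM-RAG retrieval + Gemini 2.5 Pro explanation (see Section 4)."""
    return f"{label}: plain-language explanation grounded in retrieved passages."

@app.post("/predict")                               # assumed endpoint path
async def predict(image: UploadFile):
    label, features = classify(await image.read())
    return {"label": label, "explanation": explain(features, label)}
```

```python
# app.py — Streamlit front-end calling the REST back-end.
import requests
import streamlit as st

upload = st.file_uploader("Dermoscopy image", type=["jpg", "jpeg", "png"])
if upload is not None:
    resp = requests.post("http://localhost:8000/predict",
                         files={"image": upload.getvalue()})
    st.subheader(resp.json()["label"])
    st.write(resp.json()["explanation"])
```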
7. Practical Guidelines and System Trade-Offs
Memory configuration for STM (≈50 entries, embedding dimension ≈768, consolidation threshold of 3 accesses, pruning every 5 dialogue turns) balances personalization and latency; increasing STM to 100 entries yields a 15% retrieval-latency increase. A lower LoRA rank is preferred for efficiency; doubling the rank offers slightly better fluency at twice the parameter cost. Halving the domain adapter rank (16 vs. 32) halves adapter training time at a minor quality drop (~1 ROUGE point).
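These settings map naturally onto a small configuration object; the field names below are illustrative, not the authors' API.

```python
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    stm_entries: int = 50             # ~15% higher retrieval latency at 100
    embedding_dim: int = 768
    consolidation_threshold: int = 3  # STM accesses before promotion to LTM
    prune_every_turns: int = 5        # dialogue turns between STM pruning passes
```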
Evaluation benchmarks emphasize accuracy (ROUGE-L), preference adaptation, guideline adherence (safety), user win-rate (A/B human judgment), and system throughput (≤1 s response, ≥10 req/s on a 32 GB GPU) (Zhang et al., 2023).
DermETAS-SNA LLM Assistant combines optimized visual representations, imbalance-aware ensemble learning, retrieval-augmented explanation, and personalized LLM interaction in a computationally efficient, modular architecture. It achieves substantial empirical gains in both diagnostic accuracy and clinical trustworthiness, charting a path toward next-generation, interpretable AI in dermatological medicine (Oruganty et al., 9 Dec 2025, Zhang et al., 2023).