
DermETAS-SNA LLM Dermatology Assistant

Updated 16 December 2025
  • DermETAS-SNA LLM Assistant is a dermatology-focused AI system that integrates an evolutionary Vision Transformer search and class-balanced binary classifiers for robust skin disease recognition.
  • It employs a StackNet ensemble that fuses class probabilities, multi-scale features, and statistical summaries to address class imbalances and enhance predictive accuracy.
  • A retrieval-augmented generation module leverages a comprehensive dermatology knowledge base and personalized LLM interactions to deliver clear, medically grounded diagnostic explanations.

DermETAS-SNA LLM Assistant is a dermatology-focused artificial intelligence system that integrates evolutionary Vision Transformer (ViT) search, imbalance-aware skin disease classification, a multi-level ensemble meta-classifier (StackNet), and retrieval-augmented generation (RAG) with an LLM assistant. It is engineered for high-performance skin disease recognition and delivers medically grounded, explainable diagnostic descriptions suitable for clinician-patient interaction and clinical education (Oruganty et al., 9 Dec 2025; Zhang et al., 2023).

1. Evolutionary Transformer Architecture Search (ETAS)

DermETAS-SNA’s visual inference backbone is discovered via an Evolutionary Transformer Architecture Search (ETAS) executed on the SKINCON dataset. The search space consists of ViT parameterizations with variable transformer depth ($n \in \{6, 7, \ldots, 12\}$), attention head count per layer ($h_i \in \{8, 16\}$), MLP dimension per layer ($m_i \in \{2048, 3072, 4096\}$), and dropout rate ($d_i \in [0.1, 0.3]$). Patch size (16×16), image resolution (224×224), and embedding dimension (768) are kept fixed.

Individuals in the population encode the ViT as tuples of layer parameters $I = (L_1, L_2, \ldots, L_n)$ with $L_i = (h_i, m_i, d_i)$. Genetic operators include random initialization (population size 5), tournament parent selection, roulette-wheel selection with elitism, single-point crossover ($p_{\text{cross}} = 0.8$), and mutation (add/remove/perturb a layer; rates $[0.7, 0.2, 0.1]$ at $p_{\text{mut}} = 0.2$). Fitness is the 5-fold averaged F1 score on SKINCON (3 886 images, 48 fine-grained concept labels):

$$\text{Fitness}(I) = \frac{1}{5} \sum_{k=1}^{5} F1^{(k)}_{\text{score}}\big(\text{TrainAndEvaluate}(I)\big)$$

After 20 generations, the optimal backbone was a 12-layer, 16-head ViT with MLP dimensions up to 4096 and dropout 0.1–0.2, frozen for use in downstream classifier modules. This search ensures dermatology-specific representation learning robust to complex lesion morphologies (Oruganty et al., 9 Dec 2025).
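
A minimal sketch of this search loop, assuming the operator rates above; the fitness function is a placeholder for the real 5-fold F1 evaluation (which trains a ViT per candidate), and selection is simplified to tournament choice with elitism:

```python
# Sketch of the ETAS loop. `fitness` is a stand-in for the 5-fold F1 on SKINCON;
# the search space matches the text: depth 6-12, heads {8,16}, MLP {2048,3072,4096},
# dropout in [0.1, 0.3]. Selection is simplified to tournament + elitism.
import random

HEADS, MLPS = [8, 16], [2048, 3072, 4096]

def random_layer():
    return (random.choice(HEADS), random.choice(MLPS), round(random.uniform(0.1, 0.3), 2))

def random_individual():
    return [random_layer() for _ in range(random.randint(6, 12))]  # I = (L_1, ..., L_n)

def fitness(ind):
    # Placeholder: the paper trains the encoded ViT and averages F1 over 5 folds.
    return sum(h / 16 + m / 4096 for h, m, _ in ind) / len(ind)

def tournament(pop, k=3):
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b, p_cross=0.8):
    if random.random() > p_cross:
        return list(a)
    cut = random.randint(1, min(len(a), len(b)) - 1)      # single-point crossover
    return a[:cut] + b[cut:]

def mutate(ind, p_mut=0.2):
    if random.random() > p_mut:
        return ind
    ind = list(ind)
    op = random.choices(["add", "remove", "perturb"], weights=[0.7, 0.2, 0.1])[0]
    if op == "add" and len(ind) < 12:
        ind.insert(random.randrange(len(ind) + 1), random_layer())
    elif op == "remove" and len(ind) > 6:
        ind.pop(random.randrange(len(ind)))
    else:
        ind[random.randrange(len(ind))] = random_layer()  # perturb one layer
    return ind

pop = [random_individual() for _ in range(5)]             # population size 5
for generation in range(20):                              # 20 generations
    elite = max(pop, key=fitness)                         # elitism
    pop = [elite] + [mutate(crossover(tournament(pop), tournament(pop)))
                     for _ in range(len(pop) - 1)]
best = max(pop, key=fitness)
print(f"best: {len(best)} layers, fitness {fitness(best):.3f}")
```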

2. Disease Classification via Fine-Tuned Binary Classifiers

Leveraging the ETAS-ViT, DermETAS-SNA constructs a one-vs-all binary classifier ensemble on the DermNet dataset (19 500 images, 23 categories such as Melanoma, Psoriasis, and Actinic Keratosis). For each disease class $c$, a dataset $D_c$ is balanced to 1:1 via under/oversampling, and the ViT backbone's head is replaced with a single-neuron sigmoid output. Two fine-tuning strategies are employed:

  • Full Unfreezing (FU): all weights are trainable from the start
  • Gradual Unfreezing (GU): only the head is trainable at first; earlier layers are unfrozen epoch by epoch

Training sweeps grids of learning rates ($\{1\times10^{-5}, 5\times10^{-5}, 1\times10^{-4}\}$), batch sizes ($\{16, 32\}$), and momentum, with up to 50 epochs and early stopping on validation F1. Binary cross-entropy is used here, with Focal Loss reserved for the meta-classification stage. Standard dermoscopy augmentations (flips, ±15° rotations, color jitter) are applied. The per-class classifiers $M_c$ are thus specialized for both rare and common skin conditions, mitigating class imbalance (Oruganty et al., 9 Dec 2025).
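
A minimal sketch of one per-class binary head with the GU schedule, using torchvision's ViT-B/16 (patch 16, embed dim 768) as a stand-in for the ETAS backbone; dataset balancing, the full hyperparameter grid, and early stopping are omitted:

```python
# One-vs-all binary classifier for a single disease class c, with Gradual Unfreezing.
# torchvision's ViT-B/16 stands in for the ETAS-discovered backbone (assumption);
# BCEWithLogitsLoss applies the sigmoid to the single-neuron head internally.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

model = vit_b_16(weights=None)                     # paper uses the ETAS-ViT weights
model.heads = nn.Sequential(nn.Linear(768, 1))     # single-neuron output head
for p in model.parameters():
    p.requires_grad = False                        # GU: freeze everything...
for p in model.heads.parameters():
    p.requires_grad = True                         # ...except the new head

loss_fn = nn.BCEWithLogitsLoss()                   # binary cross-entropy
blocks = list(model.encoder.layers)

for epoch in range(5):                             # paper: up to 50 epochs + early stop
    if 0 < epoch <= len(blocks):                   # unfreeze one more top block per epoch
        for p in blocks[-epoch].parameters():
            p.requires_grad = True
    # rebuild the optimizer so newly unfrozen parameters are included
    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                          lr=5e-5, momentum=0.9)
    x = torch.randn(2, 3, 224, 224)                # dummy 1:1-balanced batch for class c
    y = torch.tensor([[1.0], [0.0]])
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```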

3. StackNet Augmented Ensemble and Meta-Classification

At inference, the primary classification ensemble proceeds in two stages ("StackNet", editor's term). In Level 1, the 23 ViT binary classifiers output $P(x) = [p_1(x), \ldots, p_{23}(x)] \in \mathbb{R}^{23}$. In Level 2, a meta-classifier $M_{\text{meta}}$ receives a concatenated feature vector $F(x)$ that fuses:

  • $P(x)$: the Level-1 class probabilities
  • $D_{\text{multi}}(x)$: multi-scale features (2048-dim) via global average pooling of four intermediate layers of a ResNet-50 pretrained on ImageNet
  • $S(x)$: summary statistics of $P(x)$ (mean, std, max, top-3 mean)

This yields $F(x) \in \mathbb{R}^{2075}$. The meta-classifier is a 1D-CNN with three fully-connected layers (1024 → 512 → 256) trained with Focal Loss. The predicted class is:

$$\hat{y} = \arg\max_{c} M_{\text{meta}}\big(F(x)\big)_c$$

This StackNet paradigm enables robust aggregation of the weak binary classifiers and explicit integration of multi-scale features and statistical summaries, improving predictive accuracy and handling class imbalance (Oruganty et al., 9 Dec 2025).
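
A sketch of the Level-2 fusion and meta-classification, with the 1024→512→256 stack rendered as a plain MLP and Focal Loss in its standard multi-class form (the focusing parameter γ is an assumption):

```python
# StackNet Level 2: fuse P(x) (23), D_multi(x) (2048), and S(x) (4) into F(x) (2075),
# then classify with a small meta-network trained under Focal Loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_features(p, d_multi):
    # p: (B, 23) Level-1 probabilities; d_multi: (B, 2048) ResNet-50 pooled features
    top3 = p.topk(3, dim=1).values.mean(dim=1, keepdim=True)
    s = torch.cat([p.mean(dim=1, keepdim=True), p.std(dim=1, keepdim=True),
                   p.max(dim=1, keepdim=True).values, top3], dim=1)  # S(x)
    return torch.cat([p, d_multi, s], dim=1)       # F(x): 23 + 2048 + 4 = 2075

class MetaClassifier(nn.Module):
    def __init__(self, in_dim=2075, n_classes=23):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))
    def forward(self, f):
        return self.net(f)

def focal_loss(logits, targets, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                            # prob. assigned to the true class
    return ((1 - pt) ** gamma * ce).mean()         # down-weights easy examples

meta = MetaClassifier()
p, d = torch.rand(4, 23), torch.randn(4, 2048)
fx = fuse_features(p, d)                           # (4, 2075)
loss = focal_loss(meta(fx), torch.randint(0, 23, (4,)))
y_hat = meta(fx).argmax(dim=1)                     # \hat{y}, as in the equation above
```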

4. RAG-Based Diagnostic Explanation and LLM Personalization

DermETAS-SNA incorporates a Diagnostic Explanation and Retrieval Model for Dermatology (DERM-RAG) utilizing a comprehensive, LLM-assisted pipeline:

  • Knowledge Base: Formed from 5 000 pages of dermatological textbooks, parsed into ~10 000 thematic chunks, embedded via Qwen2-1.5B-Instruct, stored with QdrantDB for vector retrieval.
  • Query Formulation: Detected features (e.g., papule, erythema) and predicted labels ground natural-language queries.
  • Passage Retrieval and Reranking: Qwen2-1.5B embedding locates top-K similar passages; Cohere rerank-v3.5 orders them by cross-encoder relevance.
  • LLM Prompting: Gemini 2.5 Pro receives a template prompt with up to 2 000 context tokens. System instructions establish the expert role; user prompts request plain-language explanations and treatment options. The LLM's response is scored via beam-search likelihood, and low-confidence outputs ($\text{LL} < \text{threshold}$) are filtered or re-queried (sketched below).
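
A framework-agnostic sketch of this flow; the four helpers are stubs standing in for the named services (Qwen2 embeddings, Qdrant search, Cohere rerank, Gemini generation), and the query and prompt wording is illustrative:

```python
# Framework-agnostic sketch of the DERM-RAG flow. The four helpers are stubs
# standing in for the named services (Qwen2-1.5B-Instruct embeddings, Qdrant
# retrieval, Cohere rerank-v3.5, Gemini 2.5 Pro); real SDK calls are omitted so
# the control flow runs as-is.

def embed(text):                                   # stub: Qwen2-1.5B-Instruct
    return [float(ord(c) % 7) for c in text[:16]]

def vector_search(vec, k):                         # stub: Qdrant top-K retrieval
    return [f"textbook chunk {i}" for i in range(k)]

def rerank(query, docs, keep):                     # stub: Cohere rerank-v3.5
    return docs[:keep]

def generate(prompt):                              # stub: Gemini 2.5 Pro + LL score
    return "Plain-language explanation with treatment options.", -12.3

def build_query(features, label):
    # Ground the query in detected findings and the StackNet prediction.
    return f"Explain {label} presenting with {', '.join(features)}."

def explain(features, label, top_k=20, keep=5, ll_threshold=-50.0):
    query = build_query(features, label)
    passages = rerank(query, vector_search(embed(query), top_k), keep)
    context = "\n".join(passages)[:8000]           # ~2 000-token context budget
    prompt = ("System: You are a board-certified dermatologist.\n"
              f"Context:\n{context}\n"
              f"User: In plain language, explain '{label}' and treatment options.")
    text, log_likelihood = generate(prompt)
    if log_likelihood < ll_threshold:              # LL < threshold: re-query wider
        return explain(features, label, top_k * 2, keep, ll_threshold)
    return text

print(explain(["papule", "erythema"], "Psoriasis"))
```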

To further enhance personalization, a dual-process bionic memory inspired by neuroscience is implemented (Zhang et al., 2023). Working Memory (WM) stores per-turn dialogue notes, Short-Term Memory (STM) tracks recent patient-specific attributes (key–value pairs with learned embeddings), and Long-Term Memory (LTM) contains domain-wide dermatology knowledge (e.g., guidelines, ontologies):

  • STM retrieval uses $\ell_2$ distance for embedding similarity; entries are consolidated to LTM after repeated access.
  • LTM retrieval uses cosine similarity.
  • Only frequently accessed STM entries ($\geq \theta$ accesses) are promoted to LTM, emulating consolidation in human memory (see the sketch below).
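
A minimal sketch of the STM/LTM mechanics under Section 7's configuration (≈50 STM entries, θ = 3); the pruning policy and toy embeddings are simplifications:

```python
# Dual-store memory sketch: STM is searched by l2 distance, LTM by cosine similarity,
# and an STM entry is promoted to LTM after >= theta accesses (consolidation).
import math

class BionicMemory:
    def __init__(self, theta=3, stm_capacity=50):
        self.stm, self.ltm = [], []    # entries: [key, value, embedding, access_count]
        self.theta, self.cap = theta, stm_capacity

    @staticmethod
    def _l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb + 1e-9)

    def store(self, key, value, emb):
        if len(self.stm) >= self.cap:                  # prune the least-used entry
            self.stm.remove(min(self.stm, key=lambda e: e[3]))
        self.stm.append([key, value, emb, 0])

    def recall(self, query_emb):
        best = min(self.stm, key=lambda e: self._l2(e[2], query_emb), default=None)
        if best is not None:
            best[3] += 1
            if best[3] >= self.theta:                  # consolidation: STM -> LTM
                self.stm.remove(best)
                self.ltm.append(best)
            return best[1]
        if self.ltm:                                   # fall back to LTM by cosine
            return max(self.ltm, key=lambda e: self._cosine(e[2], query_emb))[1]
        return None

mem = BionicMemory()
mem.store("allergy", "patient reports nickel allergy", [0.1, 0.9])
for _ in range(3):                                     # third access promotes to LTM
    mem.recall([0.1, 0.8])
print(len(mem.stm), len(mem.ltm))                      # -> 0 1
```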

Parameter-efficient fine-tuning (PEFT) is achieved with LoRA adapters and per-layer domain adapters, permitting adaptation with $\ll 1\%$ of model parameters. Training optimizes a causal language modeling loss, an adapter consistency term to prevent catastrophic forgetting, and an $\ell_2$ penalty on LoRA updates.
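
A sketch of this composite objective; the consistency term is rendered here as a KL divergence to the frozen base model's logits, and the λ weights are assumptions:

```python
# Sketch of the composite PEFT objective: causal-LM loss + adapter consistency
# (here: KL to the frozen base model's logits, an assumption) + l2 penalty on the
# LoRA update parameters. lambda_cons and lambda_l2 are illustrative weights.
import torch
import torch.nn.functional as F

def peft_loss(lm_logits, base_logits, labels, lora_params,
              lambda_cons=0.1, lambda_l2=1e-4):
    # 1) causal language-modeling loss (next-token cross-entropy)
    clm = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)), labels.view(-1))
    # 2) consistency: keep adapted logits close to the frozen base model's
    cons = F.kl_div(F.log_softmax(lm_logits, dim=-1),
                    F.softmax(base_logits, dim=-1), reduction="batchmean")
    # 3) l2 penalty on the low-rank update matrices
    l2 = sum(p.pow(2).sum() for p in lora_params)
    return clm + lambda_cons * cons + lambda_l2 * l2

V = 100                                            # toy vocabulary
lm = torch.randn(2, 8, V, requires_grad=True)      # adapted model logits
base = torch.randn(2, 8, V)                        # frozen base model logits
labels = torch.randint(0, V, (2, 8))
lora = [torch.randn(8, 16, requires_grad=True)]    # e.g. a rank-8 LoRA factor
peft_loss(lm, base, labels, lora).backward()
```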

5. Experimental Performance and Clinical Evaluation

On DermNet (23 classes), the system shows substantial gains over the SkinGPT-4 baseline:

| Metric | DermETAS-SNA | SkinGPT-4 |
|---|---|---|
| Accuracy | 59.89% | 52.92% |
| Precision | 59.25% | 54.57% |
| Recall | 55.29% | 46.83% |
| Macro F1-Score | 56.30% | 48.51% |
| Matthews CC | 0.57 | 0.50 |

Macro F1 improves by 7.79 percentage points over SkinGPT-4 (a 16.06% relative gain), with per-class F1 ranging from 44.52% (Psoriasis) to 74.16% (Melanoma), confirming consistent gains across classes (see Table 2 in (Oruganty et al., 9 Dec 2025)).

RAG-based LLM outputs were evaluated by eight licensed dermatologists on seven representative conditions, with 92% overall expert agreement on six axes (accuracy, clarity, actionability, etc.), compared to 48.2% agreement for SkinGPT-4.

6. Proof-of-Concept Implementation and System Architecture

A proof-of-concept prototype integrates the full DermETAS-SNA pipeline. The front-end uses Streamlit; the back-end is deployed on an Apple Mac Studio (M2 Max, 38-core GPU). The user workflow proceeds as follows:

  1. Dermoscopy image upload (JPEG/PNG)
  2. ETAS-ViT feature extraction and disease prediction via 23 binary classifiers + StackNet meta-classifier
  3. Extracted features and prediction seed a RAG query; Qdrant/Cohere retrieve and reorder knowledge base passages
  4. Gemini 2.5 Pro produces a concise, medically informed diagnostic description
  5. Interactive Q&A allows clarification and follow-up

Communication is REST-based, keeping the components modular, scalable, and independently updatable (Oruganty et al., 9 Dec 2025).
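
A hypothetical client-side view of this workflow; endpoint paths and payload fields are illustrative assumptions, since the paper specifies only that communication is REST-based:

```python
# Hypothetical client-side calls mirroring workflow steps 1-5 (endpoint paths and
# JSON fields are illustrative assumptions, not from the paper).
import requests

API = "http://localhost:8000"                      # back-end host (assumption)

with open("lesion.jpg", "rb") as f:                # step 1: dermoscopy image upload
    pred = requests.post(f"{API}/predict", files={"image": f}).json()
# step 2 runs server-side: ETAS-ViT features -> 23 binary classifiers -> StackNet

explanation = requests.post(f"{API}/explain",      # steps 3-4: RAG query + Gemini
                            json={"label": pred["label"],
                                  "features": pred["features"]}).json()
print(explanation["text"])

followup = requests.post(f"{API}/chat",            # step 5: interactive Q&A
                         json={"question": "Is this condition contagious?"}).json()
print(followup["text"])
```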

7. Practical Guidelines and System Trade-Offs

Memory configuration for STM (≈50 entries, embedding dimension ≈768, consolidation threshold of 3 accesses, pruning every 5 dialogue turns) balances personalization against latency; increasing STM to 100 entries incurs a 15% retrieval-latency penalty. LoRA rank $r_\ell = 8$ is preferred for efficiency; $r_\ell = 16$ offers slightly better fluency at double the parameter cost. Reducing the domain adapter rank from $r_d = 64$ to $r_d = 32$ halves adapter training time at a minor quality drop (~1 ROUGE point).
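
The same operating points collected into one illustrative config object (key names are assumptions, not from the papers):

```python
# Section 7's operating points as a single illustrative config (a sketch; key names
# are assumptions).
CONFIG = {
    "stm": {
        "capacity": 50,                 # entries; 100 -> +15% retrieval latency
        "embedding_dim": 768,
        "consolidation_threshold": 3,   # accesses before STM -> LTM promotion
        "prune_every_turns": 5,
    },
    "lora_rank": 8,                     # r=16: slightly better fluency, 2x parameters
    "domain_adapter_rank": 64,          # r_d=32: ~half training time, ~-1 ROUGE point
}
```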

Evaluation benchmarks emphasize accuracy (ROUGE-L), preference adaptation, guideline adherence (safety), user win-rate (A/B human judgment), and system throughput (≤1 s response, ≥10 req/s on a 32 GB GPU) (Zhang et al., 2023).


DermETAS-SNA LLM Assistant combines optimized visual representations, imbalance-aware ensemble learning, retrieval-augmented explanation, and personalized LLM interaction in a computationally efficient, modular architecture. It achieves substantial empirical gains in both diagnostic accuracy and clinical trustworthiness, establishing a pathway for next-generation, interpretable AI in dermatological medicine (Oruganty et al., 9 Dec 2025; Zhang et al., 2023).
