Dermatology Assessment Schema (DAS)
- Dermatology Assessment Schema (DAS) is a structured framework encoding key dermatological features from patient images and queries.
- It standardizes feature annotation and supports both closed-ended QA and lesion segmentation using bilingual (English/Chinese) representations.
- The schema enables reproducible multimodal AI benchmarks by leveraging clinician-driven ontology and consensus-based evaluation protocols.
The Dermatology Assessment Schema (DAS) is a clinician-designed, structured framework that encodes key dermatological features extracted from patient-generated images and patient-authored queries. DAS provides a standardized, multilingual (English/Chinese) basis for both closed-ended multiple-choice question answering (QA) and dermatological lesion segmentation, enabling reproducible evaluation and benchmarking of multimodal models for patient-centered dermatology. It is implemented in the DermaVQA-DAS dataset, which supports joint QA and segmentation benchmarks, and underpins the structured evaluation of model-generated diagnostic narratives in multimodal settings (Yim et al., 30 Dec 2025, Shen et al., 12 Nov 2025).
1. Schema Definition and Objectives
DAS is explicitly designed to address the lack of clinically meaningful, patient-contextual benchmarks in dermatological AI. Its principal objectives are:
- Standardization of salient dermatological features—such as anatomic location, lesion size, morphology, color, border regularity, and surface features—using a clinician-driven ontology.
- Support for both closed-ended QA (systematic, clinician-authored questions) and lesion segmentation for comprehensive patient-to-clinician and model-to-clinician alignment.
- Specification of machine-readable, bilingual representations suitable for large multimodal model (MLLM) fine-tuning, evaluation, and deployment in patient-centered workflows.
- Facilitation of reproducible model assessment protocols by defining explicit aggregation, voting, and scoring methods for those tasks.
2. Hierarchical Structure and Content
The schema consists of two hierarchical tiers: high-level clinical assessment questions (categories), each of which may be associated with several fine-grained subquestions that map to real-world annotation and clinical workflows.
High-Level DAS Questions (36 total):
Nine categories are most populated and serve as canonical axes for annotation, including:
- CQID010: Onset of lesion
- CQID011: Anatomic location (slot 1)
- CQID012: Size at location (slot 1)
- CQID015: Distribution pattern
- CQID020: Surface characteristics
- CQID025: Primary morphology
- CQID034: Color
- CQID035: Border regularity
- CQID036: Secondary features
Fine-Grained Subquestions (27 fields):
Each high-level question can have subordinate slots or subfields, such as location multiplicity (e.g., locations 1–3), or site-specific attributes (e.g., color at location 2). This structure encodes standard dermatologic notation where multiple lesions or sites are described systematically (Yim et al., 30 Dec 2025).
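This slot convention can be illustrated by expanding a category ID into its per-slot field identifiers. The pattern below is inferred from published IDs such as "CQID011-001" and "CQID025-001"; the helper function is hypothetical, not part of the dataset release:

```python
def slot_ids(cqid: str, n_slots: int) -> list[str]:
    """Expand a high-level DAS question ID into per-slot field identifiers.

    The zero-padded three-digit suffix is inferred from IDs like
    'CQID011-001' in the released schema; this helper is illustrative only.
    """
    return [f"{cqid}-{i:03d}" for i in range(1, n_slots + 1)]

# e.g. anatomic location with three possible sites per case
print(slot_ids("CQID011", 3))  # ['CQID011-001', 'CQID011-002', 'CQID011-003']
```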
3. Schema Representation and Implementation
DAS is publicly released in JSON format, with each question represented by:
- A unique identifier (e.g., "CQID025-001")
- Bilingual question text (English/Chinese)
- Enumerated, coded multiple-choice answers (integer code + both language labels)
Example Entry:
```json
{
  "question_id": "CQID025-001",
  "question_en": "What is the primary morphology of the lesion?",
  "question_zh": "病变的主要形态是什么?",
  "choices": [
    {"code": 1, "en": "Macule", "zh": "斑点"},
    {"code": 2, "en": "Papule", "zh": "丘疹"},
    ...
    {"code": 8, "en": "Not mentioned", "zh": "未提及"}
  ]
}
```
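Given this format, a released entry can be parsed and its annotated integer codes decoded back to bilingual labels. The snippet below is a minimal sketch using only the fields shown in the example; the `decode_answer` helper is illustrative, not part of the official tooling:

```python
import json

# A single DAS entry in the released JSON format (fields as in the example
# above; the full schema file is a list of such entries).
entry_json = """
{
  "question_id": "CQID025-001",
  "question_en": "What is the primary morphology of the lesion?",
  "question_zh": "病变的主要形态是什么?",
  "choices": [
    {"code": 1, "en": "Macule", "zh": "斑点"},
    {"code": 2, "en": "Papule", "zh": "丘疹"},
    {"code": 8, "en": "Not mentioned", "zh": "未提及"}
  ]
}
"""

def decode_answer(entry: dict, code: int, lang: str = "en") -> str:
    """Map an annotated integer code back to its label in 'en' or 'zh'."""
    for choice in entry["choices"]:
        if choice["code"] == code:
            return choice[lang]
    raise KeyError(f"code {code} not defined for {entry['question_id']}")

entry = json.loads(entry_json)
print(decode_answer(entry, 2))        # Papule
print(decode_answer(entry, 8, "zh"))  # 未提及
```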
4. Development and Validation Methodology
DAS schema development was led by two board-certified dermatologists and refined in iterative clinician workshops to ensure domain coverage and clarity in both languages. For the DermaVQA-DAS dataset:
- Each closed-ended QA field was independently annotated by three medical annotators, with majority vote as the gold standard.
- For segmentation, four annotators produced three masks per image; reference masks were created by pixel-wise majority voting (Yim et al., 30 Dec 2025).
- Annotation workflows encode robust, consensus-driven gold standards aligning with best clinical practice.
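The two consensus steps above (majority vote over QA answer codes, pixel-wise majority vote over masks) can be sketched as follows; this is a minimal NumPy illustration, and the function names are mine rather than from the dataset release:

```python
from collections import Counter
import numpy as np

def qa_gold(labels: list[int]) -> int:
    """Majority vote over annotator answer codes (three annotators per field)."""
    return Counter(labels).most_common(1)[0][0]

def mask_gold(masks: list[np.ndarray]) -> np.ndarray:
    """Pixel-wise strict-majority vote over binary masks."""
    stack = np.stack(masks)  # shape: (n_masks, H, W)
    # A pixel is foreground in the reference when more than half the masks agree.
    return (stack.sum(axis=0) * 2 > len(masks)).astype(np.uint8)

print(qa_gold([2, 2, 8]))  # 2

m1 = np.array([[1, 1], [0, 0]])
m2 = np.array([[1, 0], [0, 0]])
m3 = np.array([[1, 1], [1, 0]])
print(mask_gold([m1, m2, m3]))  # [[1 1]
                                #  [0 0]]
```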
The schema's six-dimensional adaptation is used in DermEval, a reference-free multimodal evaluator aligned with DermBench for automatic scoring of model-generated clinical narratives (Shen et al., 12 Nov 2025).
5. Example Questions and Choices
DAS items explicitly encode coarse and fine lesion properties. Key examples include:
| Question ID | Clinical Aspect | Example Choices (English label) |
|---|---|---|
| CQID011-001 | Anatomic Location (Location 1) | 1: Head/Neck, 2: Trunk, 3: Upper extremities, ... |
| CQID012-001 | Size at Location 1 | 1: <1 cm, 2: 1–3 cm, 3: >3 cm, 4: Not mentioned |
| CQID034-001 | Color | 1: Red, 2: Brown, 3: Blue, 4: Black, ... |
The comprehensive bilingual encoding enables precise mapping of patient descriptions to standardized machine-readable entities.
6. Mathematical and Scoring Framework
Segmentation Evaluation
Segmentation is evaluated using two standard spatial overlap metrics:
- Jaccard Index (IoU): $J(P, G) = \dfrac{|P \cap G|}{|P \cup G|}$
- Dice Coefficient (F1): $\mathrm{Dice}(P, G) = \dfrac{2\,|P \cap G|}{|P| + |G|}$

where $P$ is the predicted mask and $G$ the gold-standard mask. Aggregation strategies include mean-of-max, mean-of-mean, and pixel-wise majority-vote microscores.
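A minimal NumPy sketch of these overlap metrics, plus a mean-of-max aggregator (per image, take the best score across reference masks, then average over images); the helper names are illustrative, not the benchmark's official code:

```python
import numpy as np

def jaccard(pred: np.ndarray, gold: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    return float(inter / union) if union else 1.0

def dice(pred: np.ndarray, gold: np.ndarray) -> float:
    """Dice/F1 overlap between two binary masks."""
    inter = np.logical_and(pred, gold).sum()
    total = pred.sum() + gold.sum()
    return float(2 * inter / total) if total else 1.0

def mean_of_max(preds, refs_per_image, metric=jaccard) -> float:
    """For each image, keep the max score across its reference masks; average."""
    return float(np.mean([max(metric(p, r) for r in refs)
                          for p, refs in zip(preds, refs_per_image)]))

p = np.array([[1, 1], [0, 0]])
g = np.array([[1, 0], [0, 0]])
print(jaccard(p, g))  # 0.5
print(dice(p, g))     # 0.666...
```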
Narrative Assessment Dimensions
When DAS is instantiated for narrative evaluation (DermEval), six clinically-grounded scoring axes are defined, each scored 1–5, with dimension-specific rubrics:
- Accuracy: Match to gold-standard diagnosis and features.
- Safety: Absence of unsafe recommendations.
- Medical Groundedness: Consistency with dermatologic science.
- Clinical Coverage: Completeness of assessed features and management advice.
- Reasoning Coherence: Quality of differential and logic.
- Description Precision: Specificity of descriptors.
For case $i$:

$$S_i = \frac{1}{6} \sum_{d=1}^{6} s_{i,d}$$

with $s_{i,d}$ as the per-dimension score. Model calibration is tracked via mean deviation against physician scores:

$$\Delta = \frac{1}{N} \sum_{i=1}^{N} \left| S_i^{\text{model}} - S_i^{\text{physician}} \right|$$
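The mean-of-dimensions case score and the calibration deviation can be sketched directly; this is an illustrative implementation, not DermEval's released code:

```python
import numpy as np

def overall_score(dim_scores: list[float]) -> float:
    """Mean of the six per-dimension scores (each rated 1-5) for one case."""
    assert len(dim_scores) == 6, "DAS narrative evaluation defines six dimensions"
    return sum(dim_scores) / 6.0

def mean_deviation(model_scores, physician_scores) -> float:
    """Mean absolute deviation between model and physician overall scores."""
    m = np.asarray(model_scores, dtype=float)
    p = np.asarray(physician_scores, dtype=float)
    return float(np.mean(np.abs(m - p)))

# accuracy, safety, groundedness, coverage, coherence, precision
case = [5, 4, 5, 3, 4, 3]
print(overall_score(case))                      # 4.0
print(mean_deviation([4.0, 3.5], [4.5, 3.0]))   # 0.5
```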
7. Application Protocols and Recommendations
Recommended protocols for researchers adopting DAS include:
- Vision–LLM Training: DAS’s structured, bilingual questions enable fine-tuning and robust prompting of multimodal LLMs for closed-ended reasoning and segmentation. For multi-site inference, aggregation is performed by union of (location, size) combinations or descriptive labels, with explicit rules for color combinations (Yim et al., 30 Dec 2025).
- Dataset Construction: To encourage multimodal joint learning, jointly release both QA and segmentation data; generate gold standards by multi-annotator majority voting.
- Evaluation: For segmentation, report mean-of-max, mean-of-mean, and pixel-wise majority-vote scores for Jaccard and Dice. For QA, enable partial credit on multi-site questions and apply exact-match accuracy on answer code sets. Provide code for reproducible thresholding and evaluation.
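One way to realize partial credit and exact-match scoring on multi-site answer-code sets is sketched below. The partial-credit rule (fraction of gold codes recovered) is an illustrative scheme; the benchmark's exact rule may differ:

```python
def set_partial_credit(pred_codes, gold_codes) -> float:
    """Partial credit on a multi-slot question: fraction of gold answer codes
    recovered by the prediction (illustrative scheme, assumption on my part)."""
    pred, gold = set(pred_codes), set(gold_codes)
    if not gold:
        return 1.0 if not pred else 0.0
    return len(pred & gold) / len(gold)

def exact_match(pred_codes, gold_codes) -> float:
    """Strict scoring: full credit only when the code sets are identical."""
    return 1.0 if set(pred_codes) == set(gold_codes) else 0.0

print(set_partial_credit([1, 2], [1, 2, 3]))  # 0.666...
print(exact_match([1, 2], [2, 1]))            # 1.0 (order-insensitive)
```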
In narrative evaluation, each dimension is scored independently, with justifications stored per dimension and an overall mean score computed. System architectures should adopt dual-encoder fusion and classifier heads per dimension to ensure calibrated, explainable evaluation (Shen et al., 12 Nov 2025).
By following DAS and its explicit scoring and representation protocols, research groups can construct, evaluate, and compare patient-centered dermatology vision–language systems in a transparent, clinically meaningful, and multilingual manner, with direct support for both segmentation and diagnostic narrative generation tasks.