Fitzpatrick Skin Types

Updated 28 June 2026

Fitzpatrick Skin Types are a six-category system describing skin's UV response and pigmentation with defined clinical characteristics.
Automated methods, including ITA-based approaches and neural networks, enhance objective and scalable skin type estimation.
Assessments face challenges like lighting variability, biased datasets, and limited granularity in darker tones.

The Fitzpatrick skin type (FST, FP, or "Fitzpatrick scale") system is a six-category classification developed to describe human skin's response to ultraviolet (UV) radiation and its baseline pigmentation. Widely adopted in dermatology, medical device calibration, and, more recently, AI fairness auditing in computer vision, the Fitzpatrick scale remains a central, though increasingly scrutinized, taxonomy of skin phototypes. The scale’s conceptual foundation, its assignment protocols, its correlation with quantitative melanin metrics, and its role in data-driven modeling are the subject of active technical investigation and debate.

1. Definition and Clinical Assignment of Fitzpatrick Skin Types

The Fitzpatrick scale classifies skin into six distinct phototypes, primarily reflecting UV response phenotypes and observed baseline color:

Type	UV Reaction	Typical Skin Appearance
I	Always burns, never tans	Very fair/pale, often red/blond hair, freckles
II	Usually burns, tans minimally	Fair
III	Sometimes mild burn, tans uniformly/light brown	Cream white to light beige
IV	Rarely burns, tans well/moderate brown	Light brown/olive
V	Very rarely burns, tans very easily/dark brown	Brown skin (Middle Eastern, Hispanic, light African–American ancestry)
VI	Never burns, deeply pigmented (very dark brown/black)	Deep dark brown to black skin

Clinical assignment of FST is most robustly performed by in-person dermatologist evaluation or validated self-report instruments. The canonical in-person protocol comprises either a provider's subjective judgment, possibly supplemented by questioning about personal UV response history (burn/tan behavior), or a brief structured self-report (e.g., Eilers et al. 2013 single-question tool) (Howard et al., 2021). These categories are behavioral rather than spectrophotometric and rely on observing or recalling erythema and tanning responses, which are affected by current sun exposure, not just constitutive skin tone (Howard et al., 2021).

2. Automation and Objective Estimation: From Images to ITA and Fitzpatrick Types

Manual FST assignments show substantial rater disagreement: in controlled face image studies with three raters, exact three-way agreement was as low as 31–36%, with two-of-three agreement at 62%, even when reference exemplars were supplied (Krishnapriya et al., 2021). This subjectivity and sensitivity to lighting motivate attempts to automate FST assignment from digital images using quantitative colorimetry.
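Agreement figures of this kind can be computed from per-image rating triples. The following is a minimal sketch, not the cited study's procedure: the function name, the input format, and the toy ratings are assumptions, and "two-of-three" is interpreted here as exactly two raters agreeing, which the study may define differently.

```python
def three_rater_agreement(ratings):
    """Fractions of images where all three, exactly two, or no raters agree.

    ratings: iterable of (r1, r2, r3) FST labels (hypothetical input format).
    """
    ratings = list(ratings)
    n = len(ratings)
    if n == 0:
        raise ValueError("no ratings supplied")
    # One distinct label means full agreement; two distinct labels means a 2-vs-1 split.
    all_three = sum(1 for r in ratings if len(set(r)) == 1)
    exactly_two = sum(1 for r in ratings if len(set(r)) == 2)
    none = n - all_three - exactly_two
    return all_three / n, exactly_two / n, none / n


# Toy data for illustration only, not taken from any study.
toy_ratings = [(2, 2, 2), (3, 3, 4), (1, 2, 3), (5, 5, 5)]
full, two, none = three_rater_agreement(toy_ratings)
print(f"all three: {full:.2f}, exactly two: {two:.2f}, none: {none:.2f}")
```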

A commonly adopted objective proxy is the Individual Typology Angle (ITA), defined in CIE-L*a*b* color space as

$\mathrm{ITA}(L^*, b^*) = \arctan\!\left( \frac{L^* - 50}{b^*} \right) \times \frac{180}{\pi}$

with $L^*$ the perceptual lightness (0–100; higher is lighter) and $b^*$ the blue–yellow coordinate (Benčević et al., 6 Apr 2025, Benčević et al., 10 Feb 2026). ITA is a continuous measure expressed in degrees, which allows mapping to Fitzpatrick bins using published thresholds (e.g., FP I: ITA > 55°; FP II: 41° to 55°; FP III: 28° to 41°; FP IV: 10° to 28°; FP V: -30° to 10°; FP VI: ITA ≤ -30°) (Krishnapriya et al., 2021, Benčević et al., 6 Apr 2025, Benčević et al., 10 Feb 2026).
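The ITA formula and the threshold mapping can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names are hypothetical, `atan2` is used in place of a plain arctangent (the two agree whenever $b^* > 0$, which is the usual case for skin), and bin boundaries are treated as exclusive lower bounds, a choice the quoted thresholds do not fully specify.

```python
import math

# Lower bounds in degrees, as quoted in the text; anything at or below -30 is type VI.
ITA_BINS = [(55.0, "I"), (41.0, "II"), (28.0, "III"), (10.0, "IV"), (-30.0, "V")]


def ita_degrees(l_star, b_star):
    """Individual Typology Angle in degrees from CIE L* and b*.

    atan2 matches arctan((L* - 50) / b*) for b* > 0 and avoids a
    division by zero at b* = 0 (an implementation choice, not from the source).
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def fitzpatrick_from_ita(ita):
    """Map an ITA value to a Fitzpatrick bin (boundary handling is an assumption)."""
    for lower_bound, label in ITA_BINS:
        if ita > lower_bound:
            return label
    return "VI"


for l_star, b_star in [(80.0, 10.0), (70.0, 15.0), (50.0, 10.0), (30.0, 10.0)]:
    ita = ita_degrees(l_star, b_star)
    print(f"L*={l_star}, b*={b_star}: ITA={ita:.1f} deg, type {fitzpatrick_from_ita(ita)}")
```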

Segmentation-based and color quantization methods provide robust ITA/FST estimates under varied illumination if carefully filtered for healthy skin, operating in CIE-L*a*b* space and utilizing relative statistics (medians or cluster centroids) (Benčević et al., 6 Apr 2025). Deep neural networks with ordinal regression heads can further outperform heuristic colorimetric baselines, provided sufficient annotated data (Benčević et al., 10 Feb 2026). However, generalization to in-the-wild and lesional images remains limited by distribution shift and dataset composition.
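A median-based estimate of this kind can be sketched as follows, assuming the skin pixels have already been segmented and are given as 8-bit sRGB triples. The conversion uses the standard sRGB to CIE L*a*b* formulas with a D65 white point; the function names, and the choice to take the median of $L^*$ and $b^*$ separately before computing ITA, are assumptions rather than the cited authors' exact procedure.

```python
import math
from statistics import median

D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white used for sRGB


def _linearize(channel_8bit):
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0


def srgb_to_lab(rgb):
    """Convert an (R, G, B) triple in 0-255 to (L*, a*, b*)."""
    r, g, b = (_linearize(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_lab_f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def median_ita(skin_pixels):
    """ITA in degrees from the median L* and median b* of the given skin pixels."""
    labs = [srgb_to_lab(p) for p in skin_pixels]
    l_med = median(lab[0] for lab in labs)
    b_med = median(lab[2] for lab in labs)
    return math.degrees(math.atan2(l_med - 50.0, b_med))


# Illustrative pixel values only; one outlier (a specular highlight) barely moves the median.
patch = [(230, 190, 170), (228, 188, 169), (232, 193, 172), (255, 255, 255)]
print(f"median ITA: {median_ita(patch):.1f} deg")
```

Using medians rather than means limits the influence of stray highlight or shadow pixels that survive segmentation, which is the motivation for relative statistics mentioned above.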

3. Performance, Agreement, and Bias in Manual and Automated Assignments

Manual FST ratings from images are susceptible to lighting, color balance, and incomplete skin views, yielding moderate reproducibility (exact agreement 25–59% by type, off-by-one 71–85%) (Groh et al., 2021). When images are color-corrected, automated ITA-based methods fall within ±1 type of the manual consensus in 96% of cases on controlled mugshot data (Krishnapriya et al., 2021). In clinical datasets (Fitzpatrick17k), Cohen's κ between human and objective ITA-based FST assignments reaches 0.52 (moderate), with higher concordance on darker skin types (V–VI) and lower concordance on intermediate tones (Groh et al., 2021).
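The two agreement statistics used here, unweighted Cohen's κ and within-one-type agreement, can be computed as in the sketch below. Function names and the example labels are hypothetical; FST labels are assumed to be integers 1–6.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two raters' marginal frequencies per class.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def within_one(labels_a, labels_b):
    """Fraction of items whose two labels differ by at most one type."""
    return sum(abs(a - b) <= 1 for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Toy labels for illustration only.
human = [1, 2, 2, 3, 4, 5, 6, 6]
automated = [1, 2, 3, 3, 5, 5, 6, 4]
print(f"kappa = {cohen_kappa(human, automated):.2f}")
print(f"within one type = {within_one(human, automated):.2f}")
```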

Neural networks trained for ordinal regression on dermatologist-assigned FST achieve within-one-type accuracy of 84.8% and weighted κ ≈ 0.53, comparable to aggregated human crowd performance (Benčević et al., 10 Feb 2026). Recent colorimeter-supervised models achieve near-zero mean bias (Bland–Altman limits ±10 ITA) and intraclass correlation versus device measurements of 93.88% (Benčević et al., 10 Feb 2026).
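Mean bias and Bland–Altman limits of agreement between model-predicted and device-measured ITA follow from the paired differences. The sketch below uses the conventional definition (mean difference ± 1.96 sample standard deviations); the function name and the sample values are assumptions, and the cited work may compute its limits differently.

```python
from statistics import mean, stdev


def bland_altman(predicted, reference):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - half_width, bias + half_width


# Illustrative ITA values in degrees, not measured data.
model_ita = [10.0, 20.0, 30.0, 40.0]
device_ita = [12.0, 18.0, 33.0, 37.0]
bias, lower, upper = bland_altman(model_ita, device_ita)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```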

4. Dataset Imbalance, Label Granularity, and Fairness Implications

Major public datasets for dermatology and clinical skin analysis are severely imbalanced with respect to FST representation. Fitzpatrick17k contains approximately 4.4–10.1% of images labeled as types V–VI, with lighter types (I–III) vastly overrepresented (Groh et al., 2021, Aayushman et al., 2024). Large public dermatoscopy image sets (ISIC 2020, MILK10k) contain <1% type V–VI, as revealed by automated colorimeter-supervised annotation (Benčević et al., 10 Feb 2026).

The FST scale itself has been criticized for coarse categorization of darker tones (types V–VI), leading to limited granularity and modeling bias (Shah et al., 14 Sep 2025). Experimental studies show that stratifying AI model training by paired FST groups (I–II, III–IV, V–VI) improves accuracy (AUC up to 0.93 for the V–VI group) and narrows the fairness gap (e.g., calibration error gap reduced from 0.08 to 0.03) compared to FST-balanced training (Shah et al., 14 Sep 2025). However, further coarsening (e.g., merging types I–IV into a single group) diminishes performance and fairness among lighter types (Shah et al., 14 Sep 2025).
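A per-group evaluation of this kind can be sketched by pooling FST labels into paired groups and comparing a metric across them. The sketch below is illustrative only: it uses plain accuracy as a stand-in for the calibration error and AUC reported in the cited study, defines the gap as the difference between the best and worst group, and all names and records are hypothetical.

```python
def fst_group(fst):
    """Map an FST label 1-6 to its paired group: '1/2', '3/4', or '5/6'."""
    if not 1 <= fst <= 6:
        raise ValueError("FST must be an integer from 1 to 6")
    low = fst if fst % 2 == 1 else fst - 1
    return f"{low}/{low + 1}"


def per_group_accuracy(records):
    """records: iterable of (fst, prediction_correct) pairs (hypothetical format)."""
    totals, hits = {}, {}
    for fst, correct in records:
        group = fst_group(fst)
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    return {group: hits[group] / totals[group] for group in totals}


def fairness_gap(per_group_metric):
    """Gap between the best- and worst-performing groups (one possible definition)."""
    values = list(per_group_metric.values())
    return max(values) - min(values)


# Toy evaluation records, not real results.
records = [(1, True), (2, True), (3, True), (4, False), (5, True), (6, False), (6, False)]
accuracy = per_group_accuracy(records)
print(accuracy, f"gap = {fairness_gap(accuracy):.2f}")
```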

5. Methodological Considerations: Measurement, Colorimetry, and Calibration

Robust estimation of FST or skin tone from images requires control of camera, lighting, and environment. Uncontrolled acquisition yields within-subject lightness variation exceeding between-group differences (e.g., average intra-individual L* range 38 vs. inter-race mean difference 13) (Howard et al., 2021). Direct ground-truthing with colorimeter or spectrophotometer under standardized illumination provides the only reliable reference (Howard et al., 2021, Benčević et al., 10 Feb 2026). When images are used, including an 18% gray reference card and normalizing for device and background can raise the correlation between image-based and device-based L* to 0.92 (Howard et al., 2021).

For face and skin segmentation, the state of the art employs DeepLab V3 or BiSeNet for pixel selection, with further exclusion of non-skin regions by chroma thresholding (e.g., Cr 136–173, Cb 77–127 in YCbCr). These steps minimize the impact of artifacts, shadows, and background (Krishnapriya et al., 2021, Benčević et al., 6 Apr 2025, Benčević et al., 10 Feb 2026).
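The chroma-thresholding step can be sketched per pixel as follows. The Cr and Cb ranges are those quoted above; the RGB to YCbCr conversion is assumed to be the full-range BT.601 variant used by JPEG, which the text does not specify, and the function names are hypothetical. In practice this filter would be applied only to pixels already selected by the segmentation network.

```python
CR_RANGE = (136.0, 173.0)  # thresholds quoted in the text
CB_RANGE = (77.0, 127.0)


def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) chroma from 8-bit RGB; variant is an assumption."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin_chroma(rgb):
    """True if the pixel's Cb and Cr both fall inside the quoted skin ranges."""
    cb, cr = rgb_to_cbcr(*rgb)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def filter_skin_pixels(pixels):
    """Keep only pixels that pass the chroma test."""
    return [p for p in pixels if is_skin_chroma(p)]


# Illustrative pixels: a skin-like tone, neutral gray, pure blue, pure green.
candidates = [(200, 150, 120), (128, 128, 128), (0, 0, 255), (0, 255, 0)]
print(filter_skin_pixels(candidates))
```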

6. Practical Recommendations and Alternative Scales

For clinical and fairness applications, in-person or validated self-report FST remains the recommended protocol, but must be contextualized as a UV-reactivity scale with documented limitations as a proxy for skin reflectance (Howard et al., 2021).
Automated ITA-based estimation, if lighting is controlled and segmentation is thorough, enables scalable, reproducible FST assignments with comparable ±1-type tolerance to human raters (Krishnapriya et al., 2021, Benčević et al., 6 Apr 2025, Benčević et al., 10 Feb 2026).
Stratification by FST enhances model performance and reduces fairness gaps, but the scale’s granularity should be scrutinized; merging bins reduces model equity (Shah et al., 14 Sep 2025).
There is growing advocacy for replacing or supplementing FST with objective, continuous measures (e.g., ITA, face-area lightness, Monk Skin Tone Scale) in algorithmic fairness research to better capture the full range of human skin color phenotypes and support robust bias analysis (Shah et al., 14 Sep 2025, Benčević et al., 10 Feb 2026).

7. Limitations, Controversies, and Ongoing Directions

The Fitzpatrick scale conflates latent biological pigmentation and tanning propensity, and does not provide equal resolution across the skin color spectrum. Its assignment is fundamentally subjective and responsive to recent sun exposure. As an image annotation tool, it is unreliable except in highly controlled settings; even then, its correlation with direct reflectance measurements is only moderate (Kendall's τ = 0.51 overall, dropping to τ ≈ 0.23 within racial groups) (Howard et al., 2021). Contemporary research increasingly recommends objective, colorimeter-derived, or ITA-based approaches for phenotyping in both AI fairness and medical contexts (Benčević et al., 6 Apr 2025, Benčević et al., 10 Feb 2026, Shah et al., 14 Sep 2025).

Future work will likely focus on dataset diversification, higher-fidelity annotation protocols, direct measurement of pigmentary phenotype, and statistical frameworks for continuous skin color representation beyond ordinal FST grouping.