Toronto NeuroFace Dataset Overview

Updated 18 January 2026

The Toronto NeuroFace dataset is a clinically annotated collection of facial video data from patients with ALS or stroke and from healthy controls, intended for neurological research.
It enables quantitative, non-invasive assessment of orofacial motor dysfunction by capturing standardized tasks such as repetitive phonation and static postures.
The dataset employs precise manual annotation of 68 facial landmarks and rigorous preprocessing, supporting deep-learning models for dysarthria and motor impairment analysis.

The Toronto NeuroFace dataset is a publicly available, clinically acquired resource of annotated facial video data from patients with neurological disorders, primarily amyotrophic lateral sclerosis (ALS) and stroke, alongside healthy controls. It is the first open-access, video-based collection of facial landmark annotations sourced from neurological patients. Its explicit aim is to support deep-learning research on quantitative, non-invasive assessment of orofacial motor dysfunction, particularly in relation to dysarthria and related disorders (Migliorelli et al., 2023; Gomes et al., 2023).

1. Dataset Composition and Demographic Overview

The Toronto NeuroFace dataset comprises 36 individuals, distributed across three groups:

Cohort	Number of Subjects	Gender (M/F)
ALS	11	4 / 7
Stroke	14	10 / 4
Healthy Control	11	7 / 4

Age: Explicit ages are undisclosed; healthy participants were selected to match the age profile of pathological groups.
Total annotated images: 3,306 frames, partitioned as 1,015 healthy, 920 ALS, and 1,371 stroke.
Overall gender distribution: 21 males, 15 females.

This cohort composition reflects an emphasis on matching control and disease demographics, but sample sizes, especially for ALS and control cases, are modest (Migliorelli et al., 2023).

2. Acquisition Protocols and Experimental Tasks

Recording device: Intel® RealSense RGB camera, face distance set at 30–60 cm.
Sampling: 30 frames per second; image resolution of 640×480, recorded under uniform clinical laboratory lighting with minimal background noise.
Orofacial tasks: Standardized, clinically relevant tasks include:
- Static facial postures (maximal mouth opening, lip protrusion, lip stretching)
- Repetitive phonation (diadochokinetic sequence “pa-ta-ka”, rapid repetition of /pa/ syllable)
- Additional tasks in ALS work: “kiss” (lip puckering), “blow” (imitating blowing a candle), lip spread, natural rest, and the “BBP” sentence (“Buy Bobby a puppy”) (Gomes et al., 2023).
Environment: Data collected in a controlled clinical laboratory to optimize lighting and minimize confounds such as variable head pose or occlusions.

Each video session was manually segmented into repetition-level clips according to the task structure, particularly for tasks requiring multiple iterations such as rapid syllable repetition and sentence articulation (Gomes et al., 2023).

3. Annotation Standards and Metadata

Landmark annotation: 68 two-dimensional facial landmarks per frame, following the dlib-style convention referenced in the literature:
- 17 jawline, 10 eyebrow, 9 nose, 12 eye, and 20 lip/mouth landmarks.
Bounding box: Manually annotated, tightly enclosing the face in each image.
Annotation method: Fully manual, frame-by-frame expert annotation for all 3,306 facial frames (Migliorelli et al., 2023).
Metadata per frame/image: Includes subject identifier, categorical pathology label (ALS, stroke, or control), and task label.
Coordinate system: Ground-truth (x, y) pixel locations on original RGB frames; depth or 3D data are not provided.
Task breakdown (ALS identification subset): 921 patient/session clips, with the following tasks and per-cohort repetition counts:

Task	ALS Repetitions	Control Repetitions
SPREAD	55	59
KISS	59	57
OPEN	54	55
BLOW	31	39
BBP	95	111
PA	100	110
PATAKA	88	108
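
The landmark counts given earlier in this section (17 jawline, 10 eyebrow, 9 nose, 12 eye, 20 mouth) match the index ranges of the widely used 68-point layout. The following is a minimal sketch of how those regions might be addressed in code. It assumes the dataset's landmark ordering follows that common 0-based layout, which this overview does not confirm; `LANDMARK_REGIONS`, `region_sizes`, and `select_region` are illustrative names, not part of any released tooling.

```python
# Index ranges of the common 68-point layout (0-based).
# Assumption: the dataset's landmark ordering follows this layout.
LANDMARK_REGIONS = {
    "jaw": range(0, 17),
    "eyebrows": range(17, 27),
    "nose": range(27, 36),
    "eyes": range(36, 48),
    "mouth": range(48, 68),
}


def region_sizes(regions=LANDMARK_REGIONS):
    """Return the number of landmarks in each region."""
    return {name: len(indices) for name, indices in regions.items()}


def select_region(landmarks, name):
    """Return the (x, y) pairs of one region from a sequence of 68 points."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks per frame")
    return [landmarks[i] for i in LANDMARK_REGIONS[name]]


if __name__ == "__main__":
    print(region_sizes())
```

Keeping the ranges in one mapping makes it straightforward to report per-region statistics or to retain only a subset of landmarks for downstream models.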

4. Preprocessing, Data Splits, and Model Input Conventions

Frame selection: For cyclic or repetitive movements, three frames per repetition are sampled (onset, peak, and midpoint) to maximize intra-individual kinematic diversity (Migliorelli et al., 2023).
Data split: 32 subjects (cohort-balanced) are assigned to the train/validation set, and 4 gender-balanced individuals (2 ALS, 2 stroke) form an independent test set.
Augmentation in neural network training: Random horizontal flipping (probability 0.5) and random brightness scaling (factor sampled uniformly from [0.8, 1.2]). No upstream alignment or Procrustes normalization is applied; region proposals and alignment are performed dynamically during inference.
ALS/control identification preprocessing: OpenFace 2.0 is used for automatic face detection, head pose estimation, and cropping of the face to 200×200 pixels with grayscale conversion. Landmark extraction is performed with FAN (Face Alignment Network), after which 26 anatomically relevant landmarks (lips and jawline) are retained for graph construction (Gomes et al., 2023).
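
The two training augmentations listed above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the left/right index pairs assume the common 68-point ordering, the overview does not say whether or how landmarks are re-indexed after a flip, and the bounding box (which would also need mirroring) is left out for brevity. `augment` and `FLIP_PERM` are hypothetical names.

```python
import numpy as np


def _flip_permutation():
    """Left/right index swap for the common 68-point layout (assumed ordering)."""
    perm = np.arange(68)
    pairs = (
        [(i, 16 - i) for i in range(8)]             # jawline
        + [(17 + i, 26 - i) for i in range(5)]      # eyebrows
        + [(31, 35), (32, 34)]                      # lower nose
        + [(36, 45), (37, 44), (38, 43), (39, 42), (40, 47), (41, 46)]  # eyes
        + [(48, 54), (49, 53), (50, 52), (55, 59), (56, 58)]            # outer lips
        + [(60, 64), (61, 63), (65, 67)]            # inner lips
    )
    for a, b in pairs:
        perm[a], perm[b] = b, a
    return perm


FLIP_PERM = _flip_permutation()


def augment(image, landmarks, rng, p_flip=0.5, brightness=(0.8, 1.2)):
    """Randomly mirror an H x W x C image and scale its brightness.

    landmarks is a (68, 2) array of (x, y) pixel coordinates.
    """
    image = image.astype(np.float32)
    landmarks = np.asarray(landmarks, dtype=np.float32).copy()
    if rng.random() < p_flip:
        width = image.shape[1]
        image = image[:, ::-1]                        # mirror the columns
        landmarks[:, 0] = (width - 1) - landmarks[:, 0]
        landmarks = landmarks[FLIP_PERM]              # keep left/right semantics
    factor = rng.uniform(*brightness)                 # brightness scaling factor
    image = np.clip(image * factor, 0, 255).astype(np.uint8)
    return image, landmarks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    points = np.zeros((68, 2))
    out_frame, out_points = augment(frame, points, rng)
    print(out_frame.shape, out_points.shape)
```

Without the index swap, a mirrored image would carry landmarks whose labels no longer match their side of the face, which is why the permutation is applied together with the coordinate flip.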

5. Evaluation Metrics and Statistical Performance

Primary metric: Normalized Mean Error (NME), quantifying average landmark localization error as a percentage of the bounding-box diagonal; computed as:

$\mathrm{NME} = \frac{1}{N_{\text{imgs}}} \sum_{k=1}^{N_{\text{imgs}}} \left[ \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} \frac{\sqrt{(x_i^k - \hat{x}_i^k)^2 + (y_i^k - \hat{y}_i^k)^2}}{\mathrm{diag}(B^k)} \right] \times 100\%$

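where $(x_i^k, y_i^k)$ and $(\hat{x}_i^k, \hat{y}_i^k)$ are the ground-truth and predicted coordinates of landmark $i$ in image $k$, $\mathrm{diag}(B^k)$ is the diagonal of the annotated bounding box $B^k$, $N_{\text{imgs}}$ is the number of evaluated images, and $N_\ell = 68$ is the number of landmarks.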
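
As a rough illustration of the formula above, the NME can be computed with NumPy as sketched below. The bounding-box format `(x_min, y_min, x_max, y_max)` and the function name `nme_percent` are assumptions made for this example; the overview does not specify how boxes are stored.

```python
import numpy as np


def nme_percent(true_pts, pred_pts, boxes):
    """Normalized Mean Error in percent.

    true_pts, pred_pts: arrays of shape (N_imgs, N_landmarks, 2).
    boxes: array of shape (N_imgs, 4) as (x_min, y_min, x_max, y_max) (assumed format).
    """
    true_pts = np.asarray(true_pts, dtype=float)
    pred_pts = np.asarray(pred_pts, dtype=float)
    boxes = np.asarray(boxes, dtype=float)

    # Euclidean distance per landmark, shape (N_imgs, N_landmarks).
    dists = np.linalg.norm(true_pts - pred_pts, axis=-1)
    # Bounding-box diagonal per image, shape (N_imgs,).
    diag = np.hypot(boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1])
    # Average over landmarks within each image, then over images.
    per_image = (dists / diag[:, None]).mean(axis=1)
    return float(per_image.mean() * 100.0)


if __name__ == "__main__":
    truth = [[[10.0, 10.0], [20.0, 20.0]]]
    pred = [[[13.0, 14.0], [20.0, 20.0]]]
    box = [[0.0, 0.0, 30.0, 40.0]]   # diagonal of 50 pixels
    print(nme_percent(truth, pred, box))
```

In the usage example, one landmark is off by 5 pixels and the other is exact, so the per-image error is (5/50 + 0)/2 = 0.05, or 5%. Passing only the rows of one facial region gives the corresponding per-region NME.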

Reported landmark localization results: Best model (“facial-landmark Mask RCNN”):

Region (landmarks)	NME (%)
All (68)	1.79
Jaw (17)	2.62
Eyebrows (10)	0.02
Nose (9)	1.55
Eyes (12)	1.03
Mouth (20)	1.49

Ablation studies (without pretraining or with vanilla Mask RCNN backbone) exhibited higher NMEs (2.7–13.6%). Variance and statistical significance estimates are not provided (Migliorelli et al., 2023).

ALS identification protocol: Evaluation uses a leave-one-subject-out (LOSO) framework, in which each fold holds out one subject for testing. Frame-level predictions within a repetition are combined by majority vote, and evaluation is carried out separately at the per-clip and per-subject levels (Gomes et al., 2023).
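
The LOSO protocol with majority voting can be sketched in plain Python as follows. The subject identifiers and label strings are hypothetical, and the tie-breaking rule (first label to reach the top count, in input order) is an assumption, since the overview does not describe how ties are resolved.

```python
from collections import Counter


def loso_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest such label (assumption)."""
    if not labels:
        raise ValueError("cannot vote on an empty list of predictions")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


if __name__ == "__main__":
    # Hypothetical subject ID for each clip in a small toy set.
    clip_subjects = ["A01", "A01", "C01", "C02"]
    for subject, train_idx, test_idx in loso_folds(clip_subjects):
        print(subject, train_idx, test_idx)
    # Frame-level predictions for one repetition, reduced to a clip-level label.
    print(majority_vote(["ALS", "HC", "ALS"]))
```

Splitting by subject rather than by clip keeps all recordings of the held-out person out of training, so the reported accuracy is not inflated by identity leakage.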

6. Applications in Automated Dysarthria and ALS Analysis

The Toronto NeuroFace dataset is foundational for:

Benchmarking facial landmark detection models in neurological cohorts, supporting development of CNN (Mask RCNN-based) systems for telemonitoring of dysarthria (Migliorelli et al., 2023).
Enabling research into computational phenotyping of ALS: Geometry-based “Facial Point Graph” approaches leverage the landmark data to construct Delaunay- and hub-augmented graphs processed by Graph Attention Networks (GATs), distinguishing ALS patients from healthy controls based solely on facial motion during clinical tasks (Gomes et al., 2023).

Applied pipelines extract facial action information, in particular the lip and jaw kinematics shown to be affected in bulbar-onset ALS, and support both frame-based and subject-wise discrimination. Each frame typically passes through landmark extraction, landmark selection, graph embedding, and GAT-based processing, followed by a final majority-vote classification.

7. Limitations, Biases, and Prospective Extensions

Sample size: The number of ALS and control subjects remains limited, constraining statistical power and generalizability.
Acquisition constraints: All data originate from a single device in a homogeneous, well-lit clinical setting; as such, real-world domain shifts including lighting variability, occlusions, and pose diversity are not represented.
Task repertoire: Only a predefined set of speech and non-speech gestures is included; other orofacial behaviors are absent.
Age/exclusion: Pediatric subjects and elderly individuals (>80 years) are not included; demographic scope is thus truncated.
Manual annotation: While presumed accurate, manual labeling is labor-intensive, and inter-rater variability has not been ruled out.
Metadata gaps: Disease-specific variables such as ALS onset type, duration, and severity gradients are not included in the annotated release (Migliorelli et al., 2023; Gomes et al., 2023).

A plausible implication is that expansion to broader populations, more variable environmental conditions, and incorporation of diverse task types will be necessary to support robust, home-based, and longitudinal neurofunctional assessment.

References

Migliorelli et al. (2023). A store-and-forward cloud-based telemonitoring system for automatic assessing dysarthria evolution in neurological diseases from video-recording analysis.

Gomes et al. (2023). Facial Point Graphs for Amyotrophic Lateral Sclerosis Identification.
