- The paper introduces a novel angle embedding with a Graph Convolutional Network to identify atypical gait and gesture patterns in autism.
- It uses 3D skeleton data and Skepxels for autism classification and ADOS score regression.
- The study reveals that asymmetry and elevated mean joint angles are critical behavioral indicators for autism diagnosis.
This paper, "Human Gesture and Gait Analysis for Autism Detection" (Human Gesture and Gait Analysis for Autism Detection, 2023), presents a video-based method for detecting Autism Spectrum Disorder (ASD) and quantifying its severity using analysis of human gesture and gait patterns from skeleton data. The authors highlight that while previous research often focused on facial and eye-gaze features for autism diagnosis, movement and gesture offer a complementary, non-intrusive avenue, particularly useful for nonverbal individuals. They note that atypical gait and gesture patterns are characteristic behavioral traits of autism.
The paper addresses two main objectives: classifying children with autism and regressing Autism Diagnostic Observation Schedule (ADOS) scores to quantify severity. The core hypothesis is that ASD is distinguishable primarily from gesture patterns.
The key contributions of the paper are:
- Proposing a novel angular feature matrix embedded into input skeleton data to help a Graph Convolutional Network (GCN) detect the peculiar slant in the gait posture of ASD children.
- Automatically predicting ADOS scores that show high correlation with scores measured by human experts.
- Conducting a detailed statistical analysis of gait posture in ASD and typically developing (TD) children, investigating asymmetry.
The proposed methodology leverages skeleton-based representation and graph convolution. The input consists of 3D joint coordinates representing the human skeleton over time. This data is first normalized over the frame dimension.
A novel Angle Embedding technique is introduced to enhance feature representation. It computes the cosine of the angle between every pair of joints, producing a 25×25 feature matrix. This angle matrix is then multiplied with the normalized input skeleton stream. The intuition is that atypical joint angles, particularly the slanted posture and asymmetry observed in ASD children's gait, are highlighted by this embedding. The cosine angle $X^{\theta}_{i,j}$ between normalized joints $\bar{X}_i$ and $\bar{X}_j$ is computed as their dot product, $X^{\theta}_{i,j} = \bar{X}_i \cdot \bar{X}_j$.
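A minimal NumPy sketch of this idea follows; the per-joint unit normalization and the order of multiplication are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def angle_embedding(skeleton):
    """Cosine-angle matrix embedding for one frame of 3D joints.

    skeleton: (25, 3) array of joint coordinates for a single frame.
    Returns the frame re-weighted by the 25x25 pairwise cosine-angle matrix.
    """
    # Normalize each joint vector to unit length (guard against division by zero).
    norms = np.linalg.norm(skeleton, axis=1, keepdims=True)
    unit = skeleton / np.clip(norms, 1e-8, None)

    # Cosine of the angle between every pair of joints is the dot product
    # of their unit vectors: a (25, 25) matrix.
    angle_matrix = unit @ unit.T

    # Embed the angles by multiplying the matrix with the normalized frame:
    # (25, 25) @ (25, 3) -> (25, 3).
    return angle_matrix @ unit


# Toy usage on a random frame of 25 joints.
frame = np.random.randn(25, 3)
embedded = angle_embedding(frame)
print(embedded.shape)  # (25, 3)
```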
The embedded skeleton data is then processed by a Graph Convolutional Network (GCN) encoder. The paper specifically uses MSG3D [liu2020disentangling], a GCN model designed for skeleton-based action recognition, which incorporates both spatial and temporal convolutions across multiple scales to capture local and global patterns.
During the training phase, the model employs a two-stream co-learning framework using Skeleton Picture Elements (Skepxels). Skepxels are an image representation of skeleton frames [Liu_2019_CVPR_Workshops], designed to capture spatio-temporal correlations. A Vision Transformer (ViT) [dosovitskiy2021an] is used to encode the Skepxel images into an aggregated feature embedding. An MLP layer maps the Skepxel embedding to the same feature space as the GCN output. A Euclidean distance loss between the GCN embedding and the Skepxel embedding is added to the main task loss during training. This allows the model to learn from the Skepxel representation without needing it during inference.
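A schematic PyTorch sketch of this co-learning setup is shown below; the encoder modules are stand-ins for the MS-G3D and ViT backbones, and the loss weighting factor is an assumption:

```python
import torch
import torch.nn as nn

class CoLearningModel(nn.Module):
    """Two-stream training: a GCN branch (kept at inference) and a
    Skepxel/ViT branch used only to shape the GCN embedding during training."""

    def __init__(self, gcn_encoder, skepxel_encoder, feat_dim, num_classes):
        super().__init__()
        self.gcn = gcn_encoder            # stand-in for the MS-G3D backbone
        self.skepxel = skepxel_encoder    # stand-in for the ViT over Skepxel images
        self.project = nn.Linear(feat_dim, feat_dim)   # MLP into the GCN feature space
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, skeleton, skepxel_img=None):
        z_gcn = self.gcn(skeleton)                     # (B, feat_dim)
        logits = self.classifier(z_gcn)
        if skepxel_img is None:                        # inference: GCN stream only
            return logits, None
        z_skx = self.project(self.skepxel(skepxel_img))
        dist = torch.norm(z_gcn - z_skx, dim=1).mean() # Euclidean distance loss
        return logits, dist


def training_loss(logits, labels, dist_loss, lambda_dist=1.0):
    # lambda_dist is an assumed weighting between task loss and distance loss.
    return nn.functional.cross_entropy(logits, labels) + lambda_dist * dist_loss
```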
For classification, the output of the GCN layers is passed through a fully connected (FC) layer to predict the class (ASD or TD).
For ADOS score regression, clip-based features are extracted from the GCN output and concatenated. Support Vector Regression (SVR) is then applied to these concatenated features to predict the final ADOS score for a video.
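A brief scikit-learn sketch of this regression stage; concatenating clip features into one fixed-length vector assumes a fixed number of clips per video, and the SVR hyperparameters are illustrative defaults rather than the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

def video_feature(clip_features):
    """clip_features: list of per-clip GCN embeddings (each a 1-D array of length D).
    Concatenates N clips into a single (N*D,) video descriptor."""
    return np.concatenate(clip_features, axis=0)

def fit_ados_regressor(X_train, y_train):
    """X_train: (num_videos, N*D) concatenated features; y_train: ADOS scores."""
    svr = SVR(kernel="rbf", C=1.0)  # illustrative kernel and regularization
    svr.fit(X_train, y_train)
    return svr
```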
The method was evaluated on two datasets:
- Gait and Full Body Movement dataset [Ahmed_2021]: Contains 3D skeleton and RGB videos of 109 children (59 ASD, 50 TD), focusing on gait cycles. It includes original and augmented samples. The authors used 10-fold cross-validation with two subject selection strategies: random shuffling and a block-based split that keeps all samples of a subject within the same split (a sketch of both strategies follows this list).
- DREAM dataset [billing_2020]: Contains upper-body skeletons and ADOS scores for 61 children (all ASD, ages 3-6) interacting with therapists/robots in different tasks (imitation, joint attention, turn-taking). Sessions vary in length.
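A possible realization of the two splitting strategies with scikit-learn, interpreting the block-based split as a subject-grouped split (the grouping interpretation and fold parameters are assumptions beyond the paper's 10-fold protocol):

```python
from sklearn.model_selection import KFold, GroupKFold

def make_folds(X, y, subject_ids, strategy="block"):
    """X: samples, y: labels, subject_ids: one subject ID per sample."""
    if strategy == "random":
        # Random shuffling: samples from one subject may land in different folds.
        return KFold(n_splits=10, shuffle=True, random_state=0).split(X, y)
    # Block-based: all samples of a subject stay within the same fold.
    return GroupKFold(n_splits=10).split(X, y, groups=subject_ids)
```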
Data Preprocessing involved converting the upper-body skeletons in the DREAM dataset to a full-body structure by interpolating the missing lower-body joints to match the 25-joint format expected by the GCN. Eye-gaze vectors were incorporated as an additional head joint. The DREAM data was transformed to be view-invariant (shoulders aligned with the x-axis, spine with the z-axis, spine base at the origin), but this was not applied to the Gait dataset in order to preserve the characteristic slanted gait. Frames were repeated to maintain a fixed video length.
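A rough sketch of the view-invariant alignment described above; the joint indices and the orthogonalization details are assumptions, not taken from the paper:

```python
import numpy as np

def view_invariant(skeleton, l_shoulder=4, r_shoulder=8, spine_base=0, spine_top=20):
    """Align one frame so the shoulders lie along x, the spine along z,
    and the spine base sits at the origin. skeleton: (25, 3); indices illustrative."""
    centered = skeleton - skeleton[spine_base]            # spine base to origin
    x_axis = centered[r_shoulder] - centered[l_shoulder]  # shoulder line
    x_axis /= np.linalg.norm(x_axis)
    z_axis = centered[spine_top] - centered[spine_base]   # spine direction
    z_axis -= z_axis.dot(x_axis) * x_axis                 # make orthogonal to x
    z_axis /= np.linalg.norm(z_axis)
    y_axis = np.cross(z_axis, x_axis)
    rotation = np.stack([x_axis, y_axis, z_axis], axis=0)  # world -> body frame
    return centered @ rotation.T
```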
The Statistical Analysis section provides evidence supporting the use of gait/gesture features. It found that ASD children exhibit a higher distribution of mean joint angles (relative to the spine) and show less dispersed motion distribution compared to TD children. Furthermore, analysis of angle, motion, and distance between left and right side joints revealed significantly higher asymmetry in ASD samples, supporting the hypothesis that gait asymmetry is a distinctive feature.
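One simple way such a left-right asymmetry measure could be computed is sketched below; the mirrored joint pairing and the spine reference are assumptions about the analysis, not the paper's exact procedure:

```python
import numpy as np

# Mirrored (left, right) joint pairs; indices are illustrative.
PAIRS = [(4, 8), (5, 9), (6, 10), (12, 16), (13, 17), (14, 18)]

def lr_asymmetry(sequence, spine=0):
    """Mean absolute left-right difference in joint-to-spine distance.
    sequence: (T, 25, 3) skeleton clip."""
    diffs = []
    for left, right in PAIRS:
        d_left = np.linalg.norm(sequence[:, left] - sequence[:, spine], axis=1)
        d_right = np.linalg.norm(sequence[:, right] - sequence[:, spine], axis=1)
        diffs.append(np.abs(d_left - d_right).mean())
    return float(np.mean(diffs))
```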
Quantitative Results demonstrate the effectiveness of the proposed method.
On the Gait and Full Body Movement dataset, the proposed method (MSG3D + angle infusion + Skepxel distance loss) achieved an average accuracy of 90.86% on the block split and 91.48% on the random split across 10-fold cross-validation, outperforming the baseline MSG3D (86.38% and 91.24%). Evaluating only on original (non-augmented) test samples, the accuracy reached 93.00% (block) and 92.33% (random), comparable to or slightly better than the previously reported 92.00% on this dataset.
On the DREAM dataset for ADOS score regression, the method achieved an average error rate of 2.91 ± 0.27. The predicted ADOS scores showed a Spearman correlation of 0.34 ± 0.07 with expert-measured scores (with a mean P-value of 0.002). Classification accuracy based on predicted scores averaged 51.56% ± 5.10. Analysis by task showed difficulty in classifying samples with low ADOS scores (7-10, Non-Spectrum class) but better performance for higher scores (ASD and AUT classes), which exhibit more prominent behavioral anomalies.
In the Discussion, the authors acknowledge limitations such as the short duration of samples in the Gait dataset and challenges with the DREAM dataset (only ASD subjects, potentially non-standardized ADOS measurement, difficulty with mild cases). However, they emphasize the potential of skeleton-based assessment as a non-intrusive and privacy-preserving alternative for automating diagnosis and severity prediction, particularly beneficial for children with severe autism who show clearer atypical physical behaviors.
The Conclusion summarizes the findings: ASD children show asymmetrical gait and higher mean joint angles. The proposed angle embedding and Skepxels enhance feature extraction for GCNs, enabling effective autism classification and ADOS score regression from skeleton videos. The work suggests skeleton data is a promising direction for autism research.