Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder (2408.13255v1)

Published 23 Aug 2024 in cs.CV and cs.AI

Abstract: Early detection of autism, a neurodevelopmental disorder marked by social communication challenges, is crucial for timely intervention. Recent advancements have utilized naturalistic home videos captured via the mobile application GuessWhat. Through interactive games played between children and their guardians, GuessWhat has amassed over 3,000 structured videos from 382 children, both diagnosed with and without Autism Spectrum Disorder (ASD). This collection provides a robust dataset for training computer vision models to detect ASD-related phenotypic markers, including variations in emotional expression, eye contact, and head movements. We have developed a protocol to curate high-quality videos from this dataset, forming a comprehensive training set. Utilizing this set, we trained individual LSTM-based models using eye gaze, head positions, and facial landmarks as input features, achieving test AUCs of 86%, 67%, and 78%, respectively. To boost diagnostic accuracy, we applied late fusion techniques to create ensemble models, improving the overall AUC to 90%. This approach also yielded more equitable results across different genders and age groups. Our methodology offers a significant step forward in the early detection of ASD by potentially reducing the reliance on subjective assessments and making early identification more accessible and equitable.

Summary

  • The paper demonstrates an ensemble modeling approach combining LSTM outputs from eye gaze, head, and facial features to achieve a 90% AUC.
  • Advanced data curation and late fusion techniques significantly outperformed single-modality models, boosting diagnostic sensitivity and specificity.
  • The study addresses fairness across gender and age, proposing a scalable framework for early, accessible ASD intervention.

Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder

The paper, "Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder," presents a sophisticated integration of machine learning techniques to enhance the early diagnosis of Autism Spectrum Disorder (ASD) using naturalistic home videos captured by the mobile application 'Guess What'. The paper leverages a unique dataset consisting of over 3,000 structured videos featuring 382 children, including those diagnosed with and without ASD, collected during gameplay between children and their guardians.

Dataset and Methodology

The dataset, derived from home environments, offers a rich source of information by capturing authentic behaviors. A key contribution of this research is the rigorous processing methodology used to filter and curate videos. Advanced feature extraction methods were applied, focusing on eye gaze, head position, and facial landmarks, which are crucial phenotypic markers associated with ASD. These features are subsequently used as input for Long Short-Term Memory (LSTM) based models.
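
To make the single-modality models concrete, here is a minimal PyTorch sketch of an LSTM classifier over per-frame features such as facial landmarks; the feature dimension, hidden size, and layer layout are illustrative assumptions rather than the authors' exact architecture.

```python
# Hypothetical sketch of a per-modality LSTM classifier; shapes and sizes are
# assumptions for illustration, not the paper's exact configuration.
import torch
import torch.nn as nn

class ModalityLSTM(nn.Module):
    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # single ASD / non-ASD logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feature_dim) sequence of per-frame features
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)  # one score per video

# Example: 90-frame clips of 68 2D facial landmarks (136 features per frame)
model = ModalityLSTM(feature_dim=136)
probs = torch.sigmoid(model(torch.randn(8, 90, 136)))  # per-video probabilities
```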

Notably, the research employs late fusion techniques to combine the outputs of models trained on the individual modalities. These fusion approaches, comprising both a late-averaging model and a linear model, culminate in an Area Under the Curve (AUC) of 90%, significantly enhancing diagnostic accuracy over the individual models. The LSTM models, pre-trained and fine-tuned on high-quality video data, achieved AUCs of 86% for eye gaze, 67% for head position, and 78% for facial landmarks.
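
The following is a minimal sketch of the two fusion strategies, assuming each single-modality model emits a per-video probability; the data is synthetic and serves only to illustrate the mechanics.

```python
# Sketch of late fusion over per-modality probabilities: simple averaging and
# a learned linear (logistic-regression) combiner. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # 1 = ASD, 0 = non-ASD (synthetic labels)

# Stand-ins for per-video probabilities from the gaze, head, and face LSTMs
def fake_scores(noise):
    return np.clip(0.6 * y + rng.normal(0.2, noise, n), 0.0, 1.0)

p_gaze, p_head, p_face = fake_scores(0.25), fake_scores(0.45), fake_scores(0.35)

# 1) Late-fusion averaging: mean of the modality probabilities
p_avg = (p_gaze + p_head + p_face) / 3
print("averaging AUC:", roc_auc_score(y, p_avg))

# 2) Late-fusion linear model: logistic regression over the stacked scores
X = np.column_stack([p_gaze, p_head, p_face])
fused = LogisticRegression().fit(X, y)  # in practice, fit on a held-out split
print("linear-fusion AUC:", roc_auc_score(y, fused.predict_proba(X)[:, 1]))
```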

Comparative Analysis

The paper evaluates these ensemble models against traditional single-modality models, examining their diagnostic power through metrics such as Receiver Operating Characteristic (ROC) curves. Late fusion models are reported to significantly outperform single-modality ones, demonstrating increased sensitivity and specificity. Specifically, the late fusion averaging model achieved a test AUC of 0.90, while the best single-modality model, based on eye gaze, reached 0.86.
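
Such a ROC comparison can be reproduced with standard tooling; the snippet below reuses the synthetic scores from the previous sketch (p_gaze, p_avg, y) purely for illustration, not the paper's data.

```python
# Sketch of a ROC comparison between a single-modality model and the fused
# model; reuses the synthetic p_gaze, p_avg, and y from the previous snippet.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

for name, scores in [("eye gaze only", p_gaze), ("late-fusion average", p_avg)]:
    fpr, tpr, _ = roc_curve(y, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y, scores):.2f})")

plt.plot([0, 1], [0, 1], "k--", linewidth=0.8)  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```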

The researchers also highlighted fairness in their models by assessing gender and age demographic parity. The late fusion models showed improved parity and consistency across these demographic factors, addressing concerns over bias and enhancing the model's applicability across diverse populations.
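
One way to operationalize such a fairness check is to compare positive-prediction rates (a demographic parity measure) and AUC per subgroup; the sketch below uses illustrative group labels and a 0.5 decision threshold, neither taken from the paper.

```python
# Sketch of a subgroup fairness report: per-group positive-prediction rate
# (demographic parity) and AUC. Group labels and threshold are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_report(y_true, y_score, groups, threshold=0.5):
    y_pred = (y_score >= threshold).astype(int)
    for g in np.unique(groups):
        mask = groups == g
        rate = y_pred[mask].mean()                        # demographic parity term
        auc = roc_auc_score(y_true[mask], y_score[mask])  # subgroup discrimination
        print(f"group={g}: positive rate={rate:.2f}, AUC={auc:.2f}")

# Example with the synthetic fused scores from the earlier snippets
gender = np.random.default_rng(1).choice(["male", "female"], size=len(y))
subgroup_report(y, p_avg, gender)
```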

Implications and Future Directions

The results indicate substantial practical and theoretical implications. The approach not only supports more timely and equitable ASD diagnoses but also suggests a scalable diagnostic framework outside traditional clinical settings. This reduces reliance on subjective in-person evaluations, making advanced diagnostic tools accessible to a broader audience through mobile applications.

Despite these advancements, the paper acknowledges several limitations and directions for future research. Addressing data drift through accelerometer data integration, improving the automated pipeline for video preprocessing, and expanding to additional modalities, including speech, are identified as critical areas for development. Additionally, collecting skin color data and generalizing models across diverse skin tones remain vital to promoting equity and inclusivity in diagnosis.

In conclusion, this paper demonstrates a robust framework for applying ensemble modeling techniques to complex diagnostic tasks, marking a significant step forward in digital phenotyping for neurodevelopmental disorders. Such innovations promise broader applicability across various conditions, enhancing early intervention strategies and improving health outcomes at scale.
