From Word Embedding to Reading Embedding Using Large Language Model, EEG and Eye-tracking (2401.15681v1)

Published 28 Jan 2024 in cs.HC

Abstract: Reading comprehension, a fundamental cognitive ability essential for knowledge acquisition, is a complex skill, with a notable number of learners lacking proficiency in this domain. This study introduces innovative tasks for Brain-Computer Interface (BCI), predicting the relevance of words or tokens read by individuals to the target inference words. We use state-of-the-art LLMs to guide a new reading embedding representation in training. This representation, integrating EEG and eye-tracking biomarkers through an attention-based transformer encoder, achieved a mean 5-fold cross-validation accuracy of 68.7% across nine subjects using a balanced sample, with the highest single-subject accuracy reaching 71.2%. This study pioneers the integration of LLMs, EEG, and eye-tracking for predicting human reading comprehension at the word level. We fine-tune the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for word embedding, devoid of information about the reading tasks. Despite this absence of task-specific details, the model effortlessly attains an accuracy of 92.7%, thereby validating our findings from LLMs. This work represents a preliminary step toward developing tools to assist reading.

Authors (4)
  1. Yuhong Zhang (27 papers)
  2. Shilai Yang (1 paper)
  3. Gert Cauwenberghs (25 papers)
  4. Tzyy-Ping Jung (23 papers)
Citations (3)

Summary

This paper (Zhang et al., 28 Jan 2024) introduces a novel approach to understanding human reading by integrating word embeddings from LLMs such as BERT with neurophysiological data from electroencephalography (EEG) and behavioral data from eye-tracking. The goal is to create a "reading embedding" that captures not just the semantic meaning of words but also the cognitive processes involved when a person reads them. This work aims to serve as a foundation for developing AI-assisted tools to improve reading comprehension.

The paper utilizes the Zurich Cognitive Language Processing Corpus (ZuCo) 1.0 dataset, specifically focusing on Task-Specific Reading (TSR) data where subjects answer questions related to the text. The core idea is to train a model to predict whether a word is highly relevant (HRW) or lowly relevant (LRW) to an inference task, using labels generated by powerful LLMs (GPT-3.5 Turbo and GPT-4) as a form of "fuzzy ground truth". This LLM-guided labeling process is validated by demonstrating that BERT word embeddings alone can predict these labels with high accuracy (92.7%).
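
To make the labeling step concrete, the sketch below shows one way per-word HRW/LRW labels could be requested from GPT-4 through the OpenAI chat API. The prompt wording, output format, and the `label_relevance` helper are illustrative assumptions, not the authors' actual prompts or code.

```python
# Hypothetical sketch of LLM-guided relevance labeling; the prompt and output
# convention are assumptions, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label_relevance(sentence: str, question: str, model: str = "gpt-4") -> list[str]:
    """Ask the LLM to tag every word in `sentence` as HRW or LRW for `question`."""
    prompt = (
        "For the question below, label each word of the sentence as HRW "
        "(highly relevant) or LRW (lowly relevant). Return one label per word, "
        "space-separated, in order.\n"
        f"Question: {question}\nSentence: {sentence}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.split()

# Example usage (hypothetical data):
# labels = label_relevance("The Eiffel Tower is in Paris.", "Where is the Eiffel Tower?")
```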

For practical implementation, the paper details the processing pipeline:

  1. Word Embedding: Each word/token in the sentence is processed with a pre-trained BERT model. The hidden state from the second-to-last layer (dimension 768) is extracted and L2 normalized to represent the word's semantic meaning in context. Padding is applied for sentences shorter than the maximum length (see the embedding-extraction sketch after this list).
  2. Biomarker Feature Extraction:
    • Eye-gaze: 12 distinct features are extracted per word, including various fixation durations, total reading time, gaze duration, and pupil size metrics. These features are L1 normalized within each sentence.
    • EEG: Features are extracted using the conditional entropy method, resulting in a 5460-dimensional vector per word.
  3. Handling Multiple/Zero Fixations: For words with multiple fixations, the corresponding eye-gaze and EEG vectors are processed by taking their L2 norm and then performing element-wise addition across all fixations for that word. If a word has no fixation data, zero vectors are assigned.
  4. Reading-Embedding Model Architecture (a PyTorch-style sketch follows this list):
    • Both the processed eye-gaze and EEG features are linearly projected into a common lower-dimensional space (128 dimensions).
    • These projected features are combined using element-wise addition.
    • Sinusoidal positional encoding is applied to the combined features.
    • The combined features are then fed into a single attention-based transformer encoder block.
    • An MLP layer follows the transformer to output a probability for binary classification (HRW/LRW).
  5. Training: The model is trained with a combined loss consisting of Masked Binary Cross Entropy, Masked Mean Squared Error, and Masked Soft F1 Loss, with the three terms weighted equally (λ_i = 1). Stochastic Gradient Descent (SGD) with a learning rate of 0.05 is used for optimization. To handle class imbalance, LRW samples are downsampled during both training and testing to match the number of HRW samples. Evaluation uses 5-fold cross-validation applied individually to each subject's data (a sketch of the masked loss follows this list).
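
The following sketch illustrates step 1 using Hugging Face Transformers: it takes the second-to-last hidden layer of a pre-trained BERT model, L2 normalizes each token vector, and zero-pads to a fixed length. The checkpoint name, maximum length, and the omission of subword-to-word alignment are simplifying assumptions.

```python
# Sketch of step 1: contextual embeddings from BERT's second-to-last layer.
import torch
import torch.nn.functional as F
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

def sentence_embeddings(sentence: str, max_len: int = 64) -> torch.Tensor:
    """Return a (max_len, 768) matrix of L2-normalized token embeddings, zero-padded."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=max_len)
    with torch.no_grad():
        out = bert(**enc)
    hidden = out.hidden_states[-2][0]          # second-to-last layer, (seq_len, 768)
    hidden = F.normalize(hidden, p=2, dim=-1)  # L2 normalize each token vector
    padded = torch.zeros(max_len, hidden.size(-1))
    padded[: hidden.size(0)] = hidden          # zero-pad shorter sentences
    # Note: special tokens ([CLS], [SEP]) and subword-to-word pooling are omitted here.
    return padded
```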
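
Step 4 could look roughly like the following PyTorch module: both modalities are linearly projected to 128 dimensions, fused by element-wise addition, given sinusoidal positional encodings, passed through a single transformer encoder block, and scored per word by an MLP head. The head count, dropout, and exact MLP shape are assumptions; the head below returns logits, with the sigmoid applied inside the loss.

```python
# Sketch of step 4: the reading-embedding encoder (dimensions follow the summary;
# other hyperparameters are assumptions).
import math
import torch
import torch.nn as nn

class ReadingEmbeddingEncoder(nn.Module):
    def __init__(self, gaze_dim=12, eeg_dim=5460, d_model=128, n_heads=4, max_len=64):
        super().__init__()
        self.gaze_proj = nn.Linear(gaze_dim, d_model)   # project eye-gaze features
        self.eeg_proj = nn.Linear(eeg_dim, d_model)     # project EEG features
        self.register_buffer("pos", self._sinusoid(max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)  # single encoder block
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, 1))           # MLP -> HRW/LRW logit

    @staticmethod
    def _sinusoid(max_len, d_model):
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, gaze, eeg, key_padding_mask=None):
        # gaze: (batch, seq, 12); eeg: (batch, seq, 5460)
        x = self.gaze_proj(gaze) + self.eeg_proj(eeg)   # element-wise fusion
        x = x + self.pos[: x.size(1)]                   # sinusoidal positional encoding
        x = self.encoder(x, src_key_padding_mask=key_padding_mask)
        return self.head(x).squeeze(-1)                 # (batch, seq) word-level logits
```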
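
Finally, a sketch of the combined masked loss from step 5, with all three terms weighted equally (λ_i = 1) and optimized with SGD at a learning rate of 0.05. The soft-F1 formulation and the padding-mask convention shown here are common variants and are assumptions rather than the paper's exact definitions.

```python
# Sketch of step 5: masked BCE + masked MSE + masked soft-F1, equally weighted.
import torch
import torch.nn.functional as F

def masked_losses(logits, targets, mask):
    """logits/targets/mask: (batch, seq); mask is 1 for real words, 0 for padding."""
    probs = torch.sigmoid(logits)
    m = mask.float()
    n = m.sum().clamp(min=1)

    bce = (F.binary_cross_entropy_with_logits(logits, targets, reduction="none") * m).sum() / n
    mse = (((probs - targets) ** 2) * m).sum() / n

    tp = (probs * targets * m).sum()
    fp = (probs * (1 - targets) * m).sum()
    fn = ((1 - probs) * targets * m).sum()
    soft_f1 = 1 - 2 * tp / (2 * tp + fp + fn + 1e-8)

    return bce + mse + soft_f1   # equal weights, λ_i = 1

# model = ReadingEmbeddingEncoder()
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
```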

The results show that fusing EEG and eye-gaze features and processing them through the transformer model yields better accuracy (68.7% on average, 71.2% at best) in classifying HRW vs. LRW words than either modality alone or simpler linear classifiers such as SVM. This indicates that the multi-modal approach, combined with the transformer's ability to capture relationships within the word sequence, improves the prediction of relevance from human reading patterns.

The practical implications of this research lie in its potential to power future brain-computer interface (BCI) applications for reading assistance. By identifying words that correlate with signs of cognitive difficulty or lack of attention (potentially when human reading patterns diverge from the LLM-predicted relevance), an assistive tool could provide real-time feedback or interventions. The use of readily available datasets like ZuCo and the implementation details provided (including code availability) make this a practical step towards developing such tools. Future work is suggested to apply this method to reading tasks where subjects exhibit lower comprehension, as these scenarios are where assistance would be most valuable.