Charting trajectories of human thought using large language models (2509.14455v1)
Abstract: Language provides the most revealing window into the ways humans structure conceptual knowledge within cognitive maps. Harnessing this information has been difficult, given the challenge of reliably mapping words to mental concepts. Artificial intelligence large language models (LLMs) now offer unprecedented opportunities to revisit this challenge. LLMs represent words and phrases as high-dimensional numerical vectors that encode vast semantic knowledge. To harness this potential for cognitive science, we introduce VECTOR, a computational framework that aligns LLM representations with human cognitive map organisation. VECTOR casts a participant's verbal reports as a geometric trajectory through a cognitive map representation, revealing how thoughts flow from one idea to the next. Applying VECTOR to narratives generated by 1,100 participants, we show these trajectories have cognitively meaningful properties that predict paralinguistic behaviour (response times) and real-world communication patterns. We suggest our approach opens new avenues for understanding how humans dynamically organise and navigate conceptual knowledge in naturalistic settings.
Explain it Like I'm 14
What is this paper about?
This paper is about finding a new way to “see” how human thoughts move from one idea to the next using language. The authors build a tool called VECTOR that turns people’s stories into paths on a kind of mental map. By doing this, they can study the flow of thoughts, not just the words themselves.
What questions did the researchers ask?
They wanted to know:
- Can we turn people’s spoken or written stories into a map that shows how their thoughts travel from idea to idea?
- Does this map match how our minds actually organize knowledge (what scientists call a “cognitive map”)?
- Can this map explain real behavior, like how long people pause between words, or how predictable someone’s storytelling style is?
- Is there shared “abstract” structure across very different stories, like a sense of “beginning-to-end,” that we can detect with AI?
How did they study it?
The team asked 1,100 people to tell two kinds of stories: the Cinderella fairy tale and their typical daily routine. Then they turned each story into a “trajectory” (a path) through two kinds of spaces built from LLM representations.
To make this easier to understand, here’s the basic pipeline the authors used:
- Break stories into small idea units called “utterances”
- Each utterance is like one clear thought, such as “Cinderella lived with her stepsisters.”
- They confirmed utterance boundaries were meaningful because people tended to slow down at these points (longer pauses between words).
- Turn each utterance into a vector using an LLM
- An LLM represents meaning as a long list of numbers (like coordinates). Think of it as a “meaning fingerprint.”
- This first space is called “semantic space.” It’s very detailed but not tailored to the specific task (e.g., telling Cinderella).
- Translate semantic vectors into a task-aligned “schema space”
- A “schema” is a simplified outline of how events usually unfold in a situation (like the typical order of events in a fairy tale or a daily routine).
- The authors trained a simple classifier (a statistical model) to guess which story event an utterance belongs to (e.g., “invitation to the ball,” “midnight,” “glass slipper”). It outputs probabilities for each event, producing a short, human-interpretable vector.
- This schema space acts like a custom map with axes that line up with the task’s key events, so nearby points are nearby “concepts” in the story, not just similar words (a minimal code sketch of this pipeline appears right after this list).
- Analyze the trajectory (the path of utterances through the space)
- Alignment: Do different people’s paths follow the same route on the map?
- Momentum: Does the path move steadily in a direction (like progressing from beginning to end)?
- Jumpiness: Are there mostly small steps with occasional big leaps (like “flights and perchings” in thought)?
- Forward sequencing: Are forward moves (event 1 → event 2) more common than going backward?
- Test for abstract structure shared across different kinds of stories
- Cross-condition generalization: Train on Cinderella, test on daily routines (and vice versa). Does the model still find a consistent “forward” story flow?
- Temporal feature vector: Find a single direction in semantic space that runs from “start-like” to “end-like.” Use it to score or gently “nudge” utterances earlier or later in the story.
- Demixed PCA (a dimensionality-reduction method): Pull out a shared “time” dimension across both tasks that explains where an utterance sits along the beginning-to-end arc.
- Connect trajectory features to real-world differences between people
- The authors looked at a trait they call “eccentricity” (how unusual someone says they communicate, based on a questionnaire).
- They checked if people with higher eccentricity tell stories with less predictable paths.
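To make the embedding and decoding steps concrete, here is a minimal, runnable sketch in Python. The utterances, event labels, and the `embed_utterances` stand-in are invented for illustration (a real pipeline would call an embedding model such as text-embedding-3); only the use of regularized logistic regression to decode schema events comes from the paper itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical utterances with hand-assigned schema events (illustration only).
EVENTS = ["intro", "invitation", "ball", "midnight", "slipper"]
utterances = [
    "Cinderella lived with her stepmother and stepsisters.",
    "An invitation to the royal ball arrived.",
    "She danced with the prince all evening.",
    "At midnight she fled, losing a glass slipper.",
    "The prince searched the kingdom with the slipper.",
]
labels = np.array([0, 1, 2, 3, 4])  # index into EVENTS

def embed_utterances(texts, dim=1536, seed=0):
    """Stand-in for a real LLM embedding call.
    Returns random unit vectors so the sketch runs end to end."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((len(texts), dim))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Step 1, Vector Embedding: utterances -> high-dimensional semantic space.
X_semantic = embed_utterances(utterances)

# Step 2, Concept Decoding: regularized logistic regression maps semantic
# vectors to probabilities over schema events; each probability vector is
# the utterance's coordinate in a low-dimensional, interpretable schema space.
decoder = LogisticRegression(C=1.0, max_iter=1000)
decoder.fit(X_semantic, labels)
X_schema = decoder.predict_proba(X_semantic)
print(np.round(X_schema, 2))  # one row per utterance, columns follow EVENTS
```

In the actual framework, decoder accuracy is assessed with cross-validation and held-out narratives; fitting and scoring on the same five toy utterances, as here, just shows the shape of the transformation.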
What did they find?
Here are the main takeaways:
- Schema space captures thought flow better than raw semantic space
- People’s pauses between words grew when they moved to a new utterance, especially when that jump was large in schema space (i.e., a big conceptual shift).
- Trajectories in schema space showed higher alignment (people tend to follow a shared route for the same story), stronger momentum (clearer progress through the story), and more “jumpiness” (small steps plus occasional big leaps), matching the “flights and perchings” quality of thought. Hedged code sketches of these metrics appear after this list.
- The map is meaningful and event-aware
- In schema space, the order of events is visible: moving forward (event 1 → event 2) is much more likely than going backward.
- The classifier that creates schema space could accurately tag utterances with the right event, even for unseen stories.
- Abstract “story time” is shared across very different narratives
- Models trained on Cinderella could detect forward story flow in daily routine stories, and vice versa. This also worked on a large set of AI-generated stories.
- A single “start-to-end” direction in the underlying meaning space predicted where an utterance sits in the story and could “steer” event predictions by nudging utterances toward “start-like” or “end-like.”
- A shared temporal component (found with demixed PCA) explained meaningful variance across both tasks and generalized to new stories.
- Individual differences show up in these trajectories
- People who reported more unusual communication (higher eccentricity) had less predictable paths: lower alignment, lower momentum, weaker forward sequencing, and higher “trajectory entropy” (an AI-based measure of unpredictability).
- These links showed up in schema space but not in raw semantic space, suggesting the schema-aligned map picks up on deeper cognitive organization.
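For readers who want the trajectory metrics pinned down, here is a hedged sketch under assumed formalizations: jumpiness as the excess kurtosis of step sizes (heavy tails), momentum as the correlation between time and progress along the net start-to-end direction, and forward sequencing as the fraction of event-changing transitions that move forward. These are plausible readings of the paper’s descriptions, not its exact definitions, and alignment is omitted because it requires a cohort of trajectories to compare against.

```python
import numpy as np
from scipy.stats import kurtosis, pearsonr

def step_sizes(traj):
    """Euclidean distance between consecutive points (rows = utterances)."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def jumpiness(traj):
    """Excess kurtosis of step sizes: heavy tails mean mostly small steps
    punctuated by occasional large leaps."""
    return kurtosis(step_sizes(traj))

def momentum(traj):
    """Correlation between time and progress along the net start-to-end
    direction: values near 1 indicate steady directional movement."""
    direction = traj[-1] - traj[0]
    direction = direction / np.linalg.norm(direction)
    progress = traj @ direction
    return pearsonr(np.arange(len(traj)), progress)[0]

def forward_sequencing(schema_probs):
    """Fraction of event-changing transitions (hard-assigned by argmax)
    that move forward in schema-event order."""
    states = schema_probs.argmax(axis=1)
    moves = np.diff(states)
    moves = moves[moves != 0]
    return np.mean(moves > 0) if len(moves) else np.nan

# Toy schema-space trajectory: six utterances over four events, mostly forward.
probs = np.array([
    [0.80, 0.10, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.10, 0.20, 0.60],
    [0.05, 0.05, 0.10, 0.80],
])
print(jumpiness(probs), momentum(probs), forward_sequencing(probs))
```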
Why is this important?
- A new window into thought: VECTOR helps turn messy, natural language into a readable map of ideas. This lets scientists study how thoughts unfold in everyday settings, not just in simplified lab tasks.
- Better tools for psychology and neuroscience: Because schema space aligns with human task structure, it can reveal mental event boundaries, shared story structure, and individual differences with behavioral relevance.
- Practical uses in mental health: The method could help understand communication patterns in psychiatric conditions, moving beyond surface word statistics toward deeper “concept flow.”
- Bridges AI and human cognition: The work shows that LLMs contain rich, decodable features of meaning, and with the right transformations, these can line up with how people organize knowledge.
- Generalization and abstraction: Finding shared “beginning-to-end” structure across different stories suggests our minds (and LLMs) rely on abstract templates that help us understand and create narratives efficiently.
A simple bottom line
Think of every story you tell as a walk through an invisible mental map. This paper shows how to draw that map from your words, watch your path, and learn about how your mind organizes ideas—how fast you move, when you pause, and how your personal style shapes the journey.
Knowledge Gaps
Below is a concise, actionable list of the key knowledge gaps, limitations, and open questions that remain unresolved in the paper.
- Dependence on constrained, well-known schemas: VECTOR is demonstrated on narratives (Cinderella, daily routine) with simple, conserved event structures; it is unclear how the approach scales to tasks with unknown, variable, branching, or weakly structured schemas.
- Ground-truth labeling circularity: Event identification and “ground truth” labels rely heavily on LLM-based autoraters and LLM-guided procedures; rigorous validation against expert human annotations is needed to rule out model–model circularity.
- Segmentation validity and generalizability: The LLM-based utterance segmentation requires benchmarking against human-labeled mental-event boundaries and evaluation across genres (dialogue, argumentative text, free association), modalities (spoken vs typed), and languages.
- Psycholinguistic controls for RT analyses: The link between schema-space distance and inter-word RTs needs stronger controls for lexical frequency, word length, syntax, and motor factors; replication with speech pauses, eye-tracking, or keystroke dynamics is needed.
- Model dependence and reproducibility: Results depend on specific proprietary embeddings (text-embedding-3) and GPT variants; robustness across open-source embeddings, instruction-tuned vs base models, and model version drift remains untested.
- Event granularity and dimensionality choices: How the number and granularity of schema events (8D/11D) affect performance, interpretability, and overfitting is not systematically characterized; methods for data-driven selection of event dimensionality are needed.
- Alternative transformations and baselines: While two alternatives are sketched (prompt-based contextualization, topic modeling), a systematic benchmark across a broader suite of methods (e.g., contrastive learning, supervised fine-tuning, non-linear decoders) and metrics is lacking.
- Individualized cognitive maps: The framework infers shared, condition-level schema spaces; it remains open how to infer person-specific schema spaces and how much idiosyncratic structure exists across individuals.
- Hierarchical and compositional structure: The current approach captures linear event sequences; detecting hierarchical sub-events, optional branches, or compositional reuse of substructures is an outstanding challenge.
- Non-linear and non-chronological narratives: Forward sequencing may reflect chronological retellings; performance on non-linear stories, flashbacks, or tasks where optimal organization is non-temporal (e.g., thematic, causal) is unknown.
- Generalization beyond narratives: It is unclear how VECTOR performs for reasoning, planning, problem-solving, creativity, or dialogue where latent states are not narrative “events.”
- Cross-linguistic and cross-cultural validity: The approach is evaluated in English with culture-specific content (Cinderella); transfer to other languages, scripts, and culturally distinct schemas is an open question.
- Clinical and translational scope: Associations with “eccentricity” are suggestive; testing predictive validity in clinical cohorts (e.g., schizophrenia, ASD, mood disorders), longitudinal stability, sensitivity to treatment, and diagnostic specificity remains to be done.
- Causal manipulations: Observational correlates do not establish causation; experiments varying instructions, cognitive load, memory prompts, or schema cues are needed to test whether trajectory properties can be causally shifted.
- Neural validation: Direct links to brain data are not established; testing whether schema-space distances, jumps, and event boundaries align with neural state transitions (fMRI/MEG/EEG) and hippocampal–prefrontal dynamics is a key open direction.
- Robustness of “jumpiness” signature: The heavy-tailed step-size finding could be sensitive to segmentation granularity, smoothing parameters, and trajectory dimensionality; alternative null models and multi-scale analyses are needed (a shuffle-based null is sketched after this list).
- Metric design and parameter sensitivity: Alignment, momentum, and sequencing are computed in reduced spaces (e.g., 2D PCA); sensitivity to dimensionality reduction choices, distance metrics, and trajectory length normalization should be quantified.
- Abstract feature discovery beyond time: The temporal feature vector and dPCA subspace are compelling; methods to discover and steer other abstract dimensions (e.g., causality, agency, valence, intentionality) and assess their generality remain unexplored.
- External validation datasets: External tests use TinyStories (LLM-generated); validation on large human corpora (e.g., novels, oral histories, clinical interviews) is needed to ensure ecological validity.
- Trajectory entropy circularity: Predictability estimated via GPT-4-based next-event distributions may embed model priors; cross-model and human-predictability baselines (e.g., human next-utterance judgments) would reduce circularity.
- Handling sparse or incomplete narratives: How VECTOR performs with partial stories, low topic coverage, or minimal prior knowledge of the schema (e.g., unfamiliar fairy tales) is unclear.
- Online and real-time decoding: Feasibility of tracking thought dynamics in real time (live speech, streaming text) with low-latency segmentation and decoding is untested.
- Privacy, fairness, and bias: Applying VECTOR to sensitive psychiatric or demographic groups requires audits for bias in embeddings, fairness across subpopulations, and robust privacy-preserving pipelines.
- Open science and reproducibility: Public release of code, labels, and model checkpoints, plus preregistered replication analyses across labs, would clarify generalizability and mitigate dependence on proprietary APIs.
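To make the idea of a permutation-derived null concrete, here is a hedged Python sketch: shuffle utterance order within a narrative, recompute a trajectory metric (for example, the `forward_sequencing` function from the earlier sketch), and compare the observed value to the shuffled distribution. The paper’s own nulls (e.g., random decoder projections) are constructed differently.

```python
import numpy as np

def permutation_pvalue(metric_fn, traj, n_perm=1000, seed=0):
    """Permutation-derived null: shuffle utterance order within the
    narrative, recompute the metric, and compare to the observed value."""
    rng = np.random.default_rng(seed)
    observed = metric_fn(traj)
    null = np.array([metric_fn(rng.permutation(traj, axis=0))
                     for _ in range(n_perm)])
    # One-sided p-value with add-one smoothing.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

# usage with the earlier sketch: permutation_pvalue(forward_sequencing, probs)
```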
Glossary
- aliasing: In representational modeling, different latent states produce similar observable signals, making them hard to distinguish. "state splitting and aliasing"
- cognitive maps: Internal structured representations that organize knowledge to support inference and navigation through concepts. "structured internal representations, known as cognitive maps, that support inference, prediction and reasoning"
- Concept Decoding: A supervised transformation mapping LLM embeddings into task-aligned, interpretable schema coordinates. "A Concept Decoding step overcomes this limitation"
- concept vector projections: Transformations that project representations onto specific feature directions to isolate or emphasize conceptual attributes. "these transformations include concept vector projections"
- condition-invariant semantic space: A general embedding space shared across tasks that does not encode implicit contextual information. "a condition-invariant semantic space (1536D)"
- cross-condition generalisation (CCGP): The ability of decoders trained in one task context to uncover structured patterns in a different context. "as evidence of cross-condition generalisation"
- demixed Principal Component Analysis (dPCA): A dimensionality reduction method that separates shared factors (e.g., temporal) from condition-specific variance. "we used demixed Principal Component Analysis (dPCA)"
- Discrete State Sequencing: Analysis of ordered transitions between discrete schema events revealing directional structure. "Discrete State Sequencing."
- feature vector: A direction in embedding space that captures an interpretable feature (e.g., temporality) for projection or manipulation. "A feature vector in semantic space that captures abstracted temporal information"
- flow fields: Vector fields summarizing average trajectory displacements across representational space. "Flow fields."
- forward sequencing: A directional bias where transitions from earlier to later events are more probable than the reverse. "we recovered an expected signature of forward sequencing"
- foundation models: Large pretrained models whose representations encode broad knowledge useful across tasks. "thus constituting powerful foundation models of human cognitive map organisation"
- hippocampal-entorhinal circuits: Brain systems thought to encode relational world models and support cognitive map representations. "such as hippocampal-entorhinal circuits"
- hyperparameter selection: The choice of modeling parameters that can strongly affect unsupervised method outcomes. "sensitivity to experimenter choices in hyperparameter selection"
- linear mixed model: A statistical model that accounts for both fixed effects and random effects across subjects or items. "linear mixed model effect of utterance boundaries on RT = 0.54"
- logistic regression (regularized): A classification model with penalty terms to prevent overfitting and yield sparse, interpretable decoders. "we trained regularized logistic regression models"
- mechanistic interpretability: The study of how internal features and circuits in AI models implement abstractions. "the field of AI mechanistic interpretability"
- medial prefrontal cortex: A brain region implicated in task-specific transformation of general representations. "such as medial prefrontal cortex"
- non-Markovian: Dependent on the full history rather than only the current state, in contrast to Markov processes. "i.e., a non-Markovian measure"
- observation functions: Mappings from latent states to observable outputs used in modeling partially observed processes. "observation functions used when modelling behaviour operating on partially observable states"
- out-of-distribution: Data from a distribution not seen during training used to assess generalization. "out-of-distribution test sets"
- partially observable states: Situations where the true underlying state cannot be directly observed. "modelling behaviour operating on partially observable states"
- permutation-derived null distribution: A baseline distribution generated by shuffling labels or alignments to test significance. "vs. a permutation-derived null distribution"
- permutation test: A non-parametric test assessing significance by comparing to metrics computed under random permutations. "permutation test using random decoder projections"
- radial histogram: A circular histogram showing the distribution of trajectory directions around event centroids. "and associated radial histogram"
- Representational Similarity Analysis (RSA): A method comparing representational structures across models or modalities via similarity matrices. "Representational Similarity Analysis (RSA)"
- representation similarity matrices: Matrices of pairwise similarities used to visualize and compare representational structure. "Representation similarity matrices display the cosine similarity between all utterance pairs"
- representational geometry: The spatial arrangement and distances among embedded representations reflecting structure. "a quantitative comparison of representational geometry"
- schema: Abstracted knowledge structures that encode event regularities in a specific context. "schemas as abstracted knowledge structures that capture information about how events unfold in a specific context"
- schema event decoders: Classifiers that map embeddings to probabilities over discrete schema events. "Schema event decoders exhibited high cross-validated decoding accuracy"
- schema space: A low-dimensional, sparse, interpretable space aligned to task-specific schema events. "transforming LLM semantic space representations to condition-specific schema space"
- steering vectors: Feature directions used to systematically shift a representation to modulate decoder outputs. "feature dimensions can be used as steering vectors"
- state splitting: When one latent concept maps to multiple observable expressions, creating multiple observed states. "state splitting and aliasing"
- temporality score: A scalar capturing an utterance’s position along a start-to-end feature direction (sketched in code after this glossary). "Projecting each narrative utterance onto this vector yields a temporality score"
- trajectory alignment: The degree to which different trajectories follow a common path in representation space. "A metric of trajectory alignment quantifies the degree to which a given trajectory's transitions can be predicted"
- trajectory entropy: Uncertainty in predicting the next event given the narrative history; higher means less predictable trajectories (sketched in code after this glossary). "Trajectory entropy (mean of entropy values over all narrative utterances) was positively correlated with eccentricity"
- trajectory jumpiness: A pattern of mostly small steps punctuated by occasional large jumps in representational space. "Trajectories exhibited significant jumpiness in both schema and semantic spaces"
- trajectory momentum: Directional progression through space over time, indicating linear schematic structure. "A metric of trajectory momentum, quantifies directional progression through space as a function of time"
- unsupervised topic modelling: Methods that infer latent thematic structure from text without labeled supervision. "unsupervised topic modelling"
- Vector Embedding: The step mapping utterances to a high-dimensional semantic space via pretrained LLM embeddings. "Vector Embedding maps each utterance to a domain-general semantic space"
- vector perturbation: Modifying embeddings by adding or subtracting a feature vector to shift encoded attributes. "vector perturbation"
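Two glossary quantities reduce to short computations. The sketch below assumes the temporal feature vector can be approximated as the difference between mean embeddings of late versus early utterances (one plausible construction; the paper’s procedure may differ) and takes trajectory entropy as the mean Shannon entropy of predicted next-event distributions, matching the definition above.

```python
import numpy as np

def temporal_feature_vector(X, positions):
    """Assumed construction: difference between mean embeddings of late vs
    early utterances yields a start-to-end direction in semantic space."""
    late = X[positions > np.median(positions)].mean(axis=0)
    early = X[positions <= np.median(positions)].mean(axis=0)
    v = late - early
    return v / np.linalg.norm(v)

def temporality_scores(X, v):
    """Project each utterance embedding onto the temporal direction."""
    return X @ v

def trajectory_entropy(next_event_probs):
    """Mean Shannon entropy (bits) of predicted next-event distributions:
    higher values mean a less predictable narrative trajectory."""
    p = np.clip(next_event_probs, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log2(p), axis=1)))

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 1536))           # one row per utterance
v = temporal_feature_vector(X, np.arange(6))
print(temporality_scores(X, v))              # start-to-end positions
print(trajectory_entropy(np.full((5, 4), 0.25)))  # uniform -> 2.0 bits
```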