Charting trajectories of human thought using large language models (2509.14455v1)
Abstract: Language provides the most revealing window into the ways humans structure conceptual knowledge within cognitive maps. Harnessing this information has been difficult, given the challenge of reliably mapping words to mental concepts. Artificial intelligence large language models (LLMs) now offer unprecedented opportunities to revisit this challenge. LLMs represent words and phrases as high-dimensional numerical vectors that encode vast semantic knowledge. To harness this potential for cognitive science, we introduce VECTOR, a computational framework that aligns LLM representations with human cognitive map organisation. VECTOR casts a participant's verbal reports as a geometric trajectory through a cognitive map representation, revealing how thoughts flow from one idea to the next. Applying VECTOR to narratives generated by 1,100 participants, we show these trajectories have cognitively meaningful properties that predict paralinguistic behaviour (response times) and real-world communication patterns. We suggest our approach opens new avenues for understanding how humans dynamically organise and navigate conceptual knowledge in naturalistic settings.
Explain it Like I'm 14
What is this paper about?
This paper is about finding a new way to “see” how human thoughts move from one idea to the next using language. The authors build a tool called VECTOR that turns people’s stories into paths on a kind of mental map. By doing this, they can study the flow of thoughts, not just the words themselves.
What questions did the researchers ask?
They wanted to know:
- Can we turn people’s spoken or written stories into a map that shows how their thoughts travel from idea to idea?
- Does this map match how our minds actually organize knowledge (what scientists call a “cognitive map”)?
- Can this map explain real behavior, like how long people pause between words, or how predictable someone’s storytelling style is?
- Is there shared “abstract” structure across very different stories, like a sense of “beginning-to-end,” that we can detect with AI?
How did they study it?
The team asked 1,100 people to tell two kinds of stories: the Cinderella fairy tale and their typical daily routine. Then they turned each story into a “trajectory” (a path) through two kinds of spaces built from LLM representations.
To make this easier to understand, here’s the basic pipeline the authors used:
- Break stories into small idea units called “utterances”
- Each utterance is like one clear thought, such as “Cinderella lived with her stepsisters.”
- They confirmed utterance boundaries were meaningful because people tended to slow down at these points (longer pauses between words).
- Turn each utterance into a vector using an LLM
- An LLM represents meaning as a long list of numbers (like coordinates). Think of it as a “meaning fingerprint.”
- This first space is called “semantic space.” It’s very detailed but not tailored to the specific task (e.g., telling Cinderella).
- Translate semantic vectors into a task-aligned “schema space”
- A “schema” is a simplified outline of how events usually unfold in a situation (like the typical order of events in a fairy tale or a daily routine).
- The authors trained a simple classifier (a statistical model) to guess which story event an utterance belongs to (e.g., “invitation to the ball,” “midnight,” “glass slipper”). It outputs probabilities for each event, producing a short, human-interpretable vector.
- This schema space acts like a custom map with axes that line up with the task’s key events, so nearby points are nearby “concepts” in the story, not just similar words (a minimal code sketch of this pipeline appears right after this list).
- Analyze the trajectory (the path of utterances through the space)
- Alignment: Do different people’s paths follow the same route on the map?
- Momentum: Does the path move steadily in a direction (like progressing from beginning to end)?
- Jumpiness: Are there mostly small steps with occasional big leaps (like “flights and perchings” in thought)?
- Forward sequencing: Are forward moves (event 1 → event 2) more common than going backward?
- Test for abstract structure shared across different kinds of stories
- Cross-condition generalization: Train on Cinderella, test on daily routines (and vice versa). Does the model still find a consistent “forward” story flow?
- Temporal feature vector: Find a single direction in semantic space that runs from “start-like” to “end-like.” Use it to score or gently “nudge” utterances earlier or later in the story.
- Demixed PCA (a dimensionality-reduction method): Pull out a shared “time” dimension across both tasks that explains where an utterance sits along the beginning-to-end arc.
- Connect trajectory features to real-world differences between people
- The authors looked at a trait they call “eccentricity” (how unusual someone says they communicate, based on a questionnaire).
- They checked if people with higher eccentricity tell stories with less predictable paths.
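To make the embedding and decoding steps concrete, here is a minimal, runnable sketch in Python. The utterances, event labels, and the `embed_utterances` stand-in are invented for illustration (a real pipeline would call an embedding model such as text-embedding-3); only the use of regularized logistic regression to decode schema events comes from the paper itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical utterances with hand-assigned schema events (illustration only).
EVENTS = ["intro", "invitation", "ball", "midnight", "slipper"]
utterances = [
    "Cinderella lived with her stepmother and stepsisters.",
    "An invitation to the royal ball arrived.",
    "She danced with the prince all evening.",
    "At midnight she fled, losing a glass slipper.",
    "The prince searched the kingdom with the slipper.",
]
labels = np.array([0, 1, 2, 3, 4])  # index into EVENTS

def embed_utterances(texts, dim=1536, seed=0):
    """Stand-in for a real LLM embedding call.
    Returns random unit vectors so the sketch runs end to end."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((len(texts), dim))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Step 1, Vector Embedding: utterances -> high-dimensional semantic space.
X_semantic = embed_utterances(utterances)

# Step 2, Concept Decoding: regularized logistic regression maps semantic
# vectors to probabilities over schema events; each probability vector is
# the utterance's coordinate in a low-dimensional, interpretable schema space.
decoder = LogisticRegression(C=1.0, max_iter=1000)
decoder.fit(X_semantic, labels)
X_schema = decoder.predict_proba(X_semantic)
print(np.round(X_schema, 2))  # one row per utterance, columns follow EVENTS
```

In the actual framework, decoder accuracy is assessed with cross-validation and held-out narratives; fitting and scoring on the same five toy utterances, as here, just shows the shape of the transformation.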
What did they find?
Here are the main takeaways:
- Schema space captures thought flow better than raw semantic space
- People’s pauses between words grew when they moved to a new utterance, especially when that jump was large in schema space (i.e., a big conceptual shift).
- Trajectories in schema space showed higher alignment (people tend to follow a shared route for the same story), stronger momentum (clearer progress through the story), and more “jumpiness” (small steps plus occasional big leaps), matching the “flights and perchings” quality of thought. Hedged code sketches of these metrics appear after this list.
- The map is meaningful and event-aware
- In schema space, the order of events is visible: moving forward (event 1 → event 2) is much more likely than going backward.
- The classifier that creates schema space could accurately tag utterances with the right event, even for unseen stories.
- Abstract “story time” is shared across very different narratives
- Models trained on Cinderella could detect forward story flow in daily routine stories, and vice versa. This also worked on a large set of AI-generated stories.
- A single “start-to-end” direction in the underlying meaning space predicted where an utterance sits in the story and could “steer” event predictions by nudging utterances toward “start-like” or “end-like.”
- A shared temporal component (found with demixed PCA) explained meaningful variance across both tasks and generalized to new stories.
- Individual differences show up in these trajectories
- People who reported more unusual communication (higher eccentricity) had less predictable paths: lower alignment, lower momentum, weaker forward sequencing, and higher “trajectory entropy” (an AI-based measure of unpredictability).
- These links showed up in schema space but not in raw semantic space, suggesting the schema-aligned map picks up on deeper cognitive organization.
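For readers who want the trajectory metrics pinned down, here is a hedged sketch under assumed formalizations: jumpiness as the excess kurtosis of step sizes (heavy tails), momentum as the correlation between time and progress along the net start-to-end direction, and forward sequencing as the fraction of event-changing transitions that move forward. These are plausible readings of the paper’s descriptions, not its exact definitions, and alignment is omitted because it requires a cohort of trajectories to compare against.

```python
import numpy as np
from scipy.stats import kurtosis, pearsonr

def step_sizes(traj):
    """Euclidean distance between consecutive points (rows = utterances)."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def jumpiness(traj):
    """Excess kurtosis of step sizes: heavy tails mean mostly small steps
    punctuated by occasional large leaps."""
    return kurtosis(step_sizes(traj))

def momentum(traj):
    """Correlation between time and progress along the net start-to-end
    direction: values near 1 indicate steady directional movement."""
    direction = traj[-1] - traj[0]
    direction = direction / np.linalg.norm(direction)
    progress = traj @ direction
    return pearsonr(np.arange(len(traj)), progress)[0]

def forward_sequencing(schema_probs):
    """Fraction of event-changing transitions (hard-assigned by argmax)
    that move forward in schema-event order."""
    states = schema_probs.argmax(axis=1)
    moves = np.diff(states)
    moves = moves[moves != 0]
    return np.mean(moves > 0) if len(moves) else np.nan

# Toy schema-space trajectory: six utterances over four events, mostly forward.
probs = np.array([
    [0.80, 0.10, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.10, 0.20, 0.60],
    [0.05, 0.05, 0.10, 0.80],
])
print(jumpiness(probs), momentum(probs), forward_sequencing(probs))
```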
Why is this important?
- A new window into thought: VECTOR helps turn messy, natural language into a readable map of ideas. This lets scientists study how thoughts unfold in everyday settings, not just in simplified lab tasks.
- Better tools for psychology and neuroscience: Because schema space aligns with human task structure, it can reveal mental event boundaries, shared story structure, and individual differences with behavioral relevance.
- Practical uses in mental health: The method could help understand communication patterns in psychiatric conditions, moving beyond surface word statistics toward deeper “concept flow.”
- Bridges AI and human cognition: The work shows that LLMs contain rich, decodable features of meaning, and with the right transformations, these can line up with how people organize knowledge.
- Generalization and abstraction: Finding shared “beginning-to-end” structure across different stories suggests our minds (and LLMs) rely on abstract templates that help us understand and create narratives efficiently.
A simple bottom line
Think of every story you tell as a walk through an invisible mental map. This paper shows how to draw that map from your words, watch your path, and learn about how your mind organizes ideas—how fast you move, when you pause, and how your personal style shapes the journey.
Knowledge Gaps
Below is a concise, actionable list of the key knowledge gaps, limitations, and open questions that remain unresolved in the paper.
- Dependence on constrained, well-known schemas: VECTOR is demonstrated on narratives (Cinderella, daily routine) with simple, conserved event structures; it is unclear how the approach scales to tasks with unknown, variable, branching, or weakly structured schemas.
- Ground-truth labeling circularity: Event identification and “ground truth” labels rely heavily on LLM-based autoraters and LLM-guided procedures; rigorous validation against expert human annotations is needed to rule out model–model circularity.
- Segmentation validity and generalizability: The LLM-based utterance segmentation requires benchmarking against human-labeled mental-event boundaries and evaluation across genres (dialogue, argumentative text, free association), modalities (spoken vs typed), and languages.
- Psycholinguistic controls for RT analyses: The link between schema-space distance and inter-word RTs needs stronger controls for lexical frequency, word length, syntax, and motor factors; replication with speech pauses, eye-tracking, or keystroke dynamics is needed.
- Model dependence and reproducibility: Results depend on specific proprietary embeddings (text-embedding-3) and GPT variants; robustness across open-source embeddings, instruction-tuned vs base models, and model version drift remains untested.
- Event granularity and dimensionality choices: How the number and granularity of schema events (8D/11D) affect performance, interpretability, and overfitting is not systematically characterized; methods for data-driven selection of event dimensionality are needed.
- Alternative transformations and baselines: While two alternatives are sketched (prompt-based contextualization, topic modeling), a systematic benchmark across a broader suite of methods (e.g., contrastive learning, supervised fine-tuning, non-linear decoders) and metrics is lacking.
- Individualized cognitive maps: The framework infers shared, condition-level schema spaces; it remains open how to infer person-specific schema spaces and how much idiosyncratic structure exists across individuals.
- Hierarchical and compositional structure: The current approach captures linear event sequences; detecting hierarchical sub-events, optional branches, or compositional reuse of substructures is an outstanding challenge.
- Non-linear and non-chronological narratives: Forward sequencing may reflect chronological retellings; performance on non-linear stories, flashbacks, or tasks where optimal organization is non-temporal (e.g., thematic, causal) is unknown.
- Generalization beyond narratives: It is unclear how VECTOR performs for reasoning, planning, problem-solving, creativity, or dialogue where latent states are not narrative “events.”
- Cross-linguistic and cross-cultural validity: The approach is evaluated in English with culture-specific content (Cinderella); transfer to other languages, scripts, and culturally distinct schemas is an open question.
- Clinical and translational scope: Associations with “eccentricity” are suggestive; testing predictive validity in clinical cohorts (e.g., schizophrenia, ASD, mood disorders), longitudinal stability, sensitivity to treatment, and diagnostic specificity remains to be done.
- Causal manipulations: Observational correlates do not establish causation; experiments varying instructions, cognitive load, memory prompts, or schema cues are needed to test whether trajectory properties can be causally shifted.
- Neural validation: Direct links to brain data are not established; testing whether schema-space distances, jumps, and event boundaries align with neural state transitions (fMRI/MEG/EEG) and hippocampal–prefrontal dynamics is a key open direction.
- Robustness of “jumpiness” signature: The heavy-tailed step-size finding could be sensitive to segmentation granularity, smoothing parameters, and trajectory dimensionality; alternative null models and multi-scale analyses are needed (a shuffle-based null is sketched after this list).
- Metric design and parameter sensitivity: Alignment, momentum, and sequencing are computed in reduced spaces (e.g., 2D PCA); sensitivity to dimensionality reduction choices, distance metrics, and trajectory length normalization should be quantified.
- Abstract feature discovery beyond time: The temporal feature vector and dPCA subspace are compelling; methods to discover and steer other abstract dimensions (e.g., causality, agency, valence, intentionality) and assess their generality remain unexplored.
- External validation datasets: External tests use TinyStories (LLM-generated); validation on large human corpora (e.g., novels, oral histories, clinical interviews) is needed to ensure ecological validity.
- Trajectory entropy circularity: Predictability estimated via GPT-4-based next-event distributions may embed model priors; cross-model and human-predictability baselines (e.g., human next-utterance judgments) would reduce circularity.
- Handling sparse or incomplete narratives: How VECTOR performs with partial stories, low topic coverage, or minimal prior knowledge of the schema (e.g., unfamiliar fairy tales) is unclear.
- Online and real-time decoding: Feasibility of tracking thought dynamics in real time (live speech, streaming text) with low-latency segmentation and decoding is untested.
- Privacy, fairness, and bias: Applying VECTOR to sensitive psychiatric or demographic groups requires audits for bias in embeddings, fairness across subpopulations, and robust privacy-preserving pipelines.
- Open science and reproducibility: Public release of code, labels, and model checkpoints, plus preregistered replication analyses across labs, would clarify generalizability and mitigate dependence on proprietary APIs.
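To make the idea of a permutation-derived null concrete, here is a hedged Python sketch: shuffle utterance order within a narrative, recompute a trajectory metric (for example, the `forward_sequencing` function from the earlier sketch), and compare the observed value to the shuffled distribution. The paper’s own nulls (e.g., random decoder projections) are constructed differently.

```python
import numpy as np

def permutation_pvalue(metric_fn, traj, n_perm=1000, seed=0):
    """Permutation-derived null: shuffle utterance order within the
    narrative, recompute the metric, and compare to the observed value."""
    rng = np.random.default_rng(seed)
    observed = metric_fn(traj)
    null = np.array([metric_fn(rng.permutation(traj, axis=0))
                     for _ in range(n_perm)])
    # One-sided p-value with add-one smoothing.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

# usage with the earlier sketch: permutation_pvalue(forward_sequencing, probs)
```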
Glossary
- aliasing: In representational modeling, different latent states produce similar observable signals, making them hard to distinguish. "state splitting and aliasing"
- cognitive maps: Internal structured representations that organize knowledge to support inference and navigation through concepts. "structured internal representations, known as cognitive maps, that support inference, prediction and reasoning"
- Concept Decoding: A supervised transformation mapping LLM embeddings into task-aligned, interpretable schema coordinates. "A Concept Decoding step overcomes this limitation"
- concept vector projections: Transformations that project representations onto specific feature directions to isolate or emphasize conceptual attributes. "these transformations include concept vector projections"
- condition-invariant semantic space: A general embedding space shared across tasks that does not encode implicit contextual information. "a condition-invariant semantic space (1536D)"
- cross-condition generalisation (CCGP): The ability of decoders trained in one task context to uncover structured patterns in a different context. "as evidence of cross-condition generalisation"
- demixed Principal Component Analysis (dPCA): A dimensionality reduction method that separates shared factors (e.g., temporal) from condition-specific variance. "we used demixed Principal Component Analysis (dPCA)"
- Discrete State Sequencing: Analysis of ordered transitions between discrete schema events revealing directional structure. "Discrete State Sequencing."
- feature vector: A direction in embedding space that captures an interpretable feature (e.g., temporality) for projection or manipulation. "A feature vector in semantic space that captures abstracted temporal information"
- flow fields: Vector fields summarizing average trajectory displacements across representational space. "Flow fields."
- forward sequencing: A directional bias where transitions from earlier to later events are more probable than the reverse. "we recovered an expected signature of forward sequencing"
- foundation models: Large pretrained models whose representations encode broad knowledge useful across tasks. "thus constituting powerful foundation models of human cognitive map organisation"
- hippocampal-entorhinal circuits: Brain systems thought to encode relational world models and support cognitive map representations. "such as hippocampal-entorhinal circuits"
- hyperparameter selection: The choice of modeling parameters that can strongly affect unsupervised method outcomes. "sensitivity to experimenter choices in hyperparameter selection"
- linear mixed model: A statistical model that accounts for both fixed effects and random effects across subjects or items. "linear mixed model effect of utterance boundaries on RT = 0.54"
- logistic regression (regularized): A classification model with penalty terms to prevent overfitting and yield sparse, interpretable decoders. "we trained regularized logistic regression models"
- mechanistic interpretability: The study of how internal features and circuits in AI models implement abstractions. "the field of AI mechanistic interpretability"
- medial prefrontal cortex: A brain region implicated in task-specific transformation of general representations. "such as medial prefrontal cortex"
- non-Markovian: Dependent on the full history rather than only the current state, in contrast to Markov processes. "i.e., a non-Markovian measure"
- observation functions: Mappings from latent states to observable outputs used in modeling partially observed processes. "observation functions used when modelling behaviour operating on partially observable states"
- out-of-distribution: Data from a distribution not seen during training used to assess generalization. "out-of-distribution test sets"
- partially observable states: Situations where the true underlying state cannot be directly observed. "modelling behaviour operating on partially observable states"
- permutation-derived null distribution: A baseline distribution generated by shuffling labels or alignments to test significance. "vs. a permutation-derived null distribution"
- permutation test: A non-parametric test assessing significance by comparing to metrics computed under random permutations. "permutation test using random decoder projections"
- radial histogram: A circular histogram showing the distribution of trajectory directions around event centroids. "and associated radial histogram"
- Representational Similarity Analysis (RSA): A method comparing representational structures across models or modalities via similarity matrices. "Representational Similarity Analysis (RSA)"
- representation similarity matrices: Matrices of pairwise similarities used to visualize and compare representational structure. "Representation similarity matrices display the cosine similarity between all utterance pairs"
- representational geometry: The spatial arrangement and distances among embedded representations reflecting structure. "a quantitative comparison of representational geometry"
- schema: Abstracted knowledge structures that encode event regularities in a specific context. "schemas as abstracted knowledge structures that capture information about how events unfold in a specific context"
- schema event decoders: Classifiers that map embeddings to probabilities over discrete schema events. "Schema event decoders exhibited high cross-validated decoding accuracy"
- schema space: A low-dimensional, sparse, interpretable space aligned to task-specific schema events. "transforming LLM semantic space representations to condition-specific schema space"
- steering vectors: Feature directions used to systematically shift a representation to modulate decoder outputs. "feature dimensions can be used as steering vectors"
- state splitting: When one latent concept maps to multiple observable expressions, creating multiple observed states. "state splitting and aliasing"
- temporality score: A scalar capturing an utterance’s position along a start-to-end feature direction (sketched in code after this glossary). "Projecting each narrative utterance onto this vector yields a temporality score"
- trajectory alignment: The degree to which different trajectories follow a common path in representation space. "A metric of trajectory alignment quantifies the degree to which a given trajectory's transitions can be predicted"
- trajectory entropy: Uncertainty in predicting the next event given the narrative history; higher means less predictable trajectories (sketched in code after this glossary). "Trajectory entropy (mean of entropy values over all narrative utterances) was positively correlated with eccentricity"
- trajectory jumpiness: A pattern of mostly small steps punctuated by occasional large jumps in representational space. "Trajectories exhibited significant jumpiness in both schema and semantic spaces"
- trajectory momentum: Directional progression through space over time, indicating linear schematic structure. "A metric of trajectory momentum, quantifies directional progression through space as a function of time"
- unsupervised topic modelling: Methods that infer latent thematic structure from text without labeled supervision. "unsupervised topic modelling"
- Vector Embedding: The step mapping utterances to a high-dimensional semantic space via pretrained LLM embeddings. "Vector Embedding maps each utterance to a domain-general semantic space"
- vector perturbation: Modifying embeddings by adding or subtracting a feature vector to shift encoded attributes. "vector perturbation"
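Two glossary quantities reduce to short computations. The sketch below assumes the temporal feature vector can be approximated as the difference between mean embeddings of late versus early utterances (one plausible construction; the paper’s procedure may differ) and takes trajectory entropy as the mean Shannon entropy of predicted next-event distributions, matching the definition above.

```python
import numpy as np

def temporal_feature_vector(X, positions):
    """Assumed construction: difference between mean embeddings of late vs
    early utterances yields a start-to-end direction in semantic space."""
    late = X[positions > np.median(positions)].mean(axis=0)
    early = X[positions <= np.median(positions)].mean(axis=0)
    v = late - early
    return v / np.linalg.norm(v)

def temporality_scores(X, v):
    """Project each utterance embedding onto the temporal direction."""
    return X @ v

def trajectory_entropy(next_event_probs):
    """Mean Shannon entropy (bits) of predicted next-event distributions:
    higher values mean a less predictable narrative trajectory."""
    p = np.clip(next_event_probs, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log2(p), axis=1)))

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 1536))           # one row per utterance
v = temporal_feature_vector(X, np.arange(6))
print(temporality_scores(X, v))              # start-to-end positions
print(trajectory_entropy(np.full((5, 4), 0.25)))  # uniform -> 2.0 bits
```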