Three Stage Narrative Analysis; Plot-Sentiment Breakdown, Structure Learning and Concept Detection

Published 14 Nov 2025 in cs.CL and cs.AI | (2511.11857v1)

Abstract: Story understanding and analysis have long been challenging areas within Natural Language Understanding. Automated narrative analysis requires deep computational semantic representations along with syntactic processing. Moreover, the large volume of narrative data demands automated semantic analysis and computational learning rather than manual analytical approaches. In this paper, we propose a framework that analyzes the sentiment arcs of movie scripts and performs extended analysis related to the context of the characters involved. The framework enables the extraction of high-level and low-level concepts conveyed through the narrative. Using dictionary-based sentiment analysis, our approach applies a custom lexicon built with the LabMTsimple storylab module. The custom lexicon is based on the Valence, Arousal, and Dominance scores from the NRC-VAD dataset. Furthermore, the framework advances the analysis by clustering similar sentiment plots using Ward's hierarchical clustering technique. Experimental evaluation on a movie dataset shows that the resulting analysis is helpful to consumers and readers when selecting a narrative or story.


Summary

The paper introduces a modular narrative analysis framework that segments scripts for sentiment arc mapping using a custom VAD-based lexicon.
It employs hierarchical clustering and supervised text classification to detect narrative structure, validated on a dataset of 1,000 movie scripts.
The study outlines future directions with transformer models and semantic novelty metrics to enhance concept detection and overall narrative analytics.

Three-Stage Computational Narrative Analysis: Sentiment, Structure, and Concept Detection in Movie Scripts

Introduction

"Three Stage Narrative Analysis; Plot-Sentiment Breakdown, Structure Learning and Concept Detection" (2511.11857) presents a modular framework for automated narrative analysis of movie scripts. The approach integrates multidimensional sentiment trajectory modeling, narrative structure classification, and a pipeline toward computational high/low concept detection. By leveraging a custom VAD-based sentiment lexicon and unsupervised hierarchical clustering, the system uncovers common emotional arcs in film narratives, reflecting theoretical paradigms in narrative studies. The work bridges natural language understanding (NLU) and narrative theory, delivering empirical insights with implications for both computational narrative analysis and creative industry workflows.

Methodology

The system architecture decomposes narrative analysis into three interdependent tiers: Plot-Sentiment Breakdown, Structure Learning, and Concept Detection.

Plot-Sentiment Breakdown

The first stage involves segmenting the script and generating sentiment arcs using a custom lexicon, based on the NRC VAD (Valence, Arousal, Dominance) scores, adapted for LabMT sentiment scoring. Sentiment values are aggregated over rolling windows to generate a smooth plot capturing emotional movement through the narrative (Figure 1).

Figure 1: Plot-Sentiment Breakdown for determining the Story Arc.

Scripts are partitioned into fixed-size segments. For each, text frequency vectors are generated and aggregated with contextual smoothing. This approach captures both the global sentiment trajectory and local emotional dynamics intrinsic to narrative progression.
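The segmentation and rolling aggregation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the lexicon `TOY_VALENCE`, its values, the helper names, and the segment and window sizes are all hypothetical, and only the valence dimension is scored here.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: word -> valence in [0, 1]. These are NOT NRC-VAD values.
TOY_VALENCE = {"happy": 0.9, "win": 0.85, "love": 0.95,
               "sad": 0.1, "lose": 0.15, "fear": 0.1}

def segment(tokens, size):
    """Split a token list into consecutive fixed-size segments (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def segment_score(tokens, lexicon, neutral=0.5):
    """Frequency-weighted mean valence of the lexicon words found in a segment."""
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return neutral  # no lexicon words: fall back to an assumed neutral score
    return sum(lexicon[w] * c for w, c in counts.items()) / total

def rolling_mean(values, window):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def sentiment_arc(text, lexicon=TOY_VALENCE, seg_size=4, window=3):
    """Tokenize, score each fixed-size segment, then smooth the sequence of scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    raw = [segment_score(s, lexicon) for s in segment(tokens, seg_size)]
    return rolling_mean(raw, window)
```

Calling `sentiment_arc(script_text, seg_size=500, window=5)` on a full script would yield one smoothed score per 500-token segment; the right segment and window sizes depend on script length and are left open here.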

Structure Learning

Narrative segments are further classified into functional categories (tension, punishment, reward, victory) using supervised text classification. The workflow encompasses training and inference pipelines (Figures 2 and 3).

Figure 2: Structure Learning and Text Classification: training procedure.

Figure 3: Structure Learning and Text Classification: testing procedure.

Accurate boundary detection and the assignment of segment-level structural roles lay the foundation for downstream analysis of narrative mechanisms and support applications in script editing and recommendation.
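The summary does not say which classifier the paper uses, so the following is only a sketch of the train-then-predict workflow, assuming a multinomial Naive Bayes model over bag-of-words counts. The class name, the whitespace tokenizer, and the four training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSegmentClassifier:
    """Multinomial Naive Bayes with add-one smoothing over bag-of-words counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of training segments
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for t in tokens:
                # Smoothed log likelihood of the word under this label.
                score += math.log((self.word_counts[label][t] + 1)
                                  / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training segments, one per structural role named in the text.
train_texts = [
    "the bomb timer is ticking and nobody can find it",
    "he is thrown in prison for his crime",
    "she receives a medal and a promotion",
    "they defeat the enemy and celebrate",
]
train_labels = ["tension", "punishment", "reward", "victory"]
clf = NaiveBayesSegmentClassifier().fit(train_texts, train_labels)
```

With this toy training set, `clf.predict("the timer is ticking")` returns `"tension"`. A real pipeline would need far more labeled segments per role and a held-out test split, as in the training and testing procedures of Figures 2 and 3.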

Concept Detection

The conceptual analysis tier distinguishes between high concept (e.g., archetypal, universal premises) and low concept (e.g., character-driven, nuanced) stories. The paper lays out the theoretical framework but defers implementation of an automated detection module to future work. Such modeling would utilize semantic features, event structure analysis, and novelty detection to operationalize concept classification.

Experimental Results

Dataset and Clustering

Experiments leverage a 1,000-script dataset, spanning diverse genres. Sentiment arcs (trajectories of emotional scores over segments) provide the input for Ward’s hierarchical clustering, revealing three macroscale clusters and numerous subclusters of scripts with convergent emotional shapes (Figure 4).

Figure 4: Dendrogram showing hierarchical clustering of 1,000 movie scripts using Ward’s linkage; three primary clusters of emotional trajectories identified.
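To make the clustering step concrete, here is a small pure-Python sketch of agglomerative clustering under Ward's criterion, where each step merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. It assumes all arcs have already been brought to the same length; the function names and the dictionary layout are illustrative choices, and the quadratic pair search is only suitable for small inputs.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2

def merge(a, b):
    """Union of two clusters with the size-weighted mean as the new centroid."""
    na, nb = len(a["members"]), len(b["members"])
    centroid = [(na * x + nb * y) / (na + nb)
                for x, y in zip(a["centroid"], b["centroid"])]
    return {"members": a["members"] + b["members"], "centroid": centroid}

def ward_clusters(arcs, k):
    """Merge greedily by Ward cost until k clusters remain; return member indices."""
    clusters = [{"members": [i], "centroid": list(arc)} for i, arc in enumerate(arcs)]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]
```

For a dataset on the scale of 1,000 scripts, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the practical choice, since it also returns the full merge tree needed to draw a dendrogram.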

Sentiment Pattern Analysis

Individual and cluster-level sentiment plots show recurring narrative arcs, with rising and falling shapes associated with canonical story forms. Representative examples (The Avengers, Blade Runner, The Revenant) illustrate both the cross-genre consistency of arc patterns and intra-cluster variability (Figures 5–7).

Figure 5: Sentiment plot for The Avengers, showcasing peaks and troughs aligned with narrative conflicts and resolutions.

Figure 6: Sentiment plot for Blade Runner, highlighting the alternation of tension and reflection, consistent with universal story arc hypotheses.

Figure 7: Sentiment plot for The Revenant, exhibiting a rise-fall-rise structure—despair, struggle, recovery.

Averaging across scripts within clusters suppresses local fluctuations, exposing canonical arc shapes such as the "rags-to-riches" and "Icarus" (tragedy) arcs (Figures 8–9).

Figure 8: Combined sentiment plot for Cluster 40, typifying “rags-to-riches” upward emotional arcs.

Figure 9: Combined sentiment plot for Cluster 67 with mixed rise and decline, corresponding to “Icarus”/tragedy-type arcs.
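Cluster-level plots like these require a pointwise average over arcs that may differ in length. The sketch below assumes each arc is first resampled onto a common grid by linear interpolation and then averaged position by position; the resampling step and the function names are assumptions for illustration, since the summary does not describe how the combined plots were produced.

```python
def resample(arc, n_points):
    """Linearly interpolate an arc onto n_points evenly spaced positions."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(arc) == 1:
        return [arc[0]] * n_points
    out = []
    for k in range(n_points):
        pos = k * (len(arc) - 1) / (n_points - 1)  # fractional index into arc
        lo = int(pos)
        hi = min(lo + 1, len(arc) - 1)
        frac = pos - lo
        out.append(arc[lo] * (1 - frac) + arc[hi] * frac)
    return out

def cluster_mean_arc(arcs, n_points=5):
    """Resample every arc to n_points, then average position by position."""
    resampled = [resample(a, n_points) for a in arcs]
    return [sum(col) / len(col) for col in zip(*resampled)]
```

Averaging cancels fluctuations that are not shared across scripts: two arcs that zigzag in opposite directions, such as `[0, 1, 0]` and `[1, 0, 1]`, average to a flat line, while a rise common to every member survives.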

Analytical Insights

Clustered emotional shapes validate the existence of a limited number of universal story arcs, resonant with Vonnegut’s and Reagan et al.'s taxonomies.
Scripts from disparate genres may express similar emotional dynamics, reinforcing the independence of arc shapes from narrative type.
Experimental limitations arise from segmentation noise and script-length variability. The paper suggests frequency-domain smoothing (Fourier analysis) to suppress non-informative variance and hybrid clustering (combining Ward’s method and SOMs) to preserve topological relations in trajectory space.
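The frequency-domain smoothing suggested above can be illustrated with a simple low-pass filter: transform the arc, zero the high-frequency coefficients, and transform back. This is a sketch under assumptions, not the paper's procedure; the cutoff parameter `keep` and the function names are hypothetical, and the direct O(n²) transform is used only to keep the example dependency-free.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform, computed directly from the definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def lowpass_smooth(arc, keep):
    """Zero every component above `keep` cycles per arc, then invert."""
    n = len(arc)
    spectrum = dft(arc)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq > keep:
            spectrum[k] = 0
    return idft(spectrum)
```

A small `keep` retains only the broad rise and fall of the arc and discards segment-level jitter. Choosing the cutoff is a judgment call: too low and genuine plot turns are flattened, too high and segmentation noise remains.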

Theoretical and Practical Implications

On the computational front, the approach leverages advances in lexicon construction, distributional representation, and hierarchical clustering. The adoption of VAD-based sentiment scoring over traditional polarity lexicons allows for richer, multidimensional arc discovery. The structure learning module sets the stage for graph-based analysis or transformer-based sequence modeling.
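A tiny example shows what the extra dimensions add over a single polarity score. The lexicon `TOY_VAD` and its values are hypothetical, chosen only so that two words share a valence while differing in arousal; they are not NRC-VAD entries.

```python
# Hypothetical (valence, arousal, dominance) triples in [0, 1]; not real NRC-VAD entries.
TOY_VAD = {
    "calm": (0.8, 0.1, 0.6),
    "thrilled": (0.8, 0.9, 0.7),
    "terrified": (0.1, 0.9, 0.2),
    "bored": (0.2, 0.1, 0.4),
}

def vad_score(tokens, lexicon=TOY_VAD, neutral=(0.5, 0.5, 0.5)):
    """Mean (V, A, D) over the lexicon words found in tokens."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return neutral
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

def polarity(tokens, lexicon=TOY_VAD):
    """Collapse to a single polarity-style value by keeping valence only."""
    return vad_score(tokens, lexicon)[0]
```

Under these toy values, a segment containing "calm" and one containing "thrilled" receive the same polarity, but their arousal scores differ sharply, so a three-channel arc can separate a quiet happy ending from an exhilarating one.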

From an applied perspective, these methods have direct implications for media analytics, script development pipelines, and recommendation systems. Emotional arc transparency, coupled with structure and (eventually) concept type assignment, enables new axes of content organization—beyond surface genre—aligned with affective or cognitive expectations of audiences.

Limitations and Future Directions

Sentiment Analysis: Lexicon-based methods lack context sensitivity and nuanced affect detection compared to contextual embedding-based models (BERT, RoBERTa).
Structural and Conceptual Detection: Current structure classification assumes fixed role sets and does not model hybrid/ambiguous units. Concept detection is not operationalized.
Segmentation and Length Normalization: Script segmentation is fixed; adaptive, content-aware windowing and length normalization could enhance inter-script comparability.

The paper identifies the integration of transformer-based contextual models for sentiment dynamics, GNNs for structural role learning, and semantic novelty metrics for concept detection as priority future directions. Multilingual, cross-cultural narrative datasets are also proposed for extending the framework’s scope.

Conclusion

This work presents a rigorous, modular architecture for computational narrative analysis, advancing the field through the alignment of empirical sentiment arc mapping, structural role detection, and the theoretical groundwork for automated concept analysis. The clustering experiments provide empirical support for the persistence of universal emotional arcs in film. The outlined enhancements—embedding-based sentiment, hybrid clustering, and fully automated concept type inference—define a generative research roadmap that could enable comprehensive, machine-interpretable narrative analytics for both computational and creative domains.
