
Select and Summarize: Scene Saliency for Movie Script Summarization (2404.03561v1)

Published 4 Apr 2024 in cs.CL

Abstract: Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current LLMs. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. The salience of a scene can be operationalized by considering it as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in a script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.


Summary

  • The paper presents a two-stage Select & Summarize approach that improves movie script summarization by first identifying the salient scenes and then summarizing only those scenes.
  • The authors developed MENSA, a dataset of 100 movies aligning Wikipedia summary sentences with corresponding movie scenes for precise evaluation.
  • The supervised model outperforms strong baselines, and its focused scene selection yields summaries that reflect the movie's content more faithfully under QA-based evaluation metrics.

Scene Saliency and Summarization in Movie Script Analysis

Introduction to Scene Saliency in Movies

Summarizing movie scripts is considerably harder than summarizing most other texts because of their length, narrative structure, and density of detail. A script comprises many scenes, yet only a subset is pivotal to understanding the overall narrative. Scene saliency, operationalized as whether a scene is mentioned in a human-written summary, identifies these key scenes. Detecting saliency automatically has been difficult, however, chiefly because no suitable datasets existed. The paper addresses this gap by introducing a dataset of human-annotated salient scenes, which underpins the authors' two-stage approach to script summarization: first identify the salient scenes, then generate a summary from those scenes alone.
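
As a concrete illustration, the two-stage pipeline can be sketched in a few lines of Python. The callables `classify_salient_scenes` and `summarize` below are hypothetical stand-ins for the paper's saliency classifier and abstractive summarizer, not its actual API:

```python
# Minimal sketch of the two-stage approach: select salient scenes, then
# summarize only those. `classify_salient_scenes` and `summarize` are
# hypothetical placeholders for the paper's two models.
from typing import Callable, List


def select_and_summarize(
    scenes: List[str],
    classify_salient_scenes: Callable[[List[str]], List[bool]],
    summarize: Callable[[str], str],
) -> str:
    # Stage 1: keep only the scenes the classifier marks as salient.
    keep = classify_salient_scenes(scenes)
    salient = [scene for scene, k in zip(scenes, keep) if k]
    # Stage 2: run the abstractive summarizer on the reduced script.
    return summarize("\n\n".join(salient))
```

Because the summarizer only ever sees the selected scenes, the input stays within the context budget of a standard sequence-to-sequence model regardless of the script's original length.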

Scene Saliency Dataset and Its Implications

The MENSA (Movie ScENe SAliency) dataset is a novel resource comprising 100 movies in which Wikipedia summary sentences are manually aligned to the scenes they describe. This human annotation effort was designed to support the evaluation of existing and future scene saliency detection methods. The dataset spans a diverse range of scenes and summary sentences, underscoring the complexity of scene saliency. A comprehensive evaluation on this dataset showed that an alignment method tailored to movie scripts identifies salient content better than general-purpose alternatives, reinforcing the need for approaches that account for the distinctive characteristics of scripts as opposed to other narrative texts.
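
To make the dataset's structure concrete, one plausible record layout is sketched below; the field names are illustrative assumptions, not the released schema:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SceneAlignment:
    movie_id: str             # hypothetical identifier field
    summary_sentence: str     # one Wikipedia plot-summary sentence
    scene_indices: List[int]  # script scenes this sentence describes


def salient_scene_ids(alignments: List[SceneAlignment]) -> Set[int]:
    # A scene counts as salient if at least one summary sentence
    # aligns to it; all other scenes are treated as non-salient.
    return {i for a in alignments for i in a.scene_indices}
```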

The Select & Summarize Approach

Leveraging the insights gained from the MENSA dataset, the authors propose a supervised model for scene saliency classification, trained on a larger corpus with silver-standard labels. The model discerns scene saliency reliably, outperforming several benchmarks and setting a new standard in content selection for movie scripts. Building on this, the two-stage summarization process, which passes only the salient scenes identified by the classifier to the summarizer, delivers significant improvements over state-of-the-art methods on summarization metrics. Importantly, the resulting summaries are also more factually aligned with the original scripts, as evidenced by stronger performance on QA-based evaluation metrics. This indicates that focusing on salient scenes not only tightens the summary but also preserves critical factual detail.
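
One way such a saliency classifier could be built is sketched below in PyTorch: contextualize precomputed scene embeddings with a small Transformer encoder, then score each scene with a linear head. This is a sketch of the general recipe under assumed dimensions and architecture choices, not the paper's exact model:

```python
import torch
import torch.nn as nn


class SceneSaliencyClassifier(nn.Module):
    # Sketch only: the paper's actual model may differ in architecture,
    # embedding source, and training details.
    def __init__(self, dim: int = 384, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.scorer = nn.Linear(dim, 1)  # one saliency logit per scene

    def forward(self, scene_embs: torch.Tensor) -> torch.Tensor:
        # scene_embs: (batch, num_scenes, dim) precomputed scene embeddings
        ctx = self.encoder(scene_embs)       # scenes attend to each other
        return self.scorer(ctx).squeeze(-1)  # (batch, num_scenes) logits


# Usage: scenes whose predicted probability exceeds a threshold are kept
# as input to the second-stage summarizer.
model = SceneSaliencyClassifier()
logits = model(torch.randn(1, 120, 384))  # e.g. a 120-scene script
salient_mask = torch.sigmoid(logits) > 0.5
```

Letting scenes attend to one another matters here: whether a scene is salient depends on its role in the whole narrative, not on its text in isolation.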

Future Directions

The promising results achieved with the Select & Summarize approach open up several avenues for future research. Exploring the integration of the scene saliency classification and summarization stages could offer further improvements in efficiency and accuracy. Furthermore, the application of this methodology to other domains of long-form narrative texts, such as novels or plays, presents a compelling area of exploration. The MENSA dataset also holds potential for advancing studies in content selection strategies, extractive summarization, and the development of more nuanced models that can navigate the rich tapestry of narratives found in movie scripts.

Concluding Remarks

The intersection of scene saliency and movie script summarization offers a rich landscape for advancing our understanding and capabilities in text summarization. The contributions from this research, including the MENSA dataset and the Select & Summarize model, provide valuable resources and insights for the AI and NLP communities. By addressing the distinct challenges presented by movie scripts, this work not only enhances summarization techniques but also enriches our comprehension of narrative structures and saliency in storytelling.