
Select and Summarize: Scene Saliency for Movie Script Summarization (2404.03561v1)

Published 4 Apr 2024 in cs.CL

Abstract: Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current LLMs. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. The salience of a scene can be operationalized by considering it as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in a script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.


Summary

  • The paper presents a two-stage Select & Summarize approach that improves movie script summarization by first identifying the salient scenes and then summarizing only those scenes.
  • The authors developed MENSA, a dataset of 100 movies aligning Wikipedia summary sentences with corresponding movie scenes for precise evaluation.
  • The supervised model outperforms strong baselines, and its focused scene selection yields summaries that reflect the movie's content more faithfully under QA-based evaluation metrics.

Scene Saliency and Summarization in Movie Script Analysis

Introduction to Scene Saliency in Movies

Summarizing movie scripts is considerably harder than summarizing most other texts because of their length, narrative structure, and density of detail. A script comprises many scenes, yet only a subset is pivotal to understanding the overall narrative. Scene saliency, operationalized as whether a scene is mentioned in a human-written summary, identifies these key scenes. Detecting saliency automatically has been difficult, however, chiefly because no suitable datasets existed. The paper addresses this gap by introducing a dataset of human-annotated salient scenes, which underpins the authors' two-stage approach to script summarization: first identify the salient scenes, then generate a summary from those scenes alone.
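
As a concrete illustration, the two-stage pipeline can be sketched in a few lines of Python. The callables `classify_salient_scenes` and `summarize` below are hypothetical stand-ins for the paper's saliency classifier and abstractive summarizer, not its actual API:

```python
# Minimal sketch of the two-stage approach: select salient scenes, then
# summarize only those. `classify_salient_scenes` and `summarize` are
# hypothetical placeholders for the paper's two models.
from typing import Callable, List


def select_and_summarize(
    scenes: List[str],
    classify_salient_scenes: Callable[[List[str]], List[bool]],
    summarize: Callable[[str], str],
) -> str:
    # Stage 1: keep only the scenes the classifier marks as salient.
    keep = classify_salient_scenes(scenes)
    salient = [scene for scene, k in zip(scenes, keep) if k]
    # Stage 2: run the abstractive summarizer on the reduced script.
    return summarize("\n\n".join(salient))
```

Because the summarizer only ever sees the selected scenes, the input stays within the context budget of a standard sequence-to-sequence model regardless of the script's original length.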

Scene Saliency Dataset and Its Implications

The MENSA (Movie ScENe SAliency) dataset is a novel resource comprising 100 movies in which Wikipedia summary sentences are manually aligned to the scenes they describe. This human annotation effort was designed to support the evaluation of existing and future scene saliency detection methods. The dataset spans a diverse range of scenes and summary sentences, underscoring the complexity of scene saliency. A comprehensive evaluation on this dataset showed that an alignment method tailored to movie scripts identifies salient content better than general-purpose alternatives, reinforcing the need for approaches that account for the distinctive characteristics of scripts as opposed to other narrative texts.
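
To make the dataset's structure concrete, one plausible record layout is sketched below; the field names are illustrative assumptions, not the released schema:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SceneAlignment:
    movie_id: str             # hypothetical identifier field
    summary_sentence: str     # one Wikipedia plot-summary sentence
    scene_indices: List[int]  # script scenes this sentence describes


def salient_scene_ids(alignments: List[SceneAlignment]) -> Set[int]:
    # A scene counts as salient if at least one summary sentence
    # aligns to it; all other scenes are treated as non-salient.
    return {i for a in alignments for i in a.scene_indices}
```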

The Select & Summarize Approach

Leveraging the insights gained from the MENSA dataset, the authors propose a supervised model for scene saliency classification, trained on a larger corpus with silver-standard labels. The model discerns scene saliency reliably, outperforming several benchmarks and setting a new standard in content selection for movie scripts. Building on this, the two-stage summarization process, which passes only the salient scenes identified by the classifier to the summarizer, delivers significant improvements over state-of-the-art methods on summarization metrics. Importantly, the resulting summaries are also more factually aligned with the original scripts, as evidenced by stronger performance on QA-based evaluation metrics. This indicates that focusing on salient scenes not only tightens the summary but also preserves critical factual detail.
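
One way such a saliency classifier could be built is sketched below in PyTorch: contextualize precomputed scene embeddings with a small Transformer encoder, then score each scene with a linear head. This is a sketch of the general recipe under assumed dimensions and architecture choices, not the paper's exact model:

```python
import torch
import torch.nn as nn


class SceneSaliencyClassifier(nn.Module):
    # Sketch only: the paper's actual model may differ in architecture,
    # embedding source, and training details.
    def __init__(self, dim: int = 384, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.scorer = nn.Linear(dim, 1)  # one saliency logit per scene

    def forward(self, scene_embs: torch.Tensor) -> torch.Tensor:
        # scene_embs: (batch, num_scenes, dim) precomputed scene embeddings
        ctx = self.encoder(scene_embs)       # scenes attend to each other
        return self.scorer(ctx).squeeze(-1)  # (batch, num_scenes) logits


# Usage: scenes whose predicted probability exceeds a threshold are kept
# as input to the second-stage summarizer.
model = SceneSaliencyClassifier()
logits = model(torch.randn(1, 120, 384))  # e.g. a 120-scene script
salient_mask = torch.sigmoid(logits) > 0.5
```

Letting scenes attend to one another matters here: whether a scene is salient depends on its role in the whole narrative, not on its text in isolation.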

Future Directions

The promising results achieved with the Select & Summarize approach open up several avenues for future research. Exploring the integration of the scene saliency classification and summarization stages could offer further improvements in efficiency and accuracy. Furthermore, the application of this methodology to other domains of long-form narrative texts, such as novels or plays, presents a compelling area of exploration. The MENSA dataset also holds potential for advancing studies in content selection strategies, extractive summarization, and the development of more nuanced models that can navigate the rich tapestry of narratives found in movie scripts.

Concluding Remarks

The intersection of scene saliency and movie script summarization offers a rich landscape for advancing our understanding and capabilities in text summarization. The contributions from this research, including the MENSA dataset and the Select & Summarize model, provide valuable resources and insights for the AI and NLP communities. By addressing the distinct challenges presented by movie scripts, this work not only enhances summarization techniques but also enriches our comprehension of narrative structures and saliency in storytelling.