Towards Automated Movie Trailer Generation (2404.03477v1)
Abstract: Movie trailers are an essential tool for promoting films and attracting audiences. However, creating a trailer can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation techniques and models movies and trailers as sequences of shots, thus formulating trailer generation as a sequence-to-sequence task. We introduce the Trailer Generation Transformer (TGT), a deep-learning framework built on an encoder-decoder architecture. The TGT movie encoder contextualizes each movie shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot, accounting for the temporal order of shots in trailers. TGT significantly outperforms previous methods on a comprehensive suite of metrics.
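The autoregressive formulation described in the abstract can be illustrated with a small, library-free sketch. It is not the TGT implementation: the function `generate_trailer`, the cosine-similarity retrieval step, the rule that a shot is used at most once, and the `toy_predictor` stand-in for a trained decoder are all assumptions made here for illustration. The abstract only states that the decoder predicts the feature representation of the next trailer shot.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_trailer(
    movie_shots: List[Vector],
    predict_next: Callable[[List[Vector], List[Vector]], Vector],
    length: int,
) -> List[int]:
    """Greedy autoregressive shot selection.

    At each step the (hypothetical) decoder `predict_next` receives the movie
    shots and the trailer built so far, and returns a predicted feature for the
    next trailer shot. Assumption: we then retrieve the unused movie shot whose
    feature is closest to that prediction.
    """
    chosen: List[int] = []
    for _ in range(length):
        history = [movie_shots[i] for i in chosen]
        target = predict_next(movie_shots, history)
        candidates = [i for i in range(len(movie_shots)) if i not in chosen]
        if not candidates:
            break  # every movie shot has already been used
        best = max(candidates, key=lambda i: cosine(movie_shots[i], target))
        chosen.append(best)
    return chosen


def toy_predictor(movie_shots: List[Vector], history: List[Vector]) -> Vector:
    """Hypothetical stand-in for a trained decoder: a fixed target per step."""
    targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    return targets[len(history) % len(targets)]


# Toy 2-D "shot features"; real features would come from a video encoder.
shots = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7], [-1.0, 0.0]]
trailer = generate_trailer(shots, toy_predictor, 3)
print(trailer)
```

Because each prediction is conditioned on the shots already selected, the order of the output sequence matters, which is the property the sequence-to-sequence framing is meant to capture. In a trained model, `toy_predictor` would be replaced by the decoder attending to the encoder's contextualized shot representations.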