
MovieCLIP: Visual Scene Recognition in Movies (2210.11065v2)

Published 20 Oct 2022 in cs.CV, cs.CL, and cs.MM

Abstract: Long-form media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain-specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets in movies have limited taxonomies and do not consider visual scene transitions within movie clips. In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy of 179 scene labels derived from movie scripts and auxiliary web-based video datasets. Instead of relying on manual annotations, which can be expensive, we use CLIP to weakly label 1.12 million shots from 32K movie clips based on our proposed taxonomy. We provide baseline visual models trained on the weakly labeled dataset, called MovieCLIP, and evaluate them on an independent dataset verified by human raters. We show that leveraging features from models pretrained on MovieCLIP benefits downstream tasks such as multi-label scene and genre classification of web videos and movie trailers.
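
The weak-labeling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each shot is represented by a few frame embeddings and each taxonomy label by a text embedding, and that a shot's score for a label is the mean cosine similarity across its frames. The function name `weak_label_shot`, the `top_k` and `min_score` parameters, and the three-dimensional toy vectors are all hypothetical; the abstract does not specify the prompts, frame aggregation, or thresholds actually used, and a real pipeline would obtain the embeddings from CLIP's image and text encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def weak_label_shot(frame_embeddings, label_embeddings, top_k=1, min_score=0.0):
    """Assign weak scene labels to one shot (hypothetical helper).

    frame_embeddings: list of image-embedding vectors, one per sampled frame.
    label_embeddings: dict mapping a scene label to its text-embedding vector.
    Returns up to top_k (label, score) pairs whose mean cosine similarity
    over the frames is at least min_score, best first.
    """
    if not frame_embeddings:
        raise ValueError("a shot needs at least one frame embedding")
    scores = {}
    for label, text_vec in label_embeddings.items():
        per_frame = [cosine(frame, text_vec) for frame in frame_embeddings]
        scores[label] = sum(per_frame) / len(per_frame)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, score) for label, score in ranked[:top_k] if score >= min_score]


# Toy stand-ins for CLIP outputs (assumption: real embeddings are much larger).
label_embeddings = {
    "kitchen": [1.0, 0.0, 0.0],
    "forest": [0.0, 1.0, 0.0],
    "office": [0.0, 0.0, 1.0],
}
shot_frames = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]

labels = weak_label_shot(shot_frames, label_embeddings, top_k=1, min_score=0.5)
print(labels)
```

Averaging over frames is only one possible aggregation; the score threshold here simply drops low-confidence labels so that a shot may end up with no weak label at all.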

Authors (8)
  1. Digbalay Bose (14 papers)
  2. Rajat Hebbar (12 papers)
  3. Krishna Somandepalli (21 papers)
  4. Haoyang Zhang (28 papers)
  5. Yin Cui (45 papers)
  6. Kree Cole-McLaughlin (1 paper)
  7. Huisheng Wang (18 papers)
  8. Shrikanth Narayanan (151 papers)
Citations (16)
