Large Model based Sequential Keyframe Extraction for Video Summarization (2401.04962v1)
Abstract: Keyframe extraction aims to summarize a video's semantics with the minimum number of its frames. This paper proposes Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which consists of three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate a visual feature for each frame within each shot. Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, where each candidate keyframe is the frame located nearest to a cluster center. Third, we further reduce the candidate keyframes via redundancy elimination within each shot, and finally concatenate them in shot order to form the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct extensive experiments, whose results show that LMSKE performs much better than several SOTA competitors, with an average F1 of 0.5311, average fidelity of 0.8141, and average compression ratio of 0.9922.
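The second and third stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-frame feature vectors and cluster labels for each shot are already available (for example from CLIP features and some clustering step), and the function names, the cosine-similarity redundancy test, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def candidate_keyframes(features, labels):
    """For each cluster, pick the index of the frame nearest its centroid."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    picks = []
    for members in clusters.values():
        c = centroid([features[i] for i in members])
        picks.append(min(members, key=lambda i: sq_dist(features[i], c)))
    return sorted(picks)  # keep temporal order inside the shot

def drop_redundant(features, candidates, threshold=0.95):
    """Keep a candidate only if it is not too similar to one already kept.

    Assumption: redundancy is judged by cosine similarity against a fixed
    threshold; the abstract does not specify the criterion.
    """
    kept = []
    for i in candidates:
        if all(cosine(features[i], features[j]) < threshold for j in kept):
            kept.append(i)
    return kept

def sequential_keyframes(shots, threshold=0.95):
    """shots: list of (start_frame, features, labels) in temporal order."""
    result = []
    for start, features, labels in shots:
        cands = candidate_keyframes(features, labels)
        kept = drop_redundant(features, cands, threshold)
        result.extend(start + i for i in kept)
    return result

# Toy example with 2-D features standing in for real frame embeddings.
shots = [
    (0, [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1], [0.0, 1.0]], [0, 0, 0, 1]),
    (4, [[1.0, 0.0], [0.99, 0.01]], [0, 1]),
]
print(sequential_keyframes(shots))  # prints [2, 3, 4]
```

In the toy example the second shot yields two candidates that are nearly identical, so only the first survives. Because elimination runs within each shot, a frame in one shot is never dropped for resembling a keyframe from another shot.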
Authors: Kailong Tan, Yuxiang Zhou, Qianchen Xia, Rui Liu, Yong Chen