Papers
Topics
Authors
Recent
Search
2000 character limit reached

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Published 2 May 2024 in cs.SD, cs.AI, and eess.AS | (2407.03110v1)

Abstract: In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By combining individual tasks and analyzing both audio & visual data extracted from input video, the toolchain offers various audio/video-based applications: Two general applications of audio/video clustering, comprehensive audio/video summary and a specific application of riot or violent context detection. Furthermore, the toolchain presents a flexible and adaptable architecture that is effective to integrate new models for further audio/video-based applications.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.