StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos (2505.18903v1)

Published 24 May 2025 in cs.CL

Abstract: Aiming to improve current computational models of humor detection, we propose a new multimodal dataset of stand-up comedies in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian and Czech. Our dataset of more than 330 hours is, at the time of writing, the biggest available for this type of task, and the most diverse. The whole dataset is automatically annotated for audience laughter, and the subpart left for model validation is manually annotated. Contrary to contemporary approaches, we do not frame the task of humor detection as binary sequence classification, but as word-level sequence labeling, in order to take into account all the context of the sequence and to capture the continuous joke tagging mechanism typically occurring in natural conversations. Along with unimodal baseline results, we propose a method to enhance the automatic laughter detection based on Automatic Speech Recognition (ASR) errors. Our code and data are available online: https://tinyurl.com/EMNLPHumourStandUpPublic

Summary

StandUp4AI: Advancements in Automatic Humor Detection

The paper "StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos" presents a new dataset and task framing aimed at improving computational humor detection. It focuses on stand-up comedy performances in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian, and Czech. The dataset comprises more than 330 hours of video, which the authors describe as the largest and most diverse available for this task at the time of writing.

Dataset and Approach

The authors curated the StandUp4AI dataset by collecting stand-up comedy performances from popular online platforms, focusing on videos with a single comedian and avoiding short-form content such as YouTube Shorts. Audience laughter was detected automatically with state-of-the-art techniques such as the model from Omine et al. (2024). These automatic laughter annotations cover the whole dataset and were subsequently refined, while the subset reserved for model validation was annotated manually.

The research shifts humor detection from conventional binary sequence classification to word-level sequence labeling. Each word is labeled as either leading to laughter or not, which lets a model use the full context of the sequence and captures the continuous nature of humor and audience reactions characteristic of stand-up comedy.
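
To make the word-level framing concrete, here is a minimal sketch of how per-word labels could be derived from laughter timestamps. It is not the authors' labeling rule: the `label_words` function, the `window` parameter, and the toy timings are all assumptions introduced for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_sec, end_sec)
Span = Tuple[float, float]       # (start_sec, end_sec) of audience laughter


def label_words(words: List[Word], laughs: List[Span], window: float = 1.0) -> List[int]:
    """Return one 0/1 label per word.

    Assumed rule (hypothetical): a word gets label 1 if some laughter span
    starts no more than `window` seconds after the word ends.
    """
    labels = []
    for _, _, word_end in words:
        hit = any(0.0 <= laugh_start - word_end <= window for laugh_start, _ in laughs)
        labels.append(1 if hit else 0)
    return labels


# Toy example with made-up timings.
words = [
    ("I", 0.0, 0.2), ("told", 0.2, 0.5), ("my", 0.5, 0.7), ("cat", 0.7, 1.1),
    ("a", 3.0, 3.1), ("secret", 3.1, 3.6),
]
laughs = [(1.3, 2.8)]
labels = label_words(words, laughs, window=0.5)
print(list(zip([w[0] for w in words], labels)))
```

With these toy values only "cat" is labeled 1, since it is the only word that ends within half a second before the laughter starts. A wider `window` would also mark earlier words of the punchline.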

Methodological Contributions

The paper describes a method for improving automatic laughter detection by exploiting ASR outputs from the Whisper and WhisperX systems; according to the abstract, the method is based on ASR errors. Verifying the timestamps of detected laughter events improved the selection of laughter candidates, which human annotators then evaluated on a subset of videos. The manuscript sets out the procedure for refining laughter detection and a standardized framework for validating it automatically using acoustic features.
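
The summary does not spell out the exact refinement rule, so the following is only an illustrative sketch of one plausible timestamp check: keep a laughter candidate when most of its duration falls outside the words that an ASR system timestamped. The function names, the `max_speech_ratio` threshold, and the toy spans are assumptions, and the sketch does not call Whisper or WhisperX; it simply takes word spans as input.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the intersection of two spans (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def filter_laughter(candidates: List[Span], word_spans: List[Span],
                    max_speech_ratio: float = 0.5) -> List[Span]:
    """Keep candidates whose overlap with transcribed speech is small.

    Assumes `word_spans` do not overlap one another, so summing the
    per-word overlaps gives the total speech time inside a candidate.
    """
    kept = []
    for seg in candidates:
        duration = seg[1] - seg[0]
        if duration <= 0:
            continue  # skip malformed spans
        speech = sum(overlap(seg, w) for w in word_spans)
        if speech / duration <= max_speech_ratio:
            kept.append(seg)
    return kept


# Toy example: the first candidate sits on top of speech, the second in a pause.
word_spans = [(0.0, 1.0), (1.0, 2.0), (4.0, 5.0)]
candidates = [(0.5, 1.5), (2.2, 3.8)]
kept = filter_laughter(candidates, word_spans)
print(kept)
```

In this toy case only the candidate located in the pause between words is kept. A real pipeline would need to handle laughter that overlaps speech, which is common in stand-up, so a hard threshold like this one is a simplification.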

Additionally, the authors have implemented unimodal sequence labeling approaches using large pre-trained models fine-tuned for this specific laughter prediction task. These models set a baseline for future humor detection research by emphasizing the importance of multilingual training on diverse data.

Results and Implications

The results indicate an advantage for multilingual over monolingual models in predicting humor. Training on the dataset with refined laughter labels also improved model predictions, which supports the usefulness of the refinement step. Quantitatively, multilingual models reached an F1-score of 42.4, compared with 39.4 for monolingual configurations.
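
For readers unfamiliar with the metric, the sketch below computes an F1-score over binary word-level labels. It is a generic illustration: the summary does not state how the reported scores were averaged across languages or sequences, and the `word_level_f1` function and toy labels are assumptions.

```python
from typing import List


def word_level_f1(gold: List[int], pred: List[int]) -> float:
    """F1 of the positive class (label 1) over aligned word-level labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
gold = [0, 0, 1, 1, 0, 1]
pred = [0, 1, 1, 0, 0, 1]
f1 = word_level_f1(gold, pred)
print(round(100 * f1, 1))
```

Here precision and recall are both 2/3, so the F1-score is also 2/3. Scores such as 42.4 and 39.4 are this quantity expressed on a 0 to 100 scale.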

These findings push the computational modeling of humor toward approaches that account for linguistic diversity and multimodal context. The implications extend beyond improved interactive systems and chatbots: such models could strengthen AI's capacity for cultural nuance and improve the user experience of conversational agents.

Future Directions

Looking ahead, the summary recommends integrating verbal and non-verbal cues in automatic humor detection. Further study of how humor responses vary across cultures could deepen the understanding of humor mechanics across languages and sociocultural contexts. The dataset also lends itself to multimodal learning that draws on visual and acoustic features for deeper analysis.

By offering both data and code publicly, the paper encourages continued collaboration and innovation in developing humor-aware artificial intelligence systems. StandUp4AI is poised to be a reference point in humor detection studies, providing groundwork for the exploration of context-aware AI interactions.

Authors (5)
