Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data (2306.05535v2)
Abstract: Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help debate moderators, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech from past political debates in the USA. We then experimentally demonstrate that, in the case of multiple speakers, adding the audio modality yields sizable improvements over using the text modality alone; moreover, an audio-only model can outperform a text-only one for a single speaker. To enable future research, we make all of our data and code publicly available at https://github.com/petar-iv/audio-checkworthiness-detection.
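The multimodal setup described above can be sketched as a simple late-fusion classifier: a per-sentence text embedding and a per-sentence audio embedding are concatenated and scored by a linear head. This is only a minimal illustrative sketch, not the paper's exact architecture; the embedding sizes, the fusion-by-concatenation choice, and the function names are all assumptions made for the example.

```python
import math
import random

random.seed(0)

# Illustrative embedding sizes (assumptions; real sentence/utterance
# embeddings from BERT-like or wav2vec 2.0-like encoders would be larger).
TEXT_DIM, AUDIO_DIM = 8, 8

def late_fusion(text_emb, audio_emb):
    """Fuse modalities by concatenating the two embedding vectors."""
    return list(text_emb) + list(audio_emb)

# A stand-in linear classification head over the fused representation.
W = [random.gauss(0, 0.01) for _ in range(TEXT_DIM + AUDIO_DIM)]
b = 0.0

def checkworthiness_score(text_emb, audio_emb):
    """Sigmoid score in (0, 1); higher means more check-worthy (illustrative)."""
    fused = late_fusion(text_emb, audio_emb)
    z = sum(w * x for w, x in zip(W, fused)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Usage with dummy embeddings for one sentence of a debate:
text_emb = [random.gauss(0, 1) for _ in range(TEXT_DIM)]
audio_emb = [random.gauss(0, 1) for _ in range(AUDIO_DIM)]
score = checkworthiness_score(text_emb, audio_emb)
assert 0.0 < score < 1.0
```

Concatenation ("late fusion") is the simplest way to let a single classifier weigh both modalities; a text-only or audio-only baseline corresponds to zeroing out, or dropping, one half of the fused vector.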