DPOD: Domain-Specific Prompt Tuning for Multimodal Fake News Detection (2311.16496v3)

Published 27 Nov 2023 in cs.LG

Abstract: The spread of fake news using out-of-context images has become widespread and is a relevant problem in this era of information overload. Such out-of-context fake news may arise across different domains like politics, sports, entertainment, etc. In practical scenarios, an inherent imbalance exists among news articles from such widely varying domains, resulting in a few domains with abundant data, while the rest contain very limited data. Under such circumstances, it is imperative to develop methods that can work with such varying amounts of data. In this work, we explore whether out-of-domain data can help to improve out-of-context misinformation detection (termed here as multi-modal fake news detection) of a desired domain, to address this challenging problem. Towards this goal, we propose a novel framework termed DPOD (Domain-specific Prompt-tuning using Out-of-Domain data). First, to compute generalizable features, we modify the Vision-Language Model CLIP to extract features that help to align the representations of the images and corresponding text captions of both the in-domain and out-of-domain data in a label-aware manner. Further, we propose a domain-specific prompt learning technique which leverages the training samples of all the available domains based on the extent to which they can be useful to the desired domain. Extensive experiments on a large-scale benchmark dataset, namely NewsCLIPpings, demonstrate that the proposed framework achieves state-of-the-art performance, significantly surpassing the existing approaches for this challenging task. Code will be released on acceptance.
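The abstract describes two components: a label-aware alignment of CLIP image and caption features, and a domain-specific prompt-learning step that weights out-of-domain training samples by how useful they are to the target domain. The following PyTorch sketch illustrates how such losses could be wired together; the exact loss forms, the `domain_sim` relevance matrix, and all shapes are assumptions made for illustration and are not the authors' released implementation.

```python
# Minimal sketch of (1) a label-aware image/text alignment loss and
# (2) domain-similarity weighting of per-sample losses.
# Everything here is an illustrative assumption based on the abstract.
import torch
import torch.nn.functional as F

def label_aware_alignment_loss(img_emb, txt_emb, labels, margin=0.2):
    """Pull image/caption embeddings together for pristine pairs (label=1)
    and push them apart for out-of-context pairs (label=0)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = (img_emb * txt_emb).sum(dim=-1)           # cosine similarity per pair
    pos = (1.0 - sim) * labels                      # pristine: maximise similarity
    neg = F.relu(sim - margin) * (1.0 - labels)     # falsified: penalise similarity above margin
    return (pos + neg).mean()

def domain_weighted_loss(per_sample_loss, sample_domains, target_domain, domain_sim):
    """Weight each sample's loss by how relevant its domain is to the target
    (low-resource) domain; domain_sim is an assumed [num_domains, num_domains]
    relevance matrix standing in for the paper's domain-usefulness scores."""
    weights = domain_sim[target_domain, sample_domains]   # [batch]
    return (weights * per_sample_loss).sum() / weights.sum()

if __name__ == "__main__":
    torch.manual_seed(0)
    B, D, num_domains = 8, 512, 4
    img_emb = torch.randn(B, D)        # stand-ins for CLIP image features
    txt_emb = torch.randn(B, D)        # stand-ins for CLIP text features
    labels = torch.randint(0, 2, (B,)).float()
    align = label_aware_alignment_loss(img_emb, txt_emb, labels)

    # Dummy per-sample classification loss for the prompt-tuned classifier.
    per_sample_loss = F.binary_cross_entropy_with_logits(
        torch.randn(B), labels, reduction="none")
    domains = torch.randint(0, num_domains, (B,))
    domain_sim = torch.softmax(torch.randn(num_domains, num_domains), dim=-1)
    total = align + domain_weighted_loss(per_sample_loss, domains, 0, domain_sim)
    print(float(total))
```

In this reading, the alignment term shapes a shared embedding space across all domains, while the weighting term lets abundant out-of-domain samples contribute to a scarce target domain in proportion to their assumed relevance.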

References (29)
  1. Open-domain, content-based, multi-modal fact-checking of out-of-context images via online resources. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, pages 14920–14929, 2022.
  2. COSMOS: Catching out-of-context image misuse using self-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14084–14092, 2023.
  3. À-la-carte prompt tuning (APT): Combining distinct data via composable prompting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14984–14993, 2023.
  4. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, pages 1597–1607, 2020.
  5. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  6. PØDA: Prompt-driven zero-shot domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18623–18633, 2023.
  7. CLIP-Adapter: Better vision-language models with feature adapters. CoRR, abs/2110.04544, 2021.
  8. Domain adaptation via prompt learning. arXiv preprint arXiv:2202.06687, 2022.
  9. SwitchPrompt: Learning domain-specific gated soft prompts for classification in low-resource domains. arXiv preprint arXiv:2302.06868, 2023.
  10. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  11. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904–4916. PMLR, 2021.
  12. MaPLe: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19113–19122, 2023.
  13. Supervised contrastive learning. Advances in neural information processing systems, 33:18661–18673, 2020.
  14. Visual news: Benchmark and challenges in news image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, pages 6761–6771. ACL, 2021.
  15. NewsCLIPpings: Automatic generation of out-of-context multimodal media. CoRR, abs/2104.05893, 2021.
  16. Self-supervised distilled learning for multi-modal misinformation identification. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, pages 2818–2827, 2023.
  17. Improving fake news detection of influential domain via domain- and instance-level transfer. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2834–2848, Gyeongju, Republic of Korea, 2022. International Committee on Computational Linguistics.
  18. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021. PMLR, 2021.
  19. Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, pages 557–565. AAAI Press, 2021.
  20. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  21. AD-CLIP: Adapting domains in prompt space using CLIP. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4355–4364, 2023.
  22. SpotFake: A multi-modal framework for fake news detection. In Fifth IEEE International Conference on Multimedia Big Data, BigMM 2019, pages 39–47. IEEE, 2019.
  23. EANN: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, pages 849–857. ACM, 2018.
  24. Multimodal emergent fake news detection via meta neural process networks. In KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3708–3716. ACM, 2021.
  25. Learning domain invariant prompt for vision-language models. arXiv preprint arXiv:2212.04196, 2022.
  26. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825, 2022a.
  27. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022b.
  28. SAFE: Similarity-aware multi-modal fake news detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 354–367. Springer, 2020.
  29. Multimodal fake news detection via clip-guided learning. CoRR, abs/2205.14304, 2022c.
Authors (6)
  1. Debarshi Brahma (3 papers)
  2. Amartya Bhattacharya (3 papers)
  3. Suraj Nagaje Mahadev (1 paper)
  4. Anmol Asati (1 paper)
  5. Vikas Verma (20 papers)
  6. Soma Biswas (29 papers)
