AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild (2405.11697v2)
Abstract: The prevalence and harms of online misinformation are a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely accessible methods for synthesizing realistic audio, images, video and human-like text, has amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly accessible fact checks with the ClaimReview markup. We present an image typology, designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its commonality is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show that "simple" methods dominated historically, particularly context manipulations, and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.
Authors:
- Nicholas Dufour
- Arkanath Pathak
- Pouya Samangouei
- Nikki Hariri
- Shashi Deshetti
- Andrew Dudfield
- Christopher Guess
- Pablo Hernández Escayola
- Bobby Tran
- Mevan Babakar
- Christoph Bregler