
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation (2403.14952v1)

Published 22 Mar 2024 in cs.CL and cs.AI

Abstract: The proliferation of online misinformation poses a significant threat to the public interest. While many online users actively combat misinformation, their responses often lack politeness and supporting facts. As a solution, text generation approaches have been proposed to automatically produce counter-misinformation responses. Nevertheless, existing methods are often trained end-to-end without leveraging external knowledge, resulting in subpar text quality and excessively repetitive responses. In this paper, we propose retrieval augmented response generation for online misinformation (RARG), which collects supporting evidence from scientific sources and generates counter-misinformation responses based on that evidence. In particular, RARG consists of two stages: (1) evidence collection, where we design a retrieval pipeline to retrieve and rerank evidence documents using a database comprising over 1M academic articles; (2) response generation, in which we align LLMs to generate evidence-based responses via reinforcement learning from human feedback (RLHF). We propose a reward function that maximizes the utilization of the retrieved evidence while maintaining the quality of the generated text, yielding polite and factual responses that clearly refute misinformation. To demonstrate the effectiveness of our method, we study the case of COVID-19 and perform extensive experiments with both in-domain and cross-domain datasets, where RARG consistently outperforms baselines by generating high-quality counter-misinformation responses.
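
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lexical overlap retriever, the bag-of-words cosine reranker, and the `toy_reward` function (with its `alpha` weight and caller-supplied `quality` score) are all hypothetical stand-ins chosen so the example runs with only the Python standard library. The paper's actual retrievers, reranker, and reward are not specified in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    """Coarse stage (assumed): keep the k documents sharing the most distinct terms with the query."""
    q = set(tokenize(query))
    ranked = sorted(corpus, key=lambda d: len(q & set(tokenize(d))), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Fine stage (assumed): reorder candidates by bag-of-words cosine similarity to the query."""
    q = Counter(tokenize(query))
    return sorted(candidates, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)


def toy_reward(response, evidence, quality, alpha=0.5):
    """Hypothetical reward: weighted sum of evidence utilization and a text-quality score.

    utilization = fraction of distinct evidence tokens that appear in the response.
    quality     = caller-supplied score in [0, 1], standing in for any quality model.
    """
    ev = set(tokenize(evidence))
    resp = set(tokenize(response))
    utilization = len(ev & resp) / len(ev) if ev else 0.0
    return alpha * utilization + (1 - alpha) * quality
```

A typical call would first run `retrieve(query, corpus, k)` over the evidence database, pass the result to `rerank(query, candidates)`, and then score a candidate response against the top document with `toy_reward`. In an RLHF setup, a score of this kind would serve as the signal the policy is optimized against.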

Authors (6)
  1. Zhenrui Yue (24 papers)
  2. Huimin Zeng (25 papers)
  3. Yimeng Lu (4 papers)
  4. Lanyu Shang (14 papers)
  5. Yang Zhang (1129 papers)
  6. Dong Wang (628 papers)
Citations (13)
