Misinformation Resilient Search Rankings with Webgraph-based Interventions (2404.08869v1)
Abstract: The proliferation of unreliable news domains on the internet has had wide-reaching negative impacts on society. We introduce and evaluate interventions aimed at reducing search engine traffic to unreliable news domains while maintaining traffic to reliable domains. We build these interventions on four principles: fairness (penalize sites only for what is in their control), generality (label- and fact-check-agnostic), targeting (increase the cost of adversarial behavior), and scalability (work at web scale). We refine our methods on small-scale web data as a testbed and then generalize the interventions to a large-scale webgraph containing 93.9M domains and 1.6B edges. We demonstrate that our methods penalize unreliable domains far more than reliable domains in both settings, and we explore multiple avenues to mitigate unintended effects in both the small-scale and large-scale webgraph experiments. These results indicate the potential of our approach to reduce the spread of misinformation and foster a more reliable online information ecosystem. This research contributes to the development of targeted strategies to enhance the trustworthiness and quality of search engine results, ultimately benefiting users and the broader digital community.