Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media (1610.09786v1)

Published 31 Oct 2016 in cs.SI and cs.HC

Abstract: Most of the online news media outlets rely heavily on the revenues generated from the clicks made by their readers, and due to the presence of numerous such outlets, they need to compete with each other for reader attention. To attract the readers to click on an article and subsequently visit the media site, the outlets often come up with catchy headlines accompanying the article links, which lure the readers to click on the link. Such headlines are known as Clickbaits. While these baits may trick the readers into clicking, in the long run, clickbaits usually don't live up to the expectation of the readers, and leave them disappointed. In this work, we attempt to automatically detect clickbaits and then build a browser extension which warns the readers of different media sites about the possibility of being baited by such headlines. The extension also offers each reader an option to block clickbaits she doesn't want to see. Then, using such reader choices, the extension automatically blocks similar clickbaits during her future visits. We run extensive offline and online experiments across multiple media sites and find that the proposed clickbait detection and the personalized blocking approaches perform very well achieving 93% accuracy in detecting and 89% accuracy in blocking clickbaits.

Summary

The paper demonstrates a robust machine learning classifier achieving 93% accuracy in detecting clickbait headlines and 89% accuracy in personalized blocking decisions.
The methodology leverages linguistic features such as sentence structure, hyperbolic words, and syntactic n-grams to distinguish clickbait from non-clickbait.
The study introduces a browser extension that offers user-specific clickbait blocking, with the aim of reducing readers' exposure to headlines they do not want to see.

Analyzing Clickbait Detection and Prevention in Online News Media

The paper "Stop Clickbait: Detecting and Preventing Clickbaits in Online News Media," by Chakraborty et al., examines clickbait on online news platforms. It starts from a problem in current online journalism: competition for reader attention leads many media outlets to use sensationalized headlines, known as clickbaits. These headlines exploit readers' curiosity with ambiguous or intriguing wording to increase clicks. The linked article often fails to match the expectation the headline sets, which leaves readers dissatisfied.

Research Objectives and Methodology

The primary aim of the paper is to use machine learning to detect clickbaits automatically and then prevent readers from being exposed to them. The authors develop a browser extension, "Stop Clickbait," which warns readers about potential clickbait headlines and lets them block the ones they do not want to see. The approach has two stages. In the detection stage, a classifier distinguishes clickbait from non-clickbait headlines with 93% accuracy. In the blocking stage, the extension builds a personalized classifier for each user from the headlines that user has previously blocked or clicked, and reaches 89% accuracy in personalized blocking decisions.
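
The personalized blocking stage described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that a new headline should be blocked when its word overlap (Jaccard similarity) with previously blocked headlines exceeds its overlap with previously clicked ones. The function names `content_words`, `jaccard`, and `should_block` are hypothetical.

```python
import re


def content_words(headline):
    """Return the set of lowercase word tokens in a headline."""
    return set(re.findall(r"[a-z0-9']+", headline.lower()))


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_block(headline, blocked, clicked):
    """Assumed rule: block when the headline is closer to the user's
    blocked headlines than to the headlines the user clicked."""
    words = content_words(headline)
    block_score = max(
        (jaccard(words, content_words(h)) for h in blocked), default=0.0
    )
    click_score = max(
        (jaccard(words, content_words(h)) for h in clicked), default=0.0
    )
    return block_score > click_score


# Hypothetical per-user history, for illustration only.
blocked = ["10 Celebrity Diets You Must Try"]
clicked = ["21 Puppies That Will Make Your Day"]
print(should_block("7 Celebrity Diets That Actually Work", blocked, clicked))
```

With an empty history the rule never blocks, which matches the idea that the extension learns only from choices the reader has actually made.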

Dataset and Linguistic Analysis

The authors curated a dataset of 7,500 clickbait and 7,500 non-clickbait headlines, drawing clickbait from sources such as BuzzFeed and non-clickbait from Wikinews. A linguistic analysis shows that clickbait headlines differ significantly from non-clickbait headlines in length, structure, word usage, sentiment, and syntactic dependencies. Clickbait headlines tend to be longer, use hyperbolic words, and rely on curiosity-inducing patterns to capture reader interest. These observations were used to derive features for the classifier, including sentence structure, stop words, hyperbolic words, POS tags, and syntactic n-grams.
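
A few of the features listed above can be approximated with the standard library alone. The sketch below is illustrative, not the paper's feature extractor: the word lists `STOP_WORDS` and `HYPERBOLIC_WORDS` are small hypothetical stand-ins, plain word bigrams stand in for the syntactic n-grams, and POS tags are omitted because they require an external tagger.

```python
import re

# Hypothetical word lists for illustration; the paper's lexicons are not given here.
STOP_WORDS = {"a", "an", "the", "you", "your", "this", "that",
              "of", "to", "in", "is", "are", "will", "what", "why", "how"}
HYPERBOLIC_WORDS = {"amazing", "unbelievable", "shocking",
                    "incredible", "epic", "hilarious"}


def tokenize(headline):
    """Lowercase the headline and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def extract_features(headline):
    """Compute simple length, stop-word, hyperbole, and bigram features."""
    tokens = tokenize(headline)
    n = len(tokens)
    return {
        "num_words": n,
        "avg_word_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "stop_word_ratio": sum(t in STOP_WORDS for t in tokens) / n if n else 0.0,
        "hyperbolic_count": sum(t in HYPERBOLIC_WORDS for t in tokens),
        # Word bigrams, a simplification of the syntactic n-grams in the paper.
        "bigrams": list(zip(tokens, tokens[1:])),
    }


features = extract_features("You Will Not Believe This Amazing Trick")
print(features["num_words"], features["hyperbolic_count"])
```

The resulting dictionary would still need to be turned into a numeric vector before it could be passed to a classifier.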

Classifier Performance and Personalization

The authors train classifiers on these features and report varying degrees of success across models. A Support Vector Machine (SVM) with an RBF kernel performed best, which suggests the feature set captures clickbait patterns across a diverse range of headlines. The paper also stresses that blocking must be personalized, because readers differ widely in which clickbaits they want to avoid. It addresses this with a hybrid approach that combines topical similarity and linguistic patterns to tailor blocking to each user's past interactions with clickbait content.
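
For readers unfamiliar with the RBF kernel mentioned above, the sketch below computes the standard form k(x, y) = exp(-gamma * ||x - y||^2) on two feature vectors. The value of `gamma` and the example vectors are assumptions for illustration; the hyperparameters used in the paper are not stated in this summary.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF (Gaussian) kernel between two equal-length vectors.

    gamma=0.5 is an arbitrary illustrative default, not a value from the paper.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors, e.g. (word count, hyperbolic word count).
print(rbf_kernel([7.0, 1.0], [7.0, 1.0]))  # identical vectors
print(rbf_kernel([7.0, 1.0], [10.0, 0.0]))  # more distant vectors
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, which lets the SVM draw a non-linear boundary between clickbait and non-clickbait headlines in feature space.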

Practical Implications and Future Directions

The research offers a practical tool for readers and is relevant to media outlets that want to maintain content quality and audience trust. Personalized filtering can reduce a reader's exposure to misleading headlines. The approach could be made more useful by improving the classifier and extending it to non-English languages. Future work might also refine the personalization algorithms as user interaction data evolves, and study how reduced clickbait visibility affects media business models.

In conclusion, the paper presents a methodical approach to curbing clickbait in online media and shows that machine learning and personalization can achieve high accuracy in both detecting and blocking clickbait content. The work is a step towards more reliable online news environments and may influence how media outlets approach reader engagement.