Analyzing "STFU NOOB! Predicting Crowdsourced Decisions on Toxic Behavior in Online Games"
This paper presents a structured investigation into toxic behavior in online games, focusing on League of Legends (LoL), a prominent eSports title. The authors, Blackburn and Kwak, study the Tribunal, a crowdsourcing platform through which players adjudicate reported cases of in-game misbehavior. The core contribution is a supervised learning approach, trained and validated on a dataset encompassing over ten million user reports, that predicts the outcomes of these crowdsourced decisions.
Research Context and Methodology
The paper's context hinges on understanding toxic behavior, such as harassment and intentional disruption, in LoL's competitive gaming environment. The Tribunal operates in two stages: players who experience or witness toxic behavior file reports, and volunteer human jurors then review the aggregated reports and associated evidence and vote on whether the accused player should be punished or pardoned.
Using this framework, the authors collect a substantial dataset comprising 1.46 million individual cases involving players accused of toxic behavior across multiple regions. They extract 534 distinct features from these cases, drawing on in-game performance metrics, chat logs, and user reports, and train a Random Forest classifier to predict the Tribunal's decisions, with particular focus on cases with high levels of reviewer agreement.
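To make the methodology concrete, the following is an illustrative sketch (not the authors' code) of training a Random Forest to predict Tribunal verdicts from case-level features. The four features, the synthetic data, and the label rule below are all hypothetical stand-ins for the paper's 534 real features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cases = 2000

# Hypothetical per-case features: average kills, average deaths,
# number of reports in the case, and a chat-valence score.
X = np.column_stack([
    rng.normal(5, 2, n_cases),   # kills per game
    rng.normal(6, 2, n_cases),   # deaths per game
    rng.poisson(3, n_cases),     # reports filed in the case
    rng.normal(0, 1, n_cases),   # chat valence (emotional tone)
])
# Synthetic verdicts loosely tied to report count and valence,
# purely so the example has learnable structure.
y = ((X[:, 2] > 3) & (X[:, 3] < 0)).astype(int)  # 1 = punish, 0 = pardon

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

On real Tribunal data the signal is far noisier than in this toy setup, which is why the paper's reported accuracy varies with the level of juror agreement.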
Findings
The paper's results yield several insights:
- The classifier achieved approximately 80% accuracy in distinguishing guilty from innocent verdicts in cases of toxic behavior, rising to roughly 88% for cases with overwhelming juror agreement on innocence.
- Significant features influencing Tribunal decisions were identified, such as in-game performance, detailed characteristics of user reports, and linguistic nuances in chat logs.
- The research highlighted that features like valence scores derived from chat logs—a parameter that gauges the emotional tone—carry substantial predictive power.
- Importantly, the paper demonstrates the classifier's ability to generalize across regional servers and cultural contexts, supporting the idea of universal patterns in online toxic behavior.
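The valence feature mentioned above can be sketched as a simple lexicon-based score averaged over a player's chat messages. The tiny lexicon below is hypothetical; the paper relied on an established affective word list rather than this toy dictionary:

```python
# Hypothetical mini-lexicon mapping chat words to valence in [-1, 1].
VALENCE = {
    "gg": 0.8, "nice": 0.7, "thanks": 0.9,       # positive tone
    "noob": -0.8, "report": -0.5, "feed": -0.6,  # negative tone
}

def chat_valence(messages):
    """Average valence of lexicon words across a player's chat messages."""
    scores = [VALENCE[w] for msg in messages
              for w in msg.lower().split() if w in VALENCE]
    return sum(scores) / len(scores) if scores else 0.0

print(chat_valence(["gg nice game"]))      # positive score
print(chat_valence(["report this noob"]))  # negative score
```

A score near the negative end of the scale flags an emotionally hostile chat log, which is why this feature carries predictive power for punish verdicts.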
Implications and Future Directions
The practical implications of the research are significant, suggesting substantial savings in the human effort and review time required to police toxic behavior in gaming communities. The classifier's ability to preemptively identify clear-cut cases of innocence or guilt implies an opportunity for automated case triage, potentially allowing developers to mitigate the effects of toxic behavior in real time.
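The triage idea can be sketched as follows: auto-resolve only the cases where a trained classifier is highly confident, and route everything else to human jurors. The 0.9 confidence threshold and the synthetic two-feature data are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def triage(clf, X, threshold=0.9):
    """Return a mask of auto-decidable cases and the predicted verdicts."""
    proba = clf.predict_proba(X)
    confident = proba.max(axis=1) >= threshold  # confident enough to auto-decide
    return confident, proba.argmax(axis=1)      # 0 = pardon, 1 = punish

# Demonstration on well-separated synthetic cases (hypothetical features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

confident, verdicts = triage(clf, X)
print(f"auto-resolved {confident.sum()} of {len(X)} cases")
```

Raising the threshold shrinks the auto-resolved set but makes each automated verdict safer, mirroring the paper's observation that accuracy is highest on high-agreement cases.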
Theoretically, the work advances the understanding of toxic behavior through quantifiable features, pushing towards a more systematic approach in defining and detecting online harassment. The authors indicate future explorations into real-time detection and adaptive user interfaces as promising avenues to improve player experiences and community health.
In conclusion, this paper exemplifies a methodological blend of crowdsourced human judgment and machine learning, paving the way for scalable, automated intervention strategies in the field of online games. The groundwork laid here presents fertile ground for further inquiry into cross-domain applications of such frameworks in other online environments and communities.