A Holistic Indicator of Polarization to Measure Online Sexism (2404.02205v2)
Abstract: The online growth of manosphere and feminist discourse on social networks calls for a holistic measure of the level of sexism in an online community. Such an indicator is important for policymakers and moderators of online communities (e.g., subreddits) and for computational social scientists, either to revise moderation strategies based on the degree of sexism or to compare the temporal evolution of sexism across different platforms and communities, relate it to real-time events, and infer social-scientific insights. In this paper, we build a model that provides a comparable, holistic indicator of toxicity targeted toward male and female identities and individuals. Unlike previous supervised NLP methods, which require comments to be annotated at the target level (e.g., labeling comments that are specifically toxic toward women) in order to detect targeted toxicity, our indicator uses supervised NLP to detect the presence of toxicity and an unsupervised word embedding association test to detect its target automatically. We apply our model to gender-discourse communities (e.g., r/TheRedPill, r/MGTOW, r/FemaleDatingStrategy) to measure the level of toxicity toward each gender (i.e., sexism). Our results show that our framework measures the level of sexism in a community accurately and consistently (93% correlation). We finally discuss how our framework can be generalized in the future to measure qualities other than toxicity (e.g., sentiment, humor) toward general-purpose targets, turning it into an indicator of different sorts of polarization.
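To make the target-detection idea concrete, the sketch below illustrates a WEAT-style association test of the kind the abstract describes: a term flagged as toxic by a supervised classifier is attributed to the gender whose attribute words it sits closer to in the community's embedding space. This is a minimal illustration with hypothetical toy vectors and hypothetical attribute sets, not the paper's actual pipeline; in practice the embeddings would be trained on the community's own comments.

```python
import numpy as np

# Hypothetical toy embeddings for illustration only; real vectors would be
# learned from the target community's comments (e.g., with word2vec).
EMB = {
    "toxicterm": np.array([0.9, 0.1, 0.0]),
    "she":       np.array([0.8, 0.2, 0.1]),
    "woman":     np.array([0.7, 0.3, 0.0]),
    "he":        np.array([0.1, 0.9, 0.2]),
    "man":       np.array([0.2, 0.8, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, female_terms, male_terms, emb=EMB):
    """WEAT-style association score: mean cosine similarity of `word` with
    the female attribute set minus its mean similarity with the male set.
    Positive values indicate a stronger association with female terms."""
    f = np.mean([cosine(emb[word], emb[t]) for t in female_terms])
    m = np.mean([cosine(emb[word], emb[t]) for t in male_terms])
    return f - m

# A term the supervised toxicity classifier flags is assigned to the gender
# it associates with more strongly in the community's embedding space.
score = association("toxicterm", ["she", "woman"], ["he", "man"])
print("inferred target:", "female" if score > 0 else "male", f"(score={score:+.3f})")
```

Aggregating such target attributions over all toxic comments in a community would yield the kind of holistic, per-gender toxicity indicator the abstract outlines.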
- Vahid Ghafouri
- Jose Such
- Guillermo Suarez-Tangil