JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community (2503.21679v2)

Published 27 Mar 2025 in cs.CL and cs.CY

Abstract: This paper introduces JiraiBench, the first bilingual benchmark for evaluating LLMs' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target language training. These findings highlight the need for culturally-informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.

Summary

An Overview of JiraiBench: A Bilingual Benchmark for Detecting Self-Destructive Behavior in Social Media

The paper "JiraiBench: A Bilingual Benchmark for Evaluating LLMs' Detection of Human Self-Destructive Behavior Content in Jirai Community" presents a novel approach to addressing the nuanced challenge of detecting self-destructive behaviors within the bilingual contexts of Chinese and Japanese social media platforms. The researchers introduce JiraiBench, a meticulous dataset designed to enable LLMs to identify harmful content relating to drug overdose (OD), eating disorders (ED), and self-harm (SH) as expressed in social media posts from the "Jirai" communities.

Methodology and Dataset Construction

The dataset comprises 15,419 annotated social media posts: 10,419 drawn from Chinese Sina Weibo and 5,000 from Japanese Twitter, covering the specific self-destructive behaviors under investigation. Using keyword searches designed by domain experts, the team curated posts indicative of concern. Each post was annotated on a 0-2 scale along the three behavioral categories, distinguishing non-destructive from self-destructive content and separating first-person disclosures from discussion of third-party experiences; the annotations achieved substantial inter-annotator agreement.
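To make this labeling scheme concrete, here is a minimal sketch of one annotated record as a data structure. The field names and the mapping of the 0-2 levels to meanings are illustrative assumptions based on the summary above, not the paper's released schema.

```python
from dataclasses import dataclass

# Hypothetical meaning of the 0-2 levels for each behavioral category;
# the paper reports a 0-2 scale, but this exact mapping is an assumption:
#   0 = no self-destructive content
#   1 = discussion of a third party's experience
#   2 = first-person self-destructive disclosure

@dataclass
class AnnotatedPost:
    text: str
    language: str          # "zh" (Sina Weibo) or "ja" (Twitter)
    drug_overdose: int     # 0-2 label for OD
    eating_disorder: int   # 0-2 label for ED
    self_harm: int         # 0-2 label for SH

example = AnnotatedPost(
    text="(redacted post text)",
    language="zh",
    drug_overdose=0,
    eating_disorder=2,
    self_harm=0,
)
```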

This annotation strategy incorporates both linguistic and cultural dimensions, capturing the sociocultural lexicon of the Jirai communities, a transnational subculture whose distinct communication patterns evolved during the early 2020s, notably under pandemic-induced constraints.

Experimental Framework

The experimental design evaluates four advanced LLMs (Llama-3.1 8B, Qwen-2.5 7B, DeepSeek-v3, and the fine-tuned JiraiLLM-Qwen) across various prompt configurations. One significant finding is that Japanese instruction prompts outperform Chinese ones when the models process content from Chinese platforms. This cross-cultural transfer phenomenon suggests that cultural proximity can sometimes outweigh linguistic similarity, a result with direct implications for bilingual content moderation.
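A minimal sketch of how such an instruction-language comparison could be run is shown below. The prompt templates and the `query_model` wrapper are hypothetical stand-ins; the paper's actual prompts and evaluation harness are not reproduced here.

```python
# Sketch: classify the same Chinese posts under Chinese vs. Japanese
# instruction prompts. `query_model` is a hypothetical callable that
# sends a prompt to the LLM under test and returns its label.

PROMPTS = {
    # Both templates ask (in their own language): "Judge whether the post
    # below contains self-destructive content (overdose / eating
    # disorder / self-harm)." Wording is illustrative, not the paper's.
    "zh": "请判断下面的帖子是否包含自毁行为内容（用药过量/进食障碍/自残）。帖子：{post}",
    "ja": "次の投稿に自己破壊的な内容（オーバードーズ/摂食障害/自傷）が含まれるか判定してください。投稿：{post}",
}

def classify(query_model, post_text: str, instruction_lang: str) -> str:
    """Label one post using the given instruction language."""
    prompt = PROMPTS[instruction_lang].format(post=post_text)
    return query_model(prompt)

def compare_instruction_languages(query_model, posts: list[str]) -> dict:
    """Collect predictions for the same posts under both prompt languages."""
    return {
        lang: [classify(query_model, p, lang) for p in posts]
        for lang in PROMPTS
    }
```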

Notably, the cross-lingual transfer experiments show that models fine-tuned on a source language achieve substantial performance gains on the target language without explicit target-language training data. This highlights the potential for LLMs to generalize across East Asian language contexts, where shared cultural narratives and linguistic artifacts may enhance detection efficacy.
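In outline, the transfer evaluation amounts to fine-tuning on one language's annotations and scoring on the other language's held-out set with no target-language training examples. The sketch below assumes a generic `predict_labels` inference function and macro F1 as the metric; both are assumptions rather than the paper's reported protocol.

```python
# Sketch of measuring zero-shot cross-lingual transfer: a model fine-tuned
# only on source-language annotations is scored on target-language posts.
from sklearn.metrics import f1_score

def transfer_score(predict_labels, target_posts, target_gold):
    """Macro F1 of a source-language-tuned model on target-language data."""
    preds = [predict_labels(post) for post in target_posts]
    return f1_score(target_gold, preds, average="macro")

# Usage: fine-tune on Chinese annotations, then evaluate on the Japanese
# test split (or vice versa) without any Japanese training examples.
# score = transfer_score(jirai_llm_predict, ja_test_posts, ja_test_labels)
```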

Implications and Observations

The research underscores the necessity of integrating cultural awareness into multilingual AI deployments. A culturally informed approach promises to refine content moderation systems relevant to mental health interventions, so that sensitive content is identified more accurately across global platforms. Moreover, the unexpected cross-cultural transfer effect suggests a methodological shift toward embedding cultural proximity considerations in model training: activating relevant cultural schemas through the choice of instruction language may outperform conventional native-language prompting strategies.

Future directions include expanding the benchmark to other languages and cultural contexts, covering a broader spectrum of self-destructive behaviors, and refining annotation protocols to track the evolving discourse within vulnerable communities.

Conclusion

JiraiBench provides a critical foundational step toward more effective AI deployment in multilingual and culturally complex environments. It addresses significant limitations in current systems' ability to identify and mitigate harmful content. As AI technologies increasingly intersect with human mental health challenges worldwide, benchmarks such as JiraiBench are essential tools for building ethically responsive detection systems tailored to diverse linguistic and cultural landscapes.
