The paper "JiraiBench: A Bilingual Benchmark for Evaluating LLMs' Detection of Human Self-Destructive Behavior Content in Jirai Community" presents a novel approach to addressing the nuanced challenge of detecting self-destructive behaviors within the bilingual contexts of Chinese and Japanese social media platforms. The researchers introduce JiraiBench, a meticulous dataset designed to enable LLMs to identify harmful content relating to drug overdose (OD), eating disorders (ED), and self-harm (SH) as expressed in social media posts from the "Jirai" communities.
Methodology and Dataset Construction
The dataset comprises 15,419 annotated social media posts: 10,419 drawn from Chinese Sina Weibo and 5,000 from Japanese Twitter, covering the three self-destructive behaviors under investigation. Posts were collected through keyword searches designed by domain experts. Each post was annotated on a 0-2 scale, distinguishing self-destructive from non-destructive content and further separating first-person accounts from discussion of third-party experiences.
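To make the label space concrete, the following Python sketch shows one plausible record layout. The field names and the per-category reading of the 0-2 scale are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import Literal

# One plausible reading of the 0-2 scale described above; the paper's exact
# label semantics and field names are assumptions, not reproduced from it.
Label = Literal[0, 1, 2]  # 0: non-destructive, 1: third-party discussion, 2: personal experience

@dataclass
class JiraiPost:
    text: str                              # raw post content
    platform: Literal["weibo", "twitter"]  # Sina Weibo (zh) / Twitter (ja)
    language: Literal["zh", "ja"]
    od: Label                              # drug overdose
    ed: Label                              # eating disorder
    sh: Label                              # self-harm

# e.g. a Japanese post annotated as a first-person overdose account:
example = JiraiPost(text="...", platform="twitter", language="ja", od=2, ed=0, sh=0)
```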
This annotation strategy captures both linguistic and cultural dimensions, reflecting the distinctive sociocultural lexicon of Jirai communities: a transnational phenomenon whose communication patterns evolved during the early 2020s, notably under pandemic-induced constraints.
Experimental Framework
The experimental design evaluates four LLMs (Llama-3.1 8B, Qwen-2.5 7B, DeepSeek-v3, and the fine-tuned JiraiLLM-Qwen) across various configurations. One significant finding is that Japanese instruction prompts outperform Chinese ones even when the models process content originating from Chinese platforms. This cross-cultural transfer phenomenon suggests that cultural proximity can outweigh linguistic similarity, opening a practical avenue for bilingual content moderation.
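The sketch below illustrates how such an instruction-language comparison can be run. The prompt wording, the model.generate() wrapper, the label parsing, and the macro-F1 metric are illustrative stand-ins for the paper's actual protocol.

```python
from sklearn.metrics import f1_score

# Illustrative classification prompts; the paper's exact wording is not known here.
PROMPTS = {
    "zh": "请判断以下帖子是否包含自毁行为内容（OD、ED、SH），仅输出 0、1 或 2：\n{post}",
    "ja": "次の投稿に自傷的な内容（OD、ED、SH）が含まれるか判定し、0、1、2 のいずれかで答えてください：\n{post}",
}

def evaluate(model, posts, gold, instruction_lang):
    """Score a model on a set of posts under a given instruction language.

    `model.generate()` is an assumed wrapper around whatever inference API
    is in use; `gold` holds the 0-2 reference labels.
    """
    preds = []
    for post in posts:
        prompt = PROMPTS[instruction_lang].format(post=post)
        reply = model.generate(prompt)       # hypothetical inference call
        preds.append(int(reply.strip()[0]))  # naive parse of the 0-2 label
    return f1_score(gold, preds, average="macro")

# The paper's surprising result corresponds to runs like these on Chinese
# Weibo data, where the ja-instructed score can exceed the zh-instructed one:
# f1_ja = evaluate(model, weibo_posts, weibo_gold, "ja")
# f1_zh = evaluate(model, weibo_posts, weibo_gold, "zh")
```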
Notably, the cross-lingual transfer experiments show that a model fine-tuned on a source language can achieve substantial performance gains on the target language without any explicit target-language training data. This highlights the potential for LLMs to generalize across East Asian language contexts, where shared cultural narratives and linguistic artifacts may enhance detection efficacy.
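Conceptually, this kind of transfer experiment can be pictured as a source-by-target evaluation matrix, sketched below using the evaluate() helper from the previous snippet. The transfer_matrix helper is hypothetical, and the fine-tuning step itself (for example, instruction tuning Qwen-2.5 7B on one language's training split) is assumed to have happened upstream.

```python
def transfer_matrix(models, test_sets):
    """Cross-lingual transfer grid.

    models:    {"zh": zh_finetuned_model, "ja": ja_finetuned_model}
    test_sets: {"zh": (posts, gold), "ja": (posts, gold)}
    """
    scores = {}
    for src, model in models.items():
        for tgt, (posts, gold) in test_sets.items():
            # Off-diagonal cells (src != tgt) measure zero-shot transfer:
            # no target-language training data was used for that model.
            # Instruction language is pinned to the target here for
            # simplicity; in the paper it is a separate experimental axis.
            scores[(src, tgt)] = evaluate(model, posts, gold, instruction_lang=tgt)
    return scores
```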
Implications and Observations
The research underscores the necessity of integrating cultural awareness into multilingual AI deployments. A culturally informed approach promises more accurate identification of sensitive content across global platforms, refining moderation systems relevant to mental health interventions. Moreover, the unexpected cross-cultural transfer effect suggests a methodological shift toward incorporating cultural proximity into model training: activating relevant cultural schemas through the choice of instruction language may outperform the conventional native-language strategy.
Future directions include expanding the benchmark to other languages and cultural contexts, covering a broader spectrum of self-destructive behaviors, and refining annotation protocols to keep pace with the evolving discourse of vulnerable communities.
Conclusion
JiraiBench provides a critical foundational step toward more effective AI deployment in multilingual and culturally complex environments. It addresses significant limitations in current systems' ability to identify and mitigate harmful content. As AI technologies increasingly intersect with human mental health challenges worldwide, frameworks such as JiraiBench are essential tools for building ethically responsive solutions tailored to diverse linguistic and cultural landscapes.