A Large-scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts (2103.11528v4)
Published 22 Mar 2021 in cs.CL
Abstract: In recent years, Vietnam has witnessed rapid growth in the number of users on social platforms such as Facebook, YouTube, Instagram, and TikTok. On social media, hate speech has become a critical problem for users. To address this problem, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, each annotated with one of three labels: CLEAN, OFFENSIVE, or HATE. We also describe the data creation process used to annotate the dataset and to evaluate its quality. Finally, we evaluate the dataset with deep learning models and transformer models.
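As an illustration of the single-label, three-class scheme described above, the following is a minimal sketch that validates that each comment carries exactly one of the labels CLEAN, OFFENSIVE, or HATE and tallies the label distribution. The `Label` enum, the `label_distribution` helper, and the `(comment, label)` record format are hypothetical and chosen only for this example. They are not the actual ViHSD file format, and the sample records are placeholders, not data from the dataset.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    # The three labels named in the abstract (hypothetical encoding as strings).
    CLEAN = "CLEAN"
    OFFENSIVE = "OFFENSIVE"
    HATE = "HATE"


def label_distribution(records):
    """Count labels over hypothetical (comment, label) pairs.

    Raises ValueError if a record's label is not one of the three classes,
    which enforces the "exactly one of three labels" constraint.
    """
    counts = Counter()
    for comment, raw_label in records:
        try:
            label = Label(raw_label)
        except ValueError:
            raise ValueError(
                f"unknown label {raw_label!r} for comment {comment!r}"
            ) from None
        counts[label] += 1
    return counts


# Placeholder records for illustration only; not taken from ViHSD.
sample = [
    ("comment a", "CLEAN"),
    ("comment b", "OFFENSIVE"),
    ("comment c", "CLEAN"),
    ("comment d", "HATE"),
]
dist = label_distribution(sample)
for label in Label:
    print(label.value, dist[label])
```

Counting per class like this is a common first check on a multi-class hate speech corpus, since an uneven label distribution affects how classifiers trained on it should be evaluated.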
- Son T. Luu (26 papers)
- Kiet Van Nguyen (74 papers)
- Ngan Luu-Thuy Nguyen (56 papers)