Cross-Domain Toxic Spans Detection (2306.09642v1)
Abstract: Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based methods, rationale extraction, and fine-tuned LLMs. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) LLMs, despite performing best in the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain.
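To make the lexicon-based approach concrete, below is a minimal sketch of toxic span detection via lexicon lookup, returning character offsets in the style of the SemEval-2021 Task 5 toxic spans format. The tiny `TOXIC_LEXICON` here is a hypothetical placeholder; the paper relies on off-the-shelf lexicons, and the actual implementation lives in the linked repository.

```python
import re

# Hypothetical toy lexicon for illustration only; the paper uses
# off-the-shelf lexicons (see the linked repository for details).
TOXIC_LEXICON = {"idiot", "stupid", "moron"}

def lexicon_toxic_spans(text: str, lexicon=TOXIC_LEXICON):
    """Return the character offsets of all tokens whose lowercased
    form appears in the lexicon."""
    spans = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in lexicon:
            spans.extend(range(match.start(), match.end()))
    return spans

print(lexicon_toxic_spans("You are a stupid idiot."))
# [10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21]
```

Because the method involves no learned parameters, it cannot overfit to a source domain, which is one plausible reason such lexicons transfer well under distributional shift.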