Abstract
Hate speech on social media platforms has grown to become a major problem. In this study, we explore strategies to efficiently lessen its harmful effects by supporting content moderation through machine learning (ML). In order to present a more accurate spectrum of severity and surmount the constraints of seeing hate speech as a binary task (as is typical in sentiment analysis), we classify hate speech into four intensities: no hate, intimidation, offense or discrimination, and promotion of violence. For this, we first involve 31 users in annotating a dataset in English and German. To promote interpretability and transparency, we integrate our ML system into a dashboard provided with explainable AI (XAI). By performing a case study with 40 non-expert moderators, we evaluate the efficacy of the proposed XAI dashboard in supporting content moderation. Our results suggest that assessing hate intensities is important for content moderators, as these can be related to specific penalties. Similarly, XAI seems to be a promising method to improve ML trustworthiness, thereby facilitating moderators' well-informed decision-making.
Citation
Raisa Romanov Geleta, Klaus Eckelt, Emilia Parada-Cabaleiro, Markus Schedl
Exploring Intensities of Hate Speech on Social Media: A Case Study on Explaining Multilingual Models with XAI
Proceedings of the 4th Conference on Language, Data and Knowledge,
532--537, doi:10.34619/srmk-injj, 2023.
BibTeX
@inproceedings{Geleta2023Exploring_LDK_2023,
  title = {Exploring Intensities of Hate Speech on Social Media: A Case Study on Explaining Multilingual Models with XAI},
  author = {Geleta, Raisa Romanov and Eckelt, Klaus and Parada-Cabaleiro, Emilia and Schedl, Markus},
  booktitle = {Proceedings of the 4th Conference on Language, Data and Knowledge},
  publisher = {NOVA CLUNL, Portugal},
  address = {Vienna, Austria},
  doi = {10.34619/srmk-injj},
  url = {https://aclanthology.org/2023.ldk-1.57},
  pages = {532--537},
  year = {2023}
}