- The paper demonstrates that zero-shot self-explanations closely align with human rationales in multilingual sentiment analysis and forced labor detection.
- It leverages instruction-tuned LLMs like Llama2, Llama3, Mistral, and Mixtral to generate and evaluate explanations across English, Danish, and Italian.
- Results show that Llama3 achieves the highest agreement with human annotations, underscoring the value of self-explanations for explainable AI.
Analyzing Zero-Shot Self-Explanations in Multilingual Text Classification
The research paper "Comparing Zero-Shot Self-Explanations with Human Rationales in Multilingual Text Classification" investigates the ability of instruction-tuned LLMs to generate zero-shot self-explanations, which are compared against human rationales and traditional post-hoc explainability methods such as Layer-wise Relevance Propagation (LRP). The focus is on two text classification tasks, sentiment analysis and forced labor detection, evaluated in English as well as in Danish and Italian translations.
Methodological Approach
The paper studies self-explanations generated by several LLMs: Llama2, Llama3, Mistral, and Mixtral. These models were prompted to produce rationale-style explanations in a zero-shot setting across multiple languages. The resulting self-explanations were compared with human annotations and with post-hoc explanations computed via LRP, assessing both their plausibility to humans and their faithfulness to the models' decisions.
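As a concrete illustration, the snippet below sketches how such a zero-shot self-explanation prompt might be issued through the Hugging Face transformers text-generation pipeline. The model identifier, prompt wording, and output schema are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of prompting an instruction-tuned LLM for a zero-shot
# self-explanation. Model id, prompt wording, and output schema are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed (gated) model id
)

def self_explain(text: str) -> str:
    """Ask the model to classify a text and mark the words that drove its decision."""
    prompt = (
        "Classify the sentiment of the following text as positive or negative. "
        "Return a JSON object with two fields: 'label' and 'rationale', where "
        "'rationale' lists the words from the text that support your decision.\n\n"
        f"Text: {text}"
    )
    messages = [{"role": "user", "content": prompt}]
    output = generator(messages, max_new_tokens=200, do_sample=False)
    # The pipeline returns the full chat; the last message is the model's reply.
    return output[0]["generated_text"][-1]["content"]

print(self_explain("The plot was dull, but the acting was wonderful."))
```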
For sentiment classification, annotations were evaluated in multiple languages, extending the analysis beyond English to Italian and Danish and providing insight into the models' multilingual capabilities. Forced labor detection, a more complex task, tested the models on news articles annotated for specific risk indicators.
Experimental Results
The results indicate that self-explanations align more closely with human rationales than post-hoc methods such as LRP, especially in terms of plausibility. Llama3 exhibited the highest agreement with human rationales, suggesting stronger instruction-following and language-understanding capabilities.
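One common way to quantify this kind of agreement is token-level overlap between the model's rationale and the human annotation, for example an F1 score over the selected words. The sketch below illustrates the idea; the paper's exact agreement metric is not reproduced here.

```python
# Sketch of a token-level plausibility score: F1 overlap between the words a
# model marks as rationale and the words humans annotated (generic illustration).
def rationale_f1(model_tokens: set[str], human_tokens: set[str]) -> float:
    if not model_tokens or not human_tokens:
        return 0.0
    overlap = len(model_tokens & human_tokens)
    precision = overlap / len(model_tokens)
    recall = overlap / len(human_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

model_rationale = {"dull", "wonderful"}
human_rationale = {"wonderful", "acting"}
print(rationale_f1(model_rationale, human_rationale))  # 0.5
```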
Task accuracy was generally high for sentiment classification across all languages, suggesting robust zero-shot multilingual capabilities. Forced labor detection, by contrast, showed more variability, reflecting task-specific challenges and the complexity of the domain.
Another notable finding was the models' varied adherence to output-format constraints such as valid JSON in the prompted explanations, with Llama3 demonstrating more reliable compliance.
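Checking this kind of format compliance can be as simple as attempting to parse each output as JSON and verifying that the expected fields are present. The sketch below assumes a 'label'/'rationale' schema purely for illustration.

```python
import json

REQUIRED_KEYS = {"label", "rationale"}  # assumed schema, for illustration only

def follows_format(raw_output: str) -> bool:
    """Return True if the model output is valid JSON containing the expected keys."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and REQUIRED_KEYS <= parsed.keys()

outputs = [
    '{"label": "positive", "rationale": ["wonderful"]}',
    "Sure! The label is positive.",  # free-form reply, not valid JSON
]
compliance_rate = sum(follows_format(o) for o in outputs) / len(outputs)
print(compliance_rate)  # 0.5
```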
Faithfulness vs. Plausibility
Although self-explanations were highly plausible to humans, their faithfulness was only comparable to that of post-hoc attributions. Removing the features marked as relevant did not significantly change class probabilities, pointing to a gap between what readers perceive as relevant and what actually drives the models' predictions.
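A typical removal-based faithfulness check of this kind deletes the tokens an explanation marks as relevant and measures how much the predicted class probability drops, as sketched below. The `predict_proba` function is a hypothetical stand-in for the classifier interface, and the paper's exact perturbation protocol may differ.

```python
# Sketch of a removal-based faithfulness check: delete the tokens an explanation
# marks as relevant and measure the drop in the predicted class probability.
from typing import Callable

def faithfulness_drop(
    text: str,
    rationale_tokens: set[str],
    target_class: str,
    predict_proba: Callable[[str], dict[str, float]],  # hypothetical classifier interface
) -> float:
    """Probability drop for the target class after removing the rationale tokens."""
    original = predict_proba(text)[target_class]
    reduced_text = " ".join(
        tok for tok in text.split()
        if tok.strip(".,!?").lower() not in rationale_tokens
    )
    perturbed = predict_proba(reduced_text)[target_class]
    # A value near zero means removing the "relevant" tokens barely affects the prediction.
    return original - perturbed
```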
Contrastive explanations did not inherently yield higher plausibility; their effectiveness varied across datasets and indicators, in line with prior findings on explanatory approaches.
Implications and Future Directions
The implications of this work are multifaceted for the fields of Explainable AI (XAI) and multilingual text classification. It suggests that self-explanations could provide a more direct and human-comprehensible mode of interaction with AI systems, potentially enhancing user trust and understanding, which is particularly important in applications aimed at a broad audience.
The models' ability to generalize across languages without substantial prior exposure motivates further study of the robustness of cross-lingual transfer. Similarly, the ability to accurately explain complex, domain-specific classifications such as forced labor detection could be pivotal for applying AI in sensitive contexts.
Future research could probe deeper into the mechanisms driving the alignment between self-explanations and human rationales. Additionally, exploring diverse, less-documented languages and aligning self-explanations more closely with model faithfulness while preserving human plausibility could further advance AI interpretability.
In conclusion, this paper contributes valuable insight into the evolving capabilities of LLMs to self-generate explanations, illustrating both their potential and current limitations within multilingual and domain-specific contexts. As AI systems are increasingly integrated into decision-making processes, their ability to provide transparent, understandable, and faithful explanations remains of paramount importance.