Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset (2402.17013v1)
Abstract: Assessing the explainability of Legal Judgement Prediction (LJP) systems is essential to building trustworthy and transparent systems, particularly because these systems may rely on factors that lack legal relevance or involve sensitive attributes. This study examines explainability and fairness in LJP models using Swiss Judgement Prediction (SJP), the only available multilingual LJP dataset. We curate a comprehensive collection of rationales from legal experts that 'support' and 'oppose' the judgement for 108 cases in German, French, and Italian. Using an occlusion-based explainability approach, we evaluate the explainability performance of state-of-the-art monolingual and multilingual BERT-based LJP models, as well as models developed with techniques such as data augmentation and cross-lingual transfer, which have been shown to improve prediction performance. Notably, our findings reveal that improved prediction performance does not necessarily correspond to improved explainability performance, underscoring the importance of evaluating models from an explainability perspective. Additionally, we introduce a novel evaluation framework, Lower Court Insertion (LCI), which allows us to quantify the influence of lower court information on model predictions, exposing current models' biases.
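The occlusion-based approach mentioned in the abstract can be illustrated with a minimal sketch: remove one sentence at a time from the case text and record how much the model's predicted probability changes. Everything named below is an assumption made for illustration and is not taken from the paper: the function `occlusion_scores`, the keyword-based `toy_predict_proba` (a hypothetical stand-in for a BERT-based LJP classifier), and the example sentences. The paper's actual occlusion granularity and scoring may differ.

```python
from typing import Callable, List, Tuple


def occlusion_scores(
    sentences: List[str],
    predict_proba: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score each sentence by the probability drop caused by removing it.

    A positive score means the sentence pushed the model towards the
    predicted outcome; a negative score means it pushed against it.
    """
    baseline = predict_proba(" ".join(sentences))
    scores = []
    for i, sentence in enumerate(sentences):
        # Rebuild the document without sentence i (the "occlusion").
        occluded = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append((sentence, baseline - predict_proba(occluded)))
    return scores


def toy_predict_proba(text: str) -> float:
    """Hypothetical stand-in for a trained classifier (assumption).

    Returns a pseudo-probability of "approval" from simple keyword counts.
    """
    words = text.lower().split()
    score = 0.5 + 0.2 * words.count("unlawful") - 0.2 * words.count("lawful")
    return min(1.0, max(0.0, score))


# Made-up example sentences, not drawn from the SJP dataset.
sentences = [
    "The lower court dismissed the claim",
    "The detention was unlawful",
    "The appellant filed on time",
]
scores = occlusion_scores(sentences, toy_predict_proba)
for sentence, score in scores:
    print(f"{score:+.2f}  {sentence}")
```

In this toy setting only the second sentence receives a non-zero score, because it is the only one containing a keyword the stand-in model reacts to. With a real model, such scores could be compared against expert-annotated rationales to judge whether the model relies on legally relevant text.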
Authors:
- Santosh T. Y. S. S
- Nina Baumgartner
- Matthias Stürmer
- Matthias Grabmair
- Joel Niklaus