Automatic explanation of the classification of Spanish legal judgments in jurisdiction-dependent law categories with tree estimators (2404.00437v1)
Abstract: Automatic legal text classification systems have been proposed in the literature to extract knowledge from judgments and detect their relevant aspects. However, most of these systems are black boxes, even when their underlying models are interpretable, which may raise concerns about their trustworthiness. Accordingly, this work contributes a system that combines Natural Language Processing (NLP) with Machine Learning (ML) to classify legal texts in an explainable manner. We analyze the features involved in each decision and the threshold bifurcation values along the decision paths of tree structures, and we present this information to end users in natural language. This is the first work on the automatic analysis of legal texts that combines NLP and ML with Explainable Artificial Intelligence techniques to make the models' decisions automatically understandable to end users. Furthermore, legal experts have validated our solution, and their knowledge has been incorporated into the explanation process as "expert-in-the-loop" dictionaries. Experimental results on a data set annotated with jurisdiction-dependent law categories show that our system achieves competitive classification performance, with accuracy values well above 90%, and that its automatic explanations are easily understandable even to non-expert users.
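The core explanation idea described above — walking a tree estimator's decision path and verbalizing the feature and threshold at each split — can be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' implementation; the feature names, class names, and sentence templates are hypothetical stand-ins for the paper's legal vocabulary and expert-in-the-loop dictionaries.

```python
# Minimal sketch (assumed names, not the paper's code): verbalize the
# decision path of a fitted scikit-learn tree for one classified text.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def explain_path(clf, x, feature_names, class_names):
    """Return a natural-language explanation of one sample's decision path."""
    node_path = clf.decision_path(x.reshape(1, -1)).indices  # nodes visited
    leaf_id = clf.apply(x.reshape(1, -1))[0]                 # final leaf
    feature = clf.tree_.feature        # splitting feature per node
    threshold = clf.tree_.threshold    # bifurcation threshold per node
    steps = []
    for node_id in node_path:
        if node_id == leaf_id:  # leaf: report the predicted category
            pred = class_names[int(np.argmax(clf.tree_.value[node_id]))]
            steps.append(f"the text is classified as '{pred}'")
            continue
        name = feature_names[feature[node_id]]
        if x[feature[node_id]] <= threshold[node_id]:
            steps.append(f"the weight of '{name}' is at most {threshold[node_id]:.2f}")
        else:
            steps.append(f"the weight of '{name}' exceeds {threshold[node_id]:.2f}")
    return "Because " + ", and ".join(steps[:-1]) + ", " + steps[-1] + "."

# Toy illustration with two hypothetical term-weight features.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(explain_path(clf, X[0], ["contract", "penal code"], ["civil", "penal"]))
```

The same traversal generalizes to each tree of a forest; the paper's system additionally maps raw feature names to expert-curated legal terminology before rendering the sentence.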