BayesJudge: Bayesian Kernel Language Modelling with Confidence Uncertainty in Legal Judgment Prediction
Abstract: Predicting legal judgments with reliable confidence is paramount for responsible legal AI applications. While transformer-based deep neural networks (DNNs) like BERT have demonstrated promise in legal tasks, accurately assessing their prediction confidence remains crucial. We present a novel Bayesian approach called BayesJudge that harnesses the synergy between deep learning and deep Gaussian Processes to quantify uncertainty through Bayesian kernel Monte Carlo dropout. Our method leverages informative priors and flexible data modelling via kernels, surpassing existing methods in both predictive accuracy and confidence estimation, as measured by the Brier score. Extensive evaluations on public legal datasets showcase our model's superior performance across diverse tasks. We also introduce an optimal solution to automate the scrutiny of unreliable predictions, which increases the accuracy of the model's predictions by up to 27%. By empowering judges and legal professionals with more reliable information, our work paves the way for trustworthy and transparent legal AI applications that facilitate informed decisions grounded in both knowledge and quantified uncertainty.
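The abstract names three ingredients without defining them: Monte Carlo dropout, the Brier score, and automated scrutiny of unreliable predictions. The sketch below illustrates the general ideas in plain Python and is not the BayesJudge implementation. It assumes the multi-class form of the Brier score (squared error against one-hot labels, summed over classes and averaged over examples) and a simple fixed confidence threshold for flagging predictions; the abstract does not specify either, and the function names and toy numbers are hypothetical.

```python
def mc_average(samples):
    """Average T stochastic forward passes (one probability vector each)
    into a single predictive distribution, as in Monte Carlo dropout."""
    t = len(samples)
    return [sum(s[k] for s in samples) / t for k in range(len(samples[0]))]


def brier_score(probs, labels):
    """Multi-class Brier score: squared error between predicted
    probabilities and one-hot labels, averaged over examples.
    Lower is better; 0.0 means perfectly confident and correct."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(probs)


def selective_accuracy(probs, labels, threshold):
    """Keep only predictions whose top probability is at least `threshold`
    (an assumed rule, not the paper's) and return (accuracy, coverage).
    Predictions below the threshold would be routed to a human reviewer."""
    kept = [(p, y) for p, y in zip(probs, labels) if max(p) >= threshold]
    if not kept:
        return None, 0.0
    correct = sum(1 for p, y in kept if p.index(max(p)) == y)
    return correct / len(kept), len(kept) / len(probs)


# Toy data: three cases, two outcome classes (hypothetical values).
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
labels = [0, 1, 1]

all_acc, all_cov = selective_accuracy(probs, labels, 0.0)
sel_acc, sel_cov = selective_accuracy(probs, labels, 0.7)
```

On this toy data, deferring the one low-confidence case raises accuracy on the retained predictions at the cost of lower coverage, which is the trade-off that any scheme for scrutinising unreliable predictions has to manage.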