
InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain? (2402.10567v4)

Published 16 Feb 2024 in cs.CL and cs.AI

Abstract: Recent advancements in language technology and Artificial Intelligence have resulted in numerous LLMs being proposed to perform various tasks in the legal domain, ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been shown to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of LLMs to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, the $\beta$-weighted Legal Safety Score ($LSS_{\beta}$), which encapsulates both the fairness and accuracy aspects of an LLM. We assess the safety of LLMs by considering their performance in the Binary Statutory Reasoning task and the fairness they exhibit with respect to various axes of disparity in Indian society. Task performance and fairness scores of LLaMA and LLaMA-2 models indicate that the proposed $LSS_{\beta}$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA-2 models increase the $LSS_{\beta}$, improving their usability in the Indian legal domain. Our code is publicly released.
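
The abstract does not give the formula for $LSS_{\beta}$, so the sketch below is only an illustration of how a single $\beta$-weighted score could combine an accuracy term and a fairness term. It assumes a weighted harmonic mean, by analogy with the standard F-beta score; the function name `legal_safety_score`, its parameters, and the choice of harmonic mean are all hypothetical and are not taken from the paper.

```python
def legal_safety_score(accuracy: float, fairness: float, beta: float = 1.0) -> float:
    """Hypothetical beta-weighted harmonic mean of accuracy and fairness.

    ASSUMPTION: this mirrors the F-beta formula; it is not the paper's
    definition. Both inputs are expected to lie in [0, 1]. With beta > 1
    the fairness term is weighted more heavily, with beta < 1 the
    accuracy term is.
    """
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= fairness <= 1.0):
        raise ValueError("accuracy and fairness must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")

    denominator = beta**2 * accuracy + fairness
    if denominator == 0:
        # Both terms are zero, so the combined score is zero as well.
        return 0.0
    return (1 + beta**2) * accuracy * fairness / denominator


if __name__ == "__main__":
    # A model that is accurate but unfair is penalised more as beta grows.
    for beta in (0.5, 1.0, 2.0):
        print(beta, legal_safety_score(accuracy=0.8, fairness=0.4, beta=beta))
```

A harmonic mean is a natural candidate here because it stays low unless both terms are high, so a model cannot compensate for poor fairness with high accuracy alone.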
