
(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice (2402.01864v2)

Published 2 Feb 2024 in cs.CY and cs.AI
Abstract: LLMs are increasingly capable of providing users with advice in a wide range of professional domains, including legal advice. However, relying on LLMs for legal queries raises concerns due to the significant expertise required and the potential real-world consequences of the advice. To explore when and why LLMs should or should not provide advice to users, we conducted workshops with 20 legal experts using methods inspired by case-based reasoning. The provided realistic queries ("cases") allowed experts to examine granular, situation-specific concerns and overarching technical and legal constraints, producing a concrete set of contextual considerations for LLM developers. By synthesizing the factors that impacted LLM response appropriateness, we present a 4-dimension framework: (1) User attributes and behaviors, (2) Nature of queries, (3) AI capabilities, and (4) Social impacts. We share experts' recommendations for LLM response strategies, which center around helping users identify "right questions to ask" and relevant information rather than providing definitive legal judgments. Our findings reveal novel legal considerations, such as unauthorized practice of law, confidentiality, and liability for inaccurate advice, that have been overlooked in the literature. The case-based deliberation method enabled us to elicit fine-grained, practice-informed insights that surpass those from de-contextualized surveys or speculative principles. These findings underscore the applicability of our method for translating domain-specific professional knowledge and practices into policies that can guide LLM behavior in a more responsible direction.

The paper "(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice" explores the nuanced and complex terrain of using LLMs for providing legal advice. This interdisciplinary research addresses the critical issue of ensuring responsible and ethical deployment of AI systems in the legal domain.

To investigate the circumstances under which LLMs should or should not provide legal advice, the authors engaged in workshops with 20 legal experts. These workshops leveraged methods inspired by case-based reasoning, allowing participants to delve into realistic legal queries, thereby identifying situation-specific concerns as well as broader technical and legal constraints. This methodological approach facilitated an in-depth examination of the contextual factors influencing the appropriateness of LLM responses in legal contexts.

From these workshops, the researchers distilled a four-dimensional framework to guide LLM developers in responsibly structuring AI interactions in the legal domain. The four dimensions are:

  1. User Attributes and Behaviors: This dimension considers the characteristics and actions of the users seeking legal advice, emphasizing the importance of understanding the user's background, intent, and familiarity with legal concepts.
  2. Nature of Queries: This dimension addresses the specificities of the legal queries posed to the LLM, including the complexity, specificity, and legal ramifications of the questions.
  3. AI Capabilities: Here, the focus is on the technical capabilities and limitations of the LLM, stressing the need for developers to calibrate the AI's responses to its actual competencies and to ensure that it does not overstep its advisory capacity.
  4. Social Impacts: This dimension explores the broader societal consequences of deploying LLMs for legal advice, such as the unauthorized practice of law, potential breaches of confidentiality, and the liability implications for providing inaccurate or harmful advice.

Based on these considerations, the experts recommended that LLM response strategies prioritize helping users formulate appropriate legal questions and identify relevant information sources. Rather than providing definitive legal judgments, LLMs should act as guides, enhancing users' understanding and helping them navigate the legal landscape more effectively.

Moreover, the paper unveiled several novel legal concerns. Among these are the risk of unauthorized practice of law, the importance of maintaining client confidentiality, and the liability issues associated with potentially inaccurate advice. These insights, derived from practice-informed deliberations, highlight gaps in the current literature and demonstrate the benefits of using context-rich methods like case-based reasoning over more abstract or speculative approaches.

In summary, the paper underscores the value of interdisciplinary collaboration and detailed, context-specific analysis in cultivating responsible policies for LLM deployment in sensitive professional domains such as law. The four-dimension framework and the rich qualitative data gathered through expert workshops provide a robust foundation for guiding the ethical and effective use of LLMs in providing legal advice.

Authors (5)
  1. Inyoung Cheong
  2. King Xia
  3. K. J. Kevin Feng
  4. Quan Ze Chen
  5. Amy X. Zhang