
Promises and pitfalls of artificial intelligence for legal applications (2402.01656v1)

Published 10 Jan 2024 in cs.CY and cs.AI

Abstract: Is AI set to redefine the legal profession? We argue that this claim is not supported by the current evidence. We dive into AI's increasingly prevalent roles in three types of legal tasks: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. We find that the ease of evaluating legal applications varies greatly across legal tasks, based on the ease of identifying correct answers and the observability of information relevant to the task at hand. Tasks that would lead to the most significant changes to the legal profession are also the ones most prone to overoptimism about AI capabilities, as they are harder to evaluate. We make recommendations for better evaluation and deployment of AI in legal contexts.


Summary

  • The paper finds that AI boosts efficiency in information processing but falls short in tasks that require creative legal reasoning.
  • The paper highlights evaluation challenges such as data contamination and flawed benchmarks that limit AI's purported transformative impact.
  • The paper recommends involving legal experts and using naturalistic evaluations to ensure AI is deployed responsibly in legal contexts.

The paper "Promises and Pitfalls of Artificial Intelligence for Legal Applications," by Sayash Kapoor, Peter Henderson, and Arvind Narayanan of Princeton University, examines what AI can and cannot currently do in the legal sector. The authors test the claim that AI is set to redefine the legal profession by evaluating its effectiveness across three types of legal tasks.

Main Findings

The work sorts AI applications in the legal domain into three types: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. This classification gives a structured framework for assessing AI's utility and pitfalls in legal contexts. The authors argue that, despite the hype, the claim that AI will transform the legal profession is not supported by the current evidence.

  1. Information Processing: The analysis finds that AI performs adequately in information-processing tasks, such as summarizing documents or categorizing legal requests, where the correct answers are usually well-defined and the relevant features are observable. The authors underscore that while AI may offer cost reductions and accuracy enhancements, these contributions are incremental, not transformative.
  2. Creativity, Reasoning, or Judgment: Tasks requiring creative legal reasoning or judgment, such as drafting legal filings or participating in automated mediation, are much harder to evaluate. The paper identifies significant issues with benchmark-based evaluations, including data contamination and lack of construct validity. The authors argue that exams designed for humans, like the bar exam, cannot reliably gauge AI's capability for complex legal reasoning.
  3. Predicting Legal Outcomes and Decisions: The paper critically examines AI applications claiming to predict court decisions and criminal justice outcomes. It highlights substantial flaws in prediction models, primarily due to insufficient observability of case-specific information and context. The limitations of predictive models, such as distribution shift and low accuracy, demonstrate that such applications are premature and potentially harmful.
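
The data contamination concern raised in item 2 can be probed with a simple heuristic: compare a model's accuracy on benchmark items published before its training cutoff with its accuracy on items published afterwards. The sketch below is a minimal illustration in Python, not the authors' method. It assumes each benchmark item carries a publication date; the names `Item` and `contamination_gap`, the cutoff date, and the sample results are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """One benchmark question with a (hypothetical) publication date."""
    published: date
    correct: bool  # whether the model answered this item correctly


def accuracy(items):
    """Fraction of items answered correctly; None if the list is empty."""
    if not items:
        return None
    return sum(item.correct for item in items) / len(items)


def contamination_gap(items, training_cutoff):
    """Return (accuracy before cutoff, accuracy on or after cutoff)."""
    before = [i for i in items if i.published < training_cutoff]
    after = [i for i in items if i.published >= training_cutoff]
    return accuracy(before), accuracy(after)


# Made-up results for illustration only; not data from the paper.
results = [
    Item(date(2020, 5, 1), True),
    Item(date(2020, 9, 1), True),
    Item(date(2021, 3, 1), True),
    Item(date(2022, 6, 1), False),
    Item(date(2023, 1, 1), True),
    Item(date(2023, 4, 1), False),
]

# Assumed training cutoff; a real cutoff must come from the model developer.
pre, post = contamination_gap(results, training_cutoff=date(2021, 9, 1))
print(f"pre-cutoff accuracy: {pre:.2f}, post-cutoff accuracy: {post:.2f}")
```

A large drop from pre-cutoff to post-cutoff accuracy is only suggestive: newer items may also be harder or differently distributed, so a gap like this should prompt closer inspection rather than be treated as proof of contamination.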

The authors advocate for more nuanced and context-sensitive evaluations of AI systems in legal applications. They emphasize the need for evaluations that incorporate both quantitative metrics and qualitative insights from legal professionals. Moreover, the paper suggests that AI should be confined to narrow tasks where it can be reliably evaluated, such as identifying errors in legal documents, rather than being used for consequential decisions like predicting court outcomes or recidivism risks.

Recommendations

The paper's recommendations for addressing these evaluation challenges include:

  • Construct Validity and Expert Involvement: Legal experts should be integral to designing evaluation benchmarks to ensure they reflect real-world tasks and improve construct validity.
  • Naturalistic Evaluations: Researchers should employ evaluations that simulate real-world use closely, promoting a practical understanding of AI's effectiveness.
  • Transparency and Communication: Developers should transparently communicate AI limitations to end users, particularly to prevent reliance on AI-generated outputs in critical legal settings.

Conclusion

By examining AI's capabilities and limitations across these three task types, the paper clarifies where claims about AI in law are supported by evidence and where they are not. While AI offers potential efficiencies, the authors caution against uncritical reliance and call for careful evaluation before deployment. The framework outlined in the paper can guide future development and empirical research so that AI's integration into the legal domain is evidence-based.
