
Human Centered AI for Indian Legal Text Analytics (2403.10944v1)

Published 16 Mar 2024 in cs.HC and cs.AI

Abstract: Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated into a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training LLMs. This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human-centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.

Authors (6)
  1. Sudipto Ghosh (4 papers)
  2. Devanshu Verma (2 papers)
  3. Balaji Ganesan (17 papers)
  4. Purnima Bindal (2 papers)
  5. Vikas Kumar (43 papers)
  6. Vasudha Bhatnagar (14 papers)