A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law (2405.01769v2)

Published 2 May 2024 in cs.CL

Abstract: In the fast-evolving domain of artificial intelligence, LLMs such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics of LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: https://github.com/czyssrs/LLM_X_papers.

Overview of Ethical Considerations for LLMs in High-Stakes Sectors

Introduction to LLMs in Critical Sectors

LLMs such as GPT-4 are playing increasingly significant roles across various high-stakes sectors, including finance, healthcare, and law. These sectors are particularly sensitive due to their substantial impact on individual and societal well-being, making the deployment of LLMs in these fields both promising and challenging.

Due to the complexity and the high-stakes nature of tasks in these domains, ensuring the ethical application of LLMs is paramount. This discussion explores some of the primary ethical concerns and considerations in deploying LLMs across these sectors, as well as future directions to address these challenges.

Key Ethical Challenges Across Sectors

The use of LLMs in sectors such as finance, healthcare, and law introduces various ethical challenges that need careful consideration:

  • Data Sensitivity and Confidentiality: These domains often involve handling sensitive and confidential information, raising significant concerns about privacy and data protection.
  • Need for Explainability: Decisions in these fields can have life-altering consequences. Therefore, the ability of LLMs to provide explainable and interpretable outputs is crucial.
  • Bias and Fairness: Ensuring that LLMs do not perpetuate or amplify existing biases is critical, especially in decision-making processes that affect human rights and access to resources (a minimal fairness-audit sketch follows this list).
  • Compliance and Regulation: Each sector faces strict regulatory requirements that LLMs must adhere to, complicating their deployment.
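
To make the bias-and-fairness concern concrete, the sketch below computes a demographic parity gap, i.e., the spread in positive-outcome rates across groups, over a batch of model decisions. This is a minimal illustration, not a method from the survey: the demographic_parity_difference helper and the toy data are assumptions for demonstration only.

```python
from collections import defaultdict

def demographic_parity_difference(decisions, groups):
    """Return (gap, per-group rates) for a batch of binary model decisions.

    decisions: 0/1 outcomes (e.g., loan application approved by an LLM-assisted screen)
    groups:    demographic group label aligned with each decision
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Toy, fabricated audit data -- purely illustrative.
decisions = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
groups    = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

gap, rates = demographic_parity_difference(decisions, groups)
print(rates)               # {'A': 0.8, 'B': 0.4}
print(f"gap = {gap:.2f}")  # 0.40 -- a large gap would trigger a manual bias review
```

A check like this is only a starting point; in practice, audits would cover multiple fairness criteria and demographic attributes.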

Specialized Domain Challenges

Each high-stakes domain presents unique challenges:

  • Finance: LLMs are used for tasks like risk assessment and fraud detection, where accuracy and reliability are crucial to avoid financial mishaps and maintain trust.
  • Healthcare: In healthcare, LLMs assist with diagnosis and treatment recommendations. Mistakes or inaccuracies can directly endanger lives, highlighting the need for extremely reliable and precise systems.
  • Law: Legal applications involve analyzing legal texts and aiding in case predictions. Ethical concerns here include the need for fairness, unbiased support, and adherence to the latest laws and regulations.

Addressing Ethical Concerns

The development and deployment of LLMs in these sectors require a comprehensive strategy addressing various ethical concerns:

  • Enhancing Data Privacy: Implementing advanced encryption and anonymization techniques to protect sensitive information processed by LLMs (a minimal redaction sketch follows this list).
  • Improving Explainability: Developing methods to make LLM decisions more transparent and understandable to users, enabling them to trust and verify the outputs provided by these models.
  • Mitigating Bias: Employing techniques like dataset balancing and bias audit trails to ensure LLMs operate fairly across all demographics.
  • Ensuring Compliance: Integrating regulatory compliance checks into the LLM training and deployment processes to align with sector-specific legal standards.
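
As one concrete instance of the data-privacy item above, the sketch below redacts obvious identifiers before a document is ever sent to an external LLM API. It is a minimal, assumption-laden illustration: the regular expressions, the redact helper, and the sample note are placeholders, and production systems in these sectors would rely on vetted de-identification tooling (including named-entity recognition for person names) rather than hand-written patterns.

```python
import re

# Illustrative patterns only; real deployments would use vetted
# de-identification tools and domain-specific entity recognizers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w{2,}"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace recognizable identifiers with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient reachable at 555-867-5309, SSN 123-45-6789, email j.doe@clinic.org."
print(redact(note))
# -> "Patient reachable at [PHONE], SSN [SSN], email [EMAIL]."
```

Redaction of this kind complements, rather than replaces, encryption in transit and at rest.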

Future Directions

Looking forward, several key areas could further enhance the ethical deployment of LLMs in high-stakes domains:

  • Robust Testing Frameworks: Developing comprehensive testing and validation frameworks to evaluate the ethical implications of LLMs before full-scale deployment.
  • Cross-disciplinary Collaboration: Fostering cooperation between AI developers, domain experts, and ethicists to ensure that LLMs are developed with a thorough understanding of domain-specific needs and ethical requirements.
  • Continuous Monitoring and Updating: Establishing systems for the ongoing monitoring of deployed LLMs to quickly identify and rectify any emerging ethical issues or non-compliance, as sketched below.
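
The continuous-monitoring idea can be made concrete with a small logging-and-flagging loop like the one below. This is a sketch under stated assumptions: run_compliance_checks and its keyword rules are hypothetical stand-ins for whatever sector-specific validators (PII leakage, citation accuracy, policy conformance) a real deployment would plug in.

```python
import datetime
import json

def run_compliance_checks(prompt: str, response: str) -> list:
    """Hypothetical post-deployment checks; replace with sector-specific validators."""
    issues = []
    if "guaranteed return" in response.lower():
        issues.append("possible non-compliant financial advice")
    if not response.strip():
        issues.append("empty response")
    return issues

def monitor(prompt: str, response: str, log_path: str = "llm_audit.jsonl") -> None:
    """Append every interaction to an audit log and flag failures for human review."""
    issues = run_compliance_checks(prompt, response)
    record = {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "issues": issues,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    if issues:
        print(f"FLAGGED for review: {issues}")

monitor("Should I invest in fund X?", "Fund X offers a guaranteed return of 20% per year.")
```

Logging every interaction also provides the audit trail that regulators in these sectors typically expect.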

Conclusion

The integration of LLMs into finance, healthcare, and law holds remarkable potential but comes with significant ethical responsibilities. By addressing these ethical concerns proactively and comprehensively, we can harness the power of LLMs to benefit these critical sectors while safeguarding the interests and rights of individuals and society as a whole.

  345. Machine learning in medical applications: A review of state-of-the-art methods. Computers in Biology and Medicine, 145:105458, 2022.
  346. Hierarchical chinese legal event extraction via pedal attention mechanism. In Proceedings of the 28th international conference on computational linguistics, pp.  100–113, 2020.
  347. Large language model alignment: A survey. arXiv preprint arXiv:2309.15025, 2023a.
  348. "do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825, 2023b.
  349. Multi-lexsum: Real-world summaries of civil rights lawsuits at multiple granularities. Advances in Neural Information Processing Systems, 35:13158–13173, 2022.
  350. Gradient matching for domain generalization. arXiv preprint arXiv:2104.09937, 2021.
  351. Red teaming language model detectors with language models. Transactions of the Association for Computational Linguistics, 12:174–189, 2024.
  352. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  353. Rethinking interpretability in the era of large language models. arXiv preprint arXiv:2402.01761, 2024.
  354. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138, 2022.
  355. Towards expert-level medical question answering with large language models, 2023.
  356. Impact of news on the commodity market: Dataset and results. CoRR, abs/2009.04202, 2020. URL https://arxiv.org/abs/2009.04202.
  357. Biosses: a semantic sentence similarity estimation system for the biomedical domain. Bioinformatics, 33(14):i49–i58, 2017.
  358. Beyond classification: Financial reasoning in state-of-the-art language models. CoRR, abs/2305.01505, 2023. doi: 10.48550/ARXIV.2305.01505. URL https://doi.org/10.48550/arXiv.2305.01505.
  359. Accurate stock movement prediction with self-supervised learning from sparse noisy tweets. In 2022 IEEE International Conference on Big Data (Big Data), pp.  1691–1700. IEEE, 2022.
  360. A dataset for evaluating legal question answering on private international law. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pp.  230–234, 2021.
  361. 2017 supreme court database, version 2017 release 01. URL: http://Supremecourtdatabase. org, 2017.
  362. On the potential and limitations of few-shot in-context learning to generate metamorphic specifications for tax preparation software. arXiv preprint arXiv:2311.11979, 2023.
  363. Evaluating llms’ mathematical reasoning in financial document question answering, 2024.
  364. To prefer or to choose? generating agency and power counterfactuals jointly for gender bias mitigation. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+ CSS), pp.  39–51, 2022.
  365. Weaving pathways for justice with gpt: Llm-driven automated drafting of interactive legal applications. arXiv preprint arXiv:2312.09198, 2023.
  366. A causal framework to quantify the robustness of mathematical reasoning with language models. arXiv preprint arXiv:2210.12023, 2022.
  367. Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427, 2023.
  368. Trustllm: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561, 2024.
  369. Zhongxiang Sun. A short survey of viewing large language models in legal aspect. arXiv preprint arXiv:2303.09136, 2023.
  370. Cass R Sunstein. Legal reasoning and political conflict. Oxford University Press, 2018.
  371. Ekaterina Svetlova. Ai ethics and systemic risks in finance. AI and Ethics, 2(4):713–725, 2022.
  372. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  373. You reap what you sow: On the challenges of bias evaluation under multilingual settings. In Proceedings of BigScience Episode# 5–Workshop on Challenges & Perspectives in Creating Large Language Models, pp.  26–41, 2022.
  374. Chatgpt as an artificial lawyer. Artificial Intelligence for Access to Justice (AI4AJ 2023), 2023.
  375. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
  376. Language models get a gender makeover: Mitigating gender bias with few-shot data interventions. arXiv preprint arXiv:2306.04597, 2023.
  377. Xraygpt: Chest radiographs summarization using medical vision-language models. arXiv preprint arXiv:2306.07971, 2023.
  378. Large language models in medicine. Nature medicine, 29(8):1930–1940, 2023.
  379. Curriculum development for medical education: a six-step approach. JHU press, 2022.
  380. Financial fraud detection using vocal, linguistic and financial cues. Decision Support Systems, 74:78–87, 2015.
  381. Sticking to the facts: Confident decoding for faithful data-to-text generation. arXiv preprint arXiv:1910.08684, 2019.
  382. Llama: Open and efficient foundation language models, 2023a.
  383. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.
  384. Bioinstruct: Instruction tuning of large language models for biomedical natural language processing. ArXiv, abs/2310.19975, 2023. URL https://api.semanticscholar.org/CorpusID:264744285.
  385. Legal prompt engineering for multilingual legal judgement prediction. arXiv preprint arXiv:2212.02199, 2022.
  386. Large language models in cryptocurrency securities cases: can a gpt model meaningfully assist lawyers? Artificial Intelligence and Law, pp.  1–47, 2024.
  387. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinform., 16:138:1–138:28, 2015. doi: 10.1186/S12859-015-0564-6. URL https://doi.org/10.1186/s12859-015-0564-6.
  388. Ledgar: A large-scale multi-label corpus for text classification of legal provisions in contracts. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pp.  1235–1241, 2020.
  389. Challenges and barriers of using large language models (llm) such as chatgpt for diagnostic medicine with a focus on digital pathology–a recent scoping review. Diagnostic Pathology, 19(1):1–9, 2024.
  390. United States Congress. Table of supreme court decisions overruled by subsequent decisions. https://constitution.congress.gov/resources/decisionsoverruled/, 2023. A comprehensive table listing the decisions of the U.S. Supreme Court that have been overruled by subsequent decisions.
  391. Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17(5):514–518, 2010.
  392. A stitch in time saves nine: Detecting and mitigating hallucinations of llms by validating low-confidence generation. arXiv preprint arXiv:2307.03987, 2023.
  393. Counterfactual explanations for machine learning: A review. arXiv preprint arXiv:2010.10596, 2, 2020.
  394. HEAD-QA: A healthcare dataset for complex reasoning. In Anna Korhonen, David R. Traum, and Lluís Màrquez (eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pp.  960–966. Association for Computational Linguistics, 2019. doi: 10.18653/V1/P19-1092. URL https://doi.org/10.18653/v1/p19-1092.
  395. Exploring equation as a better intermediate meaning representation for numerical reasoning of large language models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17):19116–19125, Mar. 2024a. doi: 10.1609/aaai.v38i17.29879. URL https://ojs.aaai.org/index.php/AAAI/article/view/29879.
  396. Enhancing numerical reasoning with the guidance of reliable reasoning processes. arXiv preprint arXiv:2402.10654, 2024b.
  397. Docllm: A layout-aware generative language model for multimodal document understanding, 2023a.
  398. Are large language models really robust to word-level perturbations? arXiv preprint arXiv:2309.11166, 2023b.
  399. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):1–26, 2024c.
  400. Fingpt: Instruction tuning benchmark for open-source large language models in financial datasets. CoRR, abs/2310.04793, 2023c. doi: 10.48550/ARXIV.2310.04793. URL https://doi.org/10.48550/arXiv.2310.04793.
  401. Few-shot charge prediction with data augmentation and feature augmentation. Applied Sciences, 11(22):10811, 2021.
  402. Aligning large language models with human: A survey. arXiv preprint arXiv:2307.12966, 2023d.
  403. Self-guard: Empower the llm to safeguard itself. arXiv preprint arXiv:2310.15851, 2023e.
  404. Can llms like gpt-4 outperform traditional ai tools in dementia diagnosis? maybe, but not today. ArXiv, abs/2306.01499, 2023f. URL https://api.semanticscholar.org/CorpusID:259064252.
  405. Medclip: Contrastive learning from unpaired medical images and text. arXiv preprint arXiv:2210.10163, 2022.
  406. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems, 36, 2024.
  407. Chain-of-thought prompting elicits reasoning in large language models. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022a. URL http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.
  408. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022b.
  409. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 2021.
  410. Intelligent financial fraud detection: a comprehensive review. Computers & security, 57:47–66, 2016.
  411. Llmediator: Gpt-4 assisted online dispute resolution. arXiv preprint arXiv:2307.16732, 2023.
  412. Automated labelling using an attention model for radiology reports of mri scans (alarm). In Medical Imaging with Deep Learning, pp.  811–826. PMLR, 2020.
  413. Steven A Wright. Ai in the law: Towards assessing ethical risks. In 2020 IEEE International Conference on Big Data (Big Data), pp.  2160–2169. IEEE, 2020.
  414. Can gpt-4v (ision) serve medical applications? case studies on gpt-4v for multimodal medical diagnosis. arXiv preprint arXiv:2310.09909, 2023a.
  415. Pmc-llama: Towards building open-source language models for medicine, 2023b.
  416. Hybrid deep sequential modeling for social text-driven stock prediction. In Alfredo Cuzzocrea, James Allan, Norman W. Paton, Divesh Srivastava, Rakesh Agrawal, Andrei Z. Broder, Mohammed J. Zaki, K. Selçuk Candan, Alexandros Labrinidis, Assaf Schuster, and Haixun Wang (eds.), Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, pp.  1627–1630. ACM, 2018. doi: 10.1145/3269206.3269290. URL https://doi.org/10.1145/3269206.3269290.
  417. A survey on large language models for recommendation. arXiv preprint arXiv:2305.19860, 2023c.
  418. Bloomberggpt: A large language model for finance. CoRR, abs/2303.17564, 2023d. doi: 10.48550/ARXIV.2303.17564. URL https://doi.org/10.48550/arXiv.2303.17564.
  419. Analyzing chain-of-thought prompting in large language models via gradient-based feature attributions. arXiv preprint arXiv:2307.13339, 2023e.
  420. Deep learning in clinical natural language processing: a methodical review. Journal of the American Medical Informatics Association, 27(3):457–470, 2020.
  421. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016. URL http://arxiv.org/abs/1609.08144.
  422. Text-based chatbot in financial sector: A systematic literature review. Data Sci. Financ. Econ, 2(3):232–259, 2022.
  423. Passing a usa national bar exam: a first corpus for experimentation. In LREC 2016, Tenth International Conference on Language Resources and Evaluation. LREC, 2016.
  424. Cail2018: A large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478, 2018.
  425. Cail2019-scm: A dataset of similar case matching in legal domain. arXiv preprint arXiv:1911.08962, 2019.
  426. Lawformer: A pre-trained language model for chinese legal long documents. AI Open, 2:79–84, 2021.
  427. Peng Xiao-Song. LaWGPT: A Legal Writing GPT Model. https://github.com/pengxiao-song/LaWGPT, 2024. Accessed: 2024-04-29.
  428. PIXIU: A large language model, instruction data and evaluation benchmark for finance. CoRR, abs/2306.05443, 2023. doi: 10.48550/ARXIV.2306.05443. URL https://doi.org/10.48550/arXiv.2306.05443.
  429. Me llama: Foundation large language models for medical applications. arXiv preprint arXiv:2402.12749, 2024a.
  430. The finben: An holistic financial benchmark for large language models. arXiv preprint arXiv:2402.12659, 2024b.
  431. An empirical analysis of parameter-efficient methods for debiasing pre-trained language models. arXiv preprint arXiv:2306.04067, 2023.
  432. Frank Xing. Designing heterogeneous llm agents for financial sentiment analysis. arXiv preprint arXiv:2401.05799, 2024.
  433. Natural language based financial forecasting: a survey. Artificial Intelligence Review, 50(1):49–73, 2018.
  434. Baize: An open-source chat model with parameter-efficient tuning on self-chat data, 2023.
  435. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning, pp.  2048–2057, 2015.
  436. Stock movement prediction from tweets and historical prices. In Iryna Gurevych and Yusuke Miyao (eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pp.  1970–1979. Association for Computational Linguistics, 2018. doi: 10.18653/V1/P18-1183. URL https://aclanthology.org/P18-1183/.
  437. Nicole Yamane. Artificial intelligence in the legal field and the indispensable human element legal ethics demands. Geo. J. Legal Ethics, 33:877, 2020.
  438. Weakly supervised contrastive learning for chest x-ray report generation. arXiv preprint arXiv:2109.12242, 2021.
  439. Radbert: Adapting transformer-based language models to radiology. Radiology: Artificial Intelligence, 4(4):e210258, 2022.
  440. Personalized showcases: Generating multi-modal explanations for recommendations. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.  2251–2255, 2023a.
  441. Learning concise and descriptive attributes for visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  3090–3100, 2023b.
  442. Robust and interpretable medical image classifiers via concept bottleneck models. arXiv preprint arXiv:2310.03182, 2023c.
  443. List items one by one: A new data source and learning paradigm for multimodal llms. arXiv preprint arXiv:2404.16375, 2024a.
  444. Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology, 55(1):90–112, 2024b.
  445. Finbert: A pretrained language model for financial communications. CoRR, abs/2006.08097, 2020. URL https://arxiv.org/abs/2006.08097.
  446. Investlm: A large language model for investment using financial domain instruction tuning. CoRR, abs/2309.13064, 2023a. doi: 10.48550/ARXIV.2309.13064. URL https://doi.org/10.48550/arXiv.2309.13064.
  447. The dawn of lmms: Preliminary explorations with gpt-4v (ision). arXiv preprint arXiv:2309.17421, 9(1):1, 2023b.
  448. Leven: A large-scale chinese legal event detection dataset. arXiv preprint arXiv:2203.08556, 2022a.
  449. Improving out-of-distribution robustness via selective augmentation. In ICML, volume 162 of Proceedings of Machine Learning Research, pp.  25407–25437. PMLR, 2022b.
  450. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, pp.  100211, 2024.
  451. UReader: Universal OCR-free visually-situated language understanding with multimodal large language model. In Houda Bouamor, Juan Pino, and Kalika Bali (eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, pp.  2841–2858, Singapore, December 2023a. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.187. URL https://aclanthology.org/2023.findings-emnlp.187.
  452. Assessing hidden risks of llms: an empirical study on robustness, consistency, and credibility. arXiv preprint arXiv:2305.10235, 2023b.
  453. Legal prompting: Teaching a language model to think like a lawyer. arXiv preprint arXiv:2212.01326, 2022.
  454. Exploring the effectiveness of prompt engineering for legal reasoning tasks. In Findings of the Association for Computational Linguistics: ACL 2023, pp.  13582–13596, 2023a.
  455. Kola: Carefully benchmarking world knowledge of large language models. arXiv preprint arXiv:2306.09296, 2023b.
  456. Leveraging generative ai and large language models: a comprehensive roadmap for healthcare integration. In Healthcare, volume 11, pp.  2776. MDPI, 2023c.
  457. Generate rather than retrieve: Large language models are strong context generators, 2023d.
  458. Harnessing LLMs for temporal data - a study on explainable financial time series forecasting. In Mingxuan Wang and Imed Zitouni (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp.  739–753, Singapore, December 2023e. Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-industry.69. URL https://aclanthology.org/2023.emnlp-industry.69.
  459. Revisiting out-of-distribution robustness in nlp: Benchmarks, analysis, and llms evaluations. Advances in Neural Information Processing Systems, 36, 2024.
  460. Enabling and analyzing how to efficiently extract information from hybrid long documents with llms, 2024.
  461. Large language models meet nl2code: A survey. arXiv preprint arXiv:2212.09420, 2022.
  462. Large language models for robotics: A survey. arXiv preprint arXiv:2311.07226, 2023.
  463. Instruct-fingpt: Financial sentiment analysis by instruction tuning of general-purpose large language models. CoRR, abs/2306.12659, 2023a. doi: 10.48550/ARXIV.2306.12659. URL https://doi.org/10.48550/arXiv.2306.12659.
  464. Enhancing financial sentiment analysis via retrieval augmented large language models. In Proceedings of the Fourth ACM International Conference on AI in Finance, ICAIF ’23, pp.  349–356, New York, NY, USA, 2023b. Association for Computing Machinery. ISBN 9798400702402. doi: 10.1145/3604237.3626866. URL https://doi.org/10.1145/3604237.3626866.
  465. A survey of controllable text generation using transformer-based pre-trained language models. ACM Computing Surveys, 56(3):1–37, 2023c.
  466. Evaluation ethics of llms in legal domain. arXiv preprint arXiv:2403.11152, 2024a.
  467. Instruction tuning for large language models: A survey. arXiv preprint arXiv:2308.10792, 2023d.
  468. Mitigating language model hallucination with interactive question-knowledge alignment. arXiv preprint arXiv:2305.13669, 2023e.
  469. Dólares or dollars? unraveling the bilingual prowess of financial llms between spanish and english. CoRR, abs/2402.07405, 2024b. doi: 10.48550/ARXIV.2402.07405. URL https://doi.org/10.48550/arXiv.2402.07405.
  470. Enhancing small medical learners with privacy-preserving contextual prompting. CoRR, abs/2305.12723, 2023f. doi: 10.48550/ARXIV.2305.12723. URL https://doi.org/10.48550/arXiv.2305.12723.
  471. Enhancing small medical learners with privacy-preserving contextual prompting, 2023g.
  472. Gpt-4v(ision) as a generalist evaluator for vision-language tasks, 2023h.
  473. Alpacare:instruction-tuned large language models for medical application, 2023i.
  474. Intelligent analysis and application of judicial big data sharing based on blockchain. In 2023 6th International Conference on Artificial Intelligence and Big Data (ICAIBD), pp.  592–596. IEEE, 2023j.
  475. Siren’s song in the ai ocean: a survey on hallucination in large language models. arXiv preprint arXiv:2309.01219, 2023k.
  476. Contrastive learning of medical visual representations from paired images and text. In Machine Learning for Healthcare Conference, pp.  2–25. PMLR, 2022.
  477. Mengzi: Towards lightweight yet ingenious pre-trained models for chinese. CoRR, abs/2110.06696, 2021. URL https://arxiv.org/abs/2110.06696.
  478. Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2):1–38, 2024.
  479. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023a.
  480. MultiHiertt: Numerical reasoning over multi hierarchical tabular and textual data. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.  6588–6600, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.454. URL https://aclanthology.org/2022.acl-long.454.
  481. Knowledgemath: Knowledge-intensive math word problem solving in finance domains, 2023b.
  482. Andrew Zhe. Lawyer-llama: A legal-specific language model, 2023. URL https://github.com/AndrewZhe/lawyer-llama.
  483. Andrew Zhe. lawyer-llama: A Machine Learning Toolkit for Legal Analysis. https://github.com/AndrewZhe/lawyer-llama, 2024. Accessed: 2024-04-29.
  484. Coreference resolution: A review of general methodologies and applications in the clinical domain. Journal of biomedical informatics, 44(6):1113–1122, 2011.
  485. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023.
  486. When does pretraining help? assessing self-supervised learning for law and the casehold dataset of 53,000+ legal holdings. In Proceedings of the eighteenth international conference on artificial intelligence and law, pp.  159–168, 2021.
  487. Jec-qa: a legal-domain question answering dataset. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp.  9701–9708, 2020.
  488. LIMA: less is more for alignment. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (eds.), Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023. URL http://papers.nips.cc/paper_files/paper/2023/hash/ac662d74829e4407ce1d126477f4a03a-Abstract-Conference.html.
  489. A survey of large language models in medicine: Progress, application, and challenge, 2024a.
  490. Are large language models rational investors? arXiv preprint arXiv:2402.12713, 2024b.
  491. Trade the event: Corporate events detection for news-based event-driven trading. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (eds.), Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, volume ACL/IJCNLP 2021 of Findings of ACL, pp.  2114–2124. Association for Computational Linguistics, 2021. doi: 10.18653/V1/2021.FINDINGS-ACL.186. URL https://doi.org/10.18653/v1/2021.findings-acl.186.
  492. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pp.  3277–3287. Association for Computational Linguistics, 2021. doi: 10.18653/V1/2021.ACL-LONG.254. URL https://doi.org/10.18653/v1/2021.acl-long.254.
  493. Tat-llm: A specialized language model for discrete reasoning over tabular and textual data, 2024.
  494. Visualize before you write: Imagination-guided open-ended text generation. arXiv preprint arXiv:2210.03765, 2022.
  495. A survey on model compression for large language models. arXiv preprint arXiv:2308.07633, 2023a.
  496. Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107, 2023b.
  497. Exploring ai ethics of chatgpt: A diagnostic analysis. arXiv preprint arXiv:2301.12867, 2023.
  498. Leec: A legal element extraction dataset with an extensive domain-specific label system. arXiv preprint arXiv:2310.01271, 2023.
  499. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.
  500. Astock: A new dataset and automated stock trading based on stock-specific news analyzing model. CoRR, abs/2206.06606, 2022. doi: 10.48550/ARXIV.2206.06606. URL https://doi.org/10.48550/arXiv.2206.06606.
  501. Retrieving similar cases for construction project risk management using natural language processing techniques. Automation in construction, 80:66–76, 2017.
Authors (10)
  1. Zhiyu Zoey Chen (9 papers)
  2. Jing Ma (136 papers)
  3. Xinlu Zhang (15 papers)
  4. Nan Hao (3 papers)
  5. An Yan (31 papers)
  6. Armineh Nourbakhsh (18 papers)
  7. Xianjun Yang (37 papers)
  8. Julian McAuley (238 papers)
  9. Linda Petzold (45 papers)
  10. William Yang Wang (254 papers)