LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey (2402.14558v2)

Published 22 Feb 2024 in cs.CL

Abstract: LLMs have become the secret ingredient driving numerous industrial applications, showcasing their remarkable versatility across a diverse spectrum of tasks. From natural language processing and sentiment analysis to content generation and personalized recommendations, their unparalleled adaptability has facilitated widespread adoption across industries. This transformative shift driven by LLMs underscores the need to explore the associated challenges and the avenues for enhancing their utilization. In this paper, our objective is to unravel and evaluate the obstacles and opportunities inherent in leveraging LLMs within an industrial context. To this end, we conduct a survey of a group of industry practitioners, develop four research questions derived from the insights gathered, and examine 68 industry papers to address these questions and derive meaningful conclusions. We maintain a GitHub repository of the most recent papers in the field.
