A Survey on Large Language Models from Concept to Implementation (2403.18969v2)

Published 27 Mar 2024 in cs.CL, cs.AI, cs.IT, cs.LG, and math.IT

Abstract: Recent advancements in LLMs, particularly those built on Transformer architectures, have significantly broadened the scope of NLP applications, transcending their initial use in chatbot technology. This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series, focusing on how AI-driven tools are revolutionizing traditional tasks such as coding and problem-solving while opening new paths for research and development across diverse industries. From code interpretation and image captioning to the construction of interactive systems and the advancement of computational domains, Transformer models exemplify a synergy of deep learning, data analysis, and neural network design. This survey provides an in-depth look at the latest research on Transformer models, highlighting their versatility and their potential to transform diverse application sectors, and offering readers a comprehensive view of the current and future landscape of Transformer-based LLMs in practical applications.
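
The common core of all the architectures the survey covers is the Transformer's scaled dot-product attention. As a quick orientation, here is a minimal single-head NumPy sketch of that computation; the function name, toy shapes, and single-head simplification are illustrative choices, not details taken from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Score every query against every key; scaling by sqrt(d_k)
    # keeps the softmax from saturating as dimensionality grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted average of the value vectors.
    return weights @ V

# Toy self-attention: 4 tokens with 8-dimensional embeddings (Q = K = V).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

Production Transformers add learned Q/K/V projections, multiple heads, and masking on top of this, but the weighted-averaging core above is what every model family discussed in the survey shares.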

Authors (3)
  1. Chen Wang (599 papers)
  2. Jin Zhao (55 papers)
  3. Jiaqi Gong (4 papers)
Citations (2)