A Brief History of Prompt: Leveraging Language Models (Through Advanced Prompting) (2310.04438v2)
Abstract: This paper presents a comprehensive exploration of the evolution of prompt engineering and generation in the field of NLP. Starting from early language models and information retrieval systems, we trace the key developments that have shaped prompt engineering over the years. The introduction of attention mechanisms in 2015 revolutionized language understanding, leading to advances in controllability and context-awareness. Subsequent breakthroughs in reinforcement learning further enhanced prompt engineering, addressing issues such as exposure bias and bias in generated text. We examine significant contributions from 2018 and 2019, focusing on fine-tuning strategies, control codes, and template-based generation. The paper also discusses the growing importance of fairness, human-AI collaboration, and low-resource adaptation. In 2020 and 2021, contextual prompting and transfer learning gained prominence, while 2022 and 2023 witnessed the emergence of advanced techniques such as unsupervised pre-training and novel reward shaping. Throughout the paper, we reference specific research studies that exemplify the impact of these developments on prompt engineering. The journey of prompt engineering continues, with ethical considerations remaining paramount to a responsible and inclusive future for AI systems.
Author: Golam Md Muktadir